2023-11-27 11:49:48,607 INFO [train_asr.py:1303] (0/4) Training started 2023-11-27 11:49:48,612 INFO [train_asr.py:1313] (0/4) Device: cuda:0 2023-11-27 11:49:48,613 INFO [train_asr.py:1325] (0/4) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '2b2ac14b326d61d79d04e53fbd69b1ff6d630411', 'k2-git-date': 'Thu Aug 24 05:58:26 2023', 'lhotse-version': '1.16.0', 'torch-version': '2.0.1+cu117', 'torch-cuda-available': True, 'torch-cuda-version': '11.7', 'python-version': '3.1', 'icefall-git-branch': 'multi_KD', 'icefall-git-sha1': 'a9ea720f-dirty', 'icefall-git-date': 'Wed Nov 22 17:48:49 2023', 'icefall-path': '/star-xy/softwares/icefall_development/icefall_multi_KD', 'k2-path': '/star-xy/softwares/k2_development/k2/k2/python/k2/__init__.py', 'lhotse-path': '/star-xy/softwares/anaconda3/envs/multi_KD/lib/python3.10/site-packages/lhotse/__init__.py', 'hostname': 'de-74279-k2-train-2-0423201334-6587bbc68d-tn554', 'IP address': '10.177.74.211'}, 'world_size': 4, 'master_port': 13490, 'tensorboard': True, 'num_epochs': 60, 'start_epoch': 39, 'start_batch': 0, 'exp_dir': PosixPath('multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0'), 'bpe_model': 'data/lang_bpe_500/bpe.model', 'base_lr': 0.045, 'lr_batches': 7500, 'lr_epochs': 3.5, 'ref_duration': 600, 'context_size': 2, 'prune_range': 5, 'lm_scale': 0.25, 'am_scale': 0.0, 'simple_loss_scale': 0.5, 'ctc_loss_scale': 0.2, 'audio_tagging_loss_scale': 1.0, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 4000, 'keep_last_k': 30, 'average_period': 200, 'use_fp16': True, 'stop_early': False, 'do_finetune': False, 'init_modules': None, 'freeze_modules': None, 'finetune_ckpt': None, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'causal': False, 'chunk_size': '16,32,64,-1', 'left_context_frames': '64,128,256,-1', 'use_transducer': True, 'use_ctc': False, 'do_audio_tagging': True, 'use_encoder_projection': False, 'encoder_projection_dim': -1, 'freeze_encoder': False, 'freezing_encoder_layer_index': '-1', 'freeze_encoder_steps': -1, 'encoder_lr_scale': 1.0, 'beats_label': False, 'full_libri': True, 'mini_libri': False, 'use_vox2': False, 'use_libriheavy': False, 'libriheavy_subset': 'small', 'use_audioset': True, 'audioset_subset': 'unbalanced', 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 1000, 'bucketing_sampler': False, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 1, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'enable_audioset': False, 'use_musan_separately': False, 'input_strategy': 'PrecomputedFeatures', 'drop_features': False, 'return_audio': False, 'use_beats': True, 'use_ecapa': True, 'use_whisper': True, 'whisper_mvq': False, 'beats_ckpt': 
'data/models/BEATs/BEATs_iter3_plus_AS2M_finetuned_on_AS2M_cpt2.pt', 'whisper_version': 'small.en', 'blank_id': 0, 'vocab_size': 500} 2023-11-27 11:49:48,614 INFO [train_asr.py:1334] (0/4) About to create model 2023-11-27 11:49:49,320 INFO [train_asr.py:1338] (0/4) Number of model parameters: 65819362 2023-11-27 11:49:49,843 INFO [train_asr.py:1362] (0/4) Using CED labels! 2023-11-27 11:49:49,844 INFO [checkpoint.py:112] (0/4) Loading checkpoint from multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/epoch-38.pt 2023-11-27 11:49:52,720 INFO [checkpoint.py:131] (0/4) Loading averaged model 2023-11-27 11:49:52,850 INFO [train_asr.py:1370] (0/4) Setting the lr scale of parameters in encoder and encoder_embed to 1.0 2023-11-27 11:49:55,224 INFO [train_asr.py:1379] (0/4) Using DDP 2023-11-27 11:49:55,804 INFO [train_asr.py:1402] (0/4) Loading optimizer state dict 2023-11-27 11:49:56,432 INFO [train_asr.py:1410] (0/4) Loading scheduler state dict 2023-11-27 11:49:56,440 INFO [train_asr.py:1432] (0/4) Getting audioset cuts 2023-11-27 11:49:56,440 INFO [kd_datamodule.py:784] (0/4) About to get the audioset cuts. 2023-11-27 11:49:56,444 INFO [train_asr.py:1438] (0/4) Using mux to combine Librispeech with audioset 2023-11-27 11:49:56,444 INFO [train_asr.py:1449] (0/4) CutSet(len=2748469) [underlying data type: ] 2023-11-27 11:50:05,336 INFO [kd_datamodule.py:396] (0/4) Enable MUSAN 2023-11-27 11:50:05,336 INFO [kd_datamodule.py:397] (0/4) About to get Musan cuts 2023-11-27 11:50:08,041 INFO [kd_datamodule.py:427] (0/4) Enable SpecAugment 2023-11-27 11:50:08,041 INFO [kd_datamodule.py:428] (0/4) Time warp factor: 80 2023-11-27 11:50:08,041 INFO [kd_datamodule.py:438] (0/4) Num frame mask: 10 2023-11-27 11:50:08,041 INFO [kd_datamodule.py:451] (0/4) About to create train dataset 2023-11-27 11:50:08,045 INFO [kd_datamodule.py:487] (0/4) Using SimpleCutSampler 2023-11-27 11:50:08,045 INFO [kd_datamodule.py:495] (0/4) About to create train dataloader 2023-11-27 11:50:08,050 INFO [kd_datamodule.py:802] (0/4) About to get the audioset eval cuts. 2023-11-27 11:50:08,052 INFO [train_asr.py:1513] (0/4) CutSet(len=20681) [underlying data type: ] 2023-11-27 11:50:08,105 INFO [kd_datamodule.py:529] (0/4) About to create dev dataset 2023-11-27 11:50:08,536 INFO [kd_datamodule.py:550] (0/4) About to create dev dataloader 2023-11-27 11:50:08,537 INFO [train_asr.py:1527] (0/4) Loading grad scaler state dict 2023-11-27 11:50:28,611 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 0, loss[loss=0.07789, simple_loss=0.08696, pruned_loss=0.009115, audio_tagging_loss=0.02529, over 15170.00 frames. ], tot_loss[loss=0.07789, simple_loss=0.08696, pruned_loss=0.009115, audio_tagging_loss=0.02529, over 15170.00 frames. 
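[Annotation] The datamodule entries above show the training set being assembled: LibriSpeech and the AudioSet unbalanced subset are lazily interleaved with lhotse's CutSet.mux (the combined set reports len=2748469), MUSAN mixing and SpecAugment (time warp factor 80, 10 frame masks) are enabled, and a SimpleCutSampler is used since bucketing_sampler is False in the config. A minimal sketch of that pipeline with public lhotse APIs; the manifest paths, mux weights, and feature-mask sizes below are illustrative guesses, not values from this log:

```python
# Sketch of the data pipeline implied by the kd_datamodule log lines above.
# Paths, weights, and mask sizes are illustrative assumptions.
from lhotse import CutSet
from lhotse.dataset import SpecAugment, SimpleCutSampler

libri_cuts = CutSet.from_file("data/fbank/librispeech_cuts_train.jsonl.gz")    # hypothetical path
audioset_cuts = CutSet.from_file("data/fbank/audioset_cuts_unbalanced.jsonl.gz")  # hypothetical path

# "Using mux to combine Librispeech with audioset": lazy interleaving of sources.
train_cuts = CutSet.mux(libri_cuts, audioset_cuts, weights=[0.7, 0.3])  # weights are a guess

# "Enable SpecAugment ... Time warp factor: 80 ... Num frame mask: 10"
spec_augment = SpecAugment(time_warp_factor=80, num_frame_masks=10,
                           num_feature_masks=2, features_mask_size=27)

# "Using SimpleCutSampler", with max_duration=1000 (seconds per batch) from the config.
sampler = SimpleCutSampler(train_cuts, max_duration=1000, shuffle=True, drop_last=True)
```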
], batch size: 58, lr: 1.75e-03, grad_scale: 32.0 2023-11-27 11:50:28,613 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-27 11:50:42,136 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.1468, 4.9344, 3.9152, 4.4065], device='cuda:0') 2023-11-27 11:50:45,241 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.5335, 4.4985, 4.2722, 4.3980], device='cuda:0') 2023-11-27 11:50:50,118 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([3.9532, 3.1571, 2.8643, 3.1480, 3.3572, 2.8073, 3.4059, 2.6005], device='cuda:0') 2023-11-27 11:51:02,907 INFO [train_asr.py:1267] (0/4) Epoch 39, validation: loss=0.0578, simple_loss=0.05083, pruned_loss=0.005245, audio_tagging_loss=0.02714, over 4681554.00 frames. 2023-11-27 11:51:02,907 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-27 11:51:07,725 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3046020.0, ans=0.125 2023-11-27 11:51:11,677 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3046020.0, ans=0.125 2023-11-27 11:51:16,105 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3046086.6666666665, ans=0.0 2023-11-27 11:51:18,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3046086.6666666665, ans=0.2 2023-11-27 11:51:24,882 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3046086.6666666665, ans=0.125 2023-11-27 11:51:55,813 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 456950 2023-11-27 11:52:01,316 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 50, loss[loss=0.08385, simple_loss=0.114, pruned_loss=0.01599, audio_tagging_loss=0.01088, over 15523.00 frames. ], tot_loss[loss=0.07606, simple_loss=0.09222, pruned_loss=0.01273, audio_tagging_loss=0.01722, over 686606.58 frames. ], batch size: 56, lr: 1.75e-03, grad_scale: 32.0 2023-11-27 11:52:01,672 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3046353.3333333335, ans=0.125 2023-11-27 11:52:25,571 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.249e+01 9.464e+01 1.034e+02 1.107e+02 1.312e+02, threshold=2.068e+02, percent-clipped=0.0 2023-11-27 11:52:39,221 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3046553.3333333335, ans=0.1 2023-11-27 11:52:47,077 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.70 vs. limit=12.0 2023-11-27 11:52:47,997 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3046620.0, ans=0.1 2023-11-27 11:52:54,176 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 457000 2023-11-27 11:53:00,048 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 100, loss[loss=0.06795, simple_loss=0.0924, pruned_loss=0.0078, audio_tagging_loss=0.01395, over 14580.00 frames. 
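[Annotation] The loss[...] entries decompose per the config above: simple_loss_scale=0.5, audio_tagging_loss_scale=1.0, CTC disabled. Assuming the logged total is the scaled sum of the components (the log does not state the formula, but the batch-0 numbers match it exactly), a worked check:

```python
# Worked check of the batch-0 numbers above, under the assumed combination
#   loss = simple_loss_scale * simple_loss + pruned_loss
#        + audio_tagging_loss_scale * audio_tagging_loss
simple_loss_scale = 0.5          # from the config dict
audio_tagging_loss_scale = 1.0   # from the config dict

simple_loss, pruned_loss, at_loss = 0.08696, 0.009115, 0.02529  # batch 0
loss = simple_loss_scale * simple_loss + pruned_loss + audio_tagging_loss_scale * at_loss
print(loss)  # 0.077885, which rounds to the logged loss=0.07789
```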
], tot_loss[loss=0.07544, simple_loss=0.09246, pruned_loss=0.01291, audio_tagging_loss=0.0163, over 1205874.49 frames. ], batch size: 56, lr: 1.75e-03, grad_scale: 32.0 2023-11-27 11:53:23,557 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3046820.0, ans=0.2 2023-11-27 11:53:32,345 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3046886.6666666665, ans=0.0 2023-11-27 11:53:33,377 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3046886.6666666665, ans=0.2 2023-11-27 11:53:51,133 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 457050 2023-11-27 11:53:53,480 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3046953.3333333335, ans=0.125 2023-11-27 11:53:56,556 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 150, loss[loss=0.082, simple_loss=0.1141, pruned_loss=0.01442, audio_tagging_loss=0.01052, over 14399.00 frames. ], tot_loss[loss=0.07201, simple_loss=0.08995, pruned_loss=0.01245, audio_tagging_loss=0.01459, over 1618580.66 frames. ], batch size: 52, lr: 1.75e-03, grad_scale: 32.0 2023-11-27 11:54:06,827 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3047086.6666666665, ans=0.1 2023-11-27 11:54:13,792 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.54 vs. limit=22.5 2023-11-27 11:54:19,298 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.518e+01 8.988e+01 9.589e+01 1.001e+02 1.163e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-27 11:54:47,790 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 457100 2023-11-27 11:54:53,335 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 200, loss[loss=0.05979, simple_loss=0.08555, pruned_loss=0.009925, audio_tagging_loss=0.00709, over 16247.00 frames. ], tot_loss[loss=0.07041, simple_loss=0.09013, pruned_loss=0.01258, audio_tagging_loss=0.01276, over 1932448.81 frames. ], batch size: 62, lr: 1.75e-03, grad_scale: 32.0 2023-11-27 11:54:57,296 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.18 vs. limit=15.0 2023-11-27 11:55:14,144 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.18 vs. limit=6.0 2023-11-27 11:55:18,659 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.52 vs. 
limit=22.5 2023-11-27 11:55:28,499 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.whiten.whitening_limit, batch_count=3047553.3333333335, ans=12.0 2023-11-27 11:55:39,223 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3047620.0, ans=0.05 2023-11-27 11:55:45,186 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 457150 2023-11-27 11:55:48,660 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3047620.0, ans=0.125 2023-11-27 11:55:51,213 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 250, loss[loss=0.06166, simple_loss=0.08316, pruned_loss=0.01104, audio_tagging_loss=0.009035, over 16100.00 frames. ], tot_loss[loss=0.06976, simple_loss=0.09087, pruned_loss=0.01277, audio_tagging_loss=0.01155, over 2177459.10 frames. ], batch size: 60, lr: 1.75e-03, grad_scale: 32.0 2023-11-27 11:55:56,818 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3047686.6666666665, ans=0.0 2023-11-27 11:56:14,334 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.705e+01 8.936e+01 9.538e+01 1.043e+02 1.286e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-27 11:56:14,582 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3047820.0, ans=0.0 2023-11-27 11:56:21,224 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3047820.0, ans=0.1 2023-11-27 11:56:30,099 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.55 vs. limit=15.0 2023-11-27 11:56:42,269 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 457200 2023-11-27 11:56:48,617 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 300, loss[loss=0.07999, simple_loss=0.1082, pruned_loss=0.0207, audio_tagging_loss=0.005205, over 14989.00 frames. ], tot_loss[loss=0.06873, simple_loss=0.09061, pruned_loss=0.01269, audio_tagging_loss=0.01073, over 2372209.72 frames. ], batch size: 56, lr: 1.75e-03, grad_scale: 16.0 2023-11-27 11:57:05,105 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3048086.6666666665, ans=0.0 2023-11-27 11:57:19,227 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3048153.3333333335, ans=0.04949747468305833 2023-11-27 11:57:23,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=3048220.0, ans=15.0 2023-11-27 11:57:30,837 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3048220.0, ans=0.2 2023-11-27 11:57:31,824 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3048220.0, ans=0.125 2023-11-27 11:57:34,409 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.89 vs. 
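[Annotation] The optim.py "Clipping_scale=2.0, grad-norm quartiles ..." lines print five percentiles (min/25%/median/75%/max) of recently observed gradient norms, and the printed threshold is consistent with clipping at clipping_scale times the running median: just above, 2.0 x 9.538e+01 = 1.908e+02. A sketch of that rule, assuming this is how the threshold is derived (icefall's ScaledAdam implements its own version):

```python
import torch

def clipping_threshold(recent_grad_norms: torch.Tensor, clipping_scale: float = 2.0) -> float:
    """Sketch: clip at clipping_scale * median of recently seen grad norms."""
    q = torch.quantile(recent_grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    print("grad-norm quartiles:", q.tolist())  # the five values printed in the log
    return clipping_scale * q[2].item()

# e.g. a median of 9.538e+01 gives 1.908e+02, matching the log line above.
```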
limit=15.0 2023-11-27 11:57:39,478 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 457250 2023-11-27 11:57:43,975 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3048353.3333333335, ans=0.0 2023-11-27 11:57:44,919 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 350, loss[loss=0.07157, simple_loss=0.09522, pruned_loss=0.01566, audio_tagging_loss=0.0083, over 16026.00 frames. ], tot_loss[loss=0.06808, simple_loss=0.09019, pruned_loss=0.01275, audio_tagging_loss=0.01023, over 2523210.04 frames. ], batch size: 58, lr: 1.75e-03, grad_scale: 8.0 2023-11-27 11:58:06,830 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3048420.0, ans=0.05 2023-11-27 11:58:10,951 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.248e+01 8.638e+01 9.224e+01 9.880e+01 1.297e+02, threshold=1.845e+02, percent-clipped=0.0 2023-11-27 11:58:14,629 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3048486.6666666665, ans=0.125 2023-11-27 11:58:24,669 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.48 vs. limit=15.0 2023-11-27 11:58:36,126 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 457300 2023-11-27 11:58:42,190 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 400, loss[loss=0.08268, simple_loss=0.1153, pruned_loss=0.01838, audio_tagging_loss=0.006659, over 15643.00 frames. ], tot_loss[loss=0.06824, simple_loss=0.09107, pruned_loss=0.01287, audio_tagging_loss=0.009835, over 2640823.12 frames. ], batch size: 56, lr: 1.75e-03, grad_scale: 16.0 2023-11-27 11:58:48,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3048686.6666666665, ans=0.1 2023-11-27 11:59:17,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3048886.6666666665, ans=0.1 2023-11-27 11:59:21,314 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3048886.6666666665, ans=0.0 2023-11-27 11:59:32,754 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 457350 2023-11-27 11:59:35,741 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3048953.3333333335, ans=0.125 2023-11-27 11:59:35,772 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3048953.3333333335, ans=0.0 2023-11-27 11:59:36,799 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3048953.3333333335, ans=0.2 2023-11-27 11:59:38,824 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 450, loss[loss=0.07328, simple_loss=0.0993, pruned_loss=0.01619, audio_tagging_loss=0.007438, over 14850.00 frames. ], tot_loss[loss=0.06773, simple_loss=0.09073, pruned_loss=0.01276, audio_tagging_loss=0.009601, over 2727001.52 frames. 
], batch size: 55, lr: 1.75e-03, grad_scale: 16.0 2023-11-27 12:00:02,789 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.975e+01 8.448e+01 9.046e+01 9.688e+01 1.234e+02, threshold=1.809e+02, percent-clipped=0.0 2023-11-27 12:00:11,415 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3049220.0, ans=0.0 2023-11-27 12:00:14,617 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3049220.0, ans=0.0 2023-11-27 12:00:29,403 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 457400 2023-11-27 12:00:35,324 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 500, loss[loss=0.06825, simple_loss=0.09602, pruned_loss=0.01224, audio_tagging_loss=0.008003, over 14748.00 frames. ], tot_loss[loss=0.06767, simple_loss=0.09097, pruned_loss=0.01284, audio_tagging_loss=0.009345, over 2802507.20 frames. ], batch size: 54, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 12:00:41,154 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3049353.3333333335, ans=0.125 2023-11-27 12:00:50,434 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3049420.0, ans=0.125 2023-11-27 12:00:59,168 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 12:01:08,507 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3049553.3333333335, ans=0.125 2023-11-27 12:01:10,796 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3049553.3333333335, ans=0.125 2023-11-27 12:01:21,668 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3049620.0, ans=0.2 2023-11-27 12:01:25,074 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3049620.0, ans=0.125 2023-11-27 12:01:25,884 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 457450 2023-11-27 12:01:32,372 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 550, loss[loss=0.065, simple_loss=0.09232, pruned_loss=0.01058, audio_tagging_loss=0.008259, over 14507.00 frames. ], tot_loss[loss=0.06769, simple_loss=0.09118, pruned_loss=0.01288, audio_tagging_loss=0.009225, over 2857163.85 frames. 
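[Annotation] grad_scale in the progress lines is the dynamic loss scale of mixed-precision training (use_fp16=True): it is halved when a step produces inf/nan gradients and raised again after clean steps, which is why it moves 32 -> 16 -> 8 -> 16 across the batches above. A minimal sketch with PyTorch's standard scaler; icefall manages the scale with its own schedule on top, so treat this as illustrative:

```python
import torch

# Dynamic loss scaling: backoff_factor=0.5 halves the scale on overflow,
# growth_factor=2.0 doubles it after growth_interval clean steps.
scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_factor=2.0,
                                   backoff_factor=0.5, growth_interval=2000)

# Inside the training loop (model, optimizer, batch assumed to exist):
#   with torch.cuda.amp.autocast():
#       loss = model(batch)
#   scaler.scale(loss).backward()
#   scaler.step(optimizer)
#   scaler.update()              # adjusts the value logged as grad_scale
#   print(scaler.get_scale())
```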
], batch size: 54, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 12:01:32,585 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3049686.6666666665, ans=0.125 2023-11-27 12:01:52,834 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3049753.3333333335, ans=0.1 2023-11-27 12:01:56,928 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.005e+01 8.636e+01 9.349e+01 1.005e+02 1.321e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-27 12:02:00,237 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3049820.0, ans=0.125 2023-11-27 12:02:23,334 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 457500 2023-11-27 12:02:28,715 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 600, loss[loss=0.07075, simple_loss=0.09364, pruned_loss=0.01449, audio_tagging_loss=0.009429, over 14755.00 frames. ], tot_loss[loss=0.06798, simple_loss=0.09156, pruned_loss=0.01299, audio_tagging_loss=0.009214, over 2903554.15 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 8.0 2023-11-27 12:02:35,036 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3050020.0, ans=0.125 2023-11-27 12:02:42,748 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3050086.6666666665, ans=0.125 2023-11-27 12:02:49,329 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3050086.6666666665, ans=0.2 2023-11-27 12:03:20,409 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 457550 2023-11-27 12:03:25,691 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 650, loss[loss=0.05597, simple_loss=0.07717, pruned_loss=0.01008, audio_tagging_loss=0.007304, over 14324.00 frames. ], tot_loss[loss=0.06765, simple_loss=0.0914, pruned_loss=0.01288, audio_tagging_loss=0.009064, over 2929561.86 frames. ], batch size: 54, lr: 1.74e-03, grad_scale: 8.0 2023-11-27 12:03:38,586 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.29 vs. limit=15.0 2023-11-27 12:03:41,601 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.80 vs. limit=8.0 2023-11-27 12:03:49,543 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.00 vs. limit=15.0 2023-11-27 12:03:52,312 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.590e+01 8.673e+01 9.304e+01 1.013e+02 1.216e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-27 12:04:00,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3050553.3333333335, ans=0.0 2023-11-27 12:04:07,315 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3050553.3333333335, ans=0.0 2023-11-27 12:04:13,185 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.94 vs. 
limit=15.0 2023-11-27 12:04:17,025 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 457600 2023-11-27 12:04:22,940 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 700, loss[loss=0.07225, simple_loss=0.1011, pruned_loss=0.01282, audio_tagging_loss=0.008852, over 16527.00 frames. ], tot_loss[loss=0.06715, simple_loss=0.0907, pruned_loss=0.01275, audio_tagging_loss=0.009052, over 2961079.00 frames. ], batch size: 60, lr: 1.74e-03, grad_scale: 8.0 2023-11-27 12:04:54,564 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3050820.0, ans=0.1 2023-11-27 12:05:02,106 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3050886.6666666665, ans=0.125 2023-11-27 12:05:08,201 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3050953.3333333335, ans=0.125 2023-11-27 12:05:15,170 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 457650 2023-11-27 12:05:20,553 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 750, loss[loss=0.06214, simple_loss=0.08579, pruned_loss=0.01055, audio_tagging_loss=0.008699, over 15854.00 frames. ], tot_loss[loss=0.06687, simple_loss=0.09008, pruned_loss=0.01276, audio_tagging_loss=0.009069, over 2985279.35 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 8.0 2023-11-27 12:05:36,778 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3051086.6666666665, ans=0.1 2023-11-27 12:05:46,326 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.785e+01 8.728e+01 9.485e+01 1.039e+02 1.301e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-27 12:05:54,358 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3051220.0, ans=0.1 2023-11-27 12:05:58,156 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.34 vs. limit=6.0 2023-11-27 12:06:11,700 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 457700 2023-11-27 12:06:18,074 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 800, loss[loss=0.07101, simple_loss=0.0953, pruned_loss=0.01408, audio_tagging_loss=0.009283, over 14442.00 frames. ], tot_loss[loss=0.06712, simple_loss=0.0904, pruned_loss=0.01285, audio_tagging_loss=0.009072, over 2998126.38 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 12:06:18,682 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.40 vs. limit=15.0 2023-11-27 12:06:21,422 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3051353.3333333335, ans=0.0 2023-11-27 12:06:33,324 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3051420.0, ans=0.1 2023-11-27 12:06:41,231 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.43 vs. 
limit=15.0 2023-11-27 12:06:46,165 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3051486.6666666665, ans=0.0 2023-11-27 12:06:52,567 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3051553.3333333335, ans=0.2 2023-11-27 12:07:09,789 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 457750 2023-11-27 12:07:15,110 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 850, loss[loss=0.07482, simple_loss=0.1076, pruned_loss=0.01174, audio_tagging_loss=0.009279, over 15585.00 frames. ], tot_loss[loss=0.06681, simple_loss=0.09007, pruned_loss=0.01267, audio_tagging_loss=0.009105, over 3005604.34 frames. ], batch size: 54, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 12:07:21,393 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.14 vs. limit=22.5 2023-11-27 12:07:41,540 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.800e+01 8.486e+01 9.072e+01 9.874e+01 1.616e+02, threshold=1.814e+02, percent-clipped=0.0 2023-11-27 12:07:43,070 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.23 vs. limit=15.0 2023-11-27 12:07:46,375 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.75 vs. limit=22.5 2023-11-27 12:08:03,439 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.28 vs. limit=15.0 2023-11-27 12:08:08,066 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 457800 2023-11-27 12:08:14,042 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 900, loss[loss=0.05134, simple_loss=0.06488, pruned_loss=0.009691, audio_tagging_loss=0.009206, over 14423.00 frames. ], tot_loss[loss=0.0667, simple_loss=0.08954, pruned_loss=0.01261, audio_tagging_loss=0.009322, over 3021991.70 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 12:08:28,758 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.35 vs. limit=10.0 2023-11-27 12:08:31,492 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3052086.6666666665, ans=0.125 2023-11-27 12:08:32,800 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.85 vs. limit=15.0 2023-11-27 12:08:41,233 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3052153.3333333335, ans=0.0 2023-11-27 12:08:46,039 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.88 vs. 
limit=6.0 2023-11-27 12:08:57,146 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 12:08:58,348 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3052220.0, ans=0.125 2023-11-27 12:09:05,802 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 457850 2023-11-27 12:09:11,258 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 950, loss[loss=0.1049, simple_loss=0.1451, pruned_loss=0.02615, audio_tagging_loss=0.006197, over 14677.00 frames. ], tot_loss[loss=0.06714, simple_loss=0.0903, pruned_loss=0.0128, audio_tagging_loss=0.009187, over 3032557.09 frames. ], batch size: 54, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 12:09:13,786 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3052353.3333333335, ans=0.125 2023-11-27 12:09:22,032 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3052420.0, ans=0.125 2023-11-27 12:09:38,363 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.523e+01 8.734e+01 9.231e+01 1.017e+02 1.316e+02, threshold=1.846e+02, percent-clipped=0.0 2023-11-27 12:09:38,669 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-27 12:10:03,004 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 457900 2023-11-27 12:10:05,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3052620.0, ans=0.2 2023-11-27 12:10:08,372 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 1000, loss[loss=0.06607, simple_loss=0.09162, pruned_loss=0.01134, audio_tagging_loss=0.008924, over 14864.00 frames. ], tot_loss[loss=0.06723, simple_loss=0.0907, pruned_loss=0.0129, audio_tagging_loss=0.008981, over 3028059.60 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 12:10:29,008 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3052753.3333333335, ans=0.0 2023-11-27 12:10:34,297 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 12:10:34,591 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 12:10:36,675 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3052820.0, ans=0.0 2023-11-27 12:10:44,563 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3052886.6666666665, ans=0.1 2023-11-27 12:10:51,006 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3052886.6666666665, ans=0.1 2023-11-27 12:10:59,153 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3052953.3333333335, ans=0.125 2023-11-27 12:11:00,124 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 457950 2023-11-27 12:11:05,654 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 1050, loss[loss=0.07097, simple_loss=0.09812, pruned_loss=0.01424, audio_tagging_loss=0.007669, over 15893.00 frames. ], tot_loss[loss=0.06727, simple_loss=0.09101, pruned_loss=0.01292, audio_tagging_loss=0.008845, over 3022985.32 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 8.0 2023-11-27 12:11:27,804 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3053153.3333333335, ans=0.0 2023-11-27 12:11:32,618 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.119e+01 8.482e+01 9.051e+01 9.992e+01 1.169e+02, threshold=1.810e+02, percent-clipped=0.0 2023-11-27 12:11:35,083 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3053153.3333333335, ans=0.1 2023-11-27 12:11:35,124 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3053153.3333333335, ans=0.125 2023-11-27 12:11:45,589 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3053220.0, ans=0.125 2023-11-27 12:11:47,900 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 12:11:56,568 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 458000 2023-11-27 12:11:58,483 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3053286.6666666665, ans=0.125 2023-11-27 12:12:02,649 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 1100, loss[loss=0.0465, simple_loss=0.06238, pruned_loss=0.004366, audio_tagging_loss=0.01094, over 15478.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.0899, pruned_loss=0.01281, audio_tagging_loss=0.008815, over 3027764.04 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 8.0 2023-11-27 12:12:07,056 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 12:12:19,976 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3053420.0, ans=0.0 2023-11-27 12:12:33,434 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3053486.6666666665, ans=0.125 2023-11-27 12:12:47,763 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3053620.0, ans=0.1 2023-11-27 12:12:54,107 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 458050 2023-11-27 12:12:59,558 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 1150, loss[loss=0.08625, simple_loss=0.1254, pruned_loss=0.01801, audio_tagging_loss=0.005564, over 14691.00 frames. ], tot_loss[loss=0.06681, simple_loss=0.09061, pruned_loss=0.01283, audio_tagging_loss=0.008672, over 3029951.02 frames. ], batch size: 53, lr: 1.74e-03, grad_scale: 8.0 2023-11-27 12:13:01,428 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3053686.6666666665, ans=0.1 2023-11-27 12:13:04,899 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3053686.6666666665, ans=0.04949747468305833 2023-11-27 12:13:08,496 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.68 vs. limit=22.5 2023-11-27 12:13:27,296 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.335e+01 8.562e+01 9.025e+01 9.989e+01 1.460e+02, threshold=1.805e+02, percent-clipped=0.0 2023-11-27 12:13:28,052 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.39 vs. limit=15.0 2023-11-27 12:13:32,964 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3053886.6666666665, ans=0.1 2023-11-27 12:13:48,120 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.84 vs. limit=6.0 2023-11-27 12:13:50,840 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 458100 2023-11-27 12:13:57,004 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 1200, loss[loss=0.0712, simple_loss=0.1002, pruned_loss=0.01353, audio_tagging_loss=0.007583, over 15501.00 frames. ], tot_loss[loss=0.06708, simple_loss=0.09108, pruned_loss=0.01294, audio_tagging_loss=0.0086, over 3028804.37 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 12:14:22,619 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3054153.3333333335, ans=0.0 2023-11-27 12:14:46,932 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3054286.6666666665, ans=0.1 2023-11-27 12:14:47,938 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 458150 2023-11-27 12:14:53,239 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 1250, loss[loss=0.0692, simple_loss=0.09955, pruned_loss=0.01096, audio_tagging_loss=0.008465, over 15453.00 frames. ], tot_loss[loss=0.06723, simple_loss=0.09142, pruned_loss=0.01292, audio_tagging_loss=0.008606, over 3034560.40 frames. 
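[Annotation] The WARNING lines above drop 1-second AudioSet clips whose dummy transcripts are longer than the subsampled feature sequence: 100 input frames shrink to 23 after the encoder's roughly 4x subsampling, fewer than the 24 BPE tokens, so the transducer loss is undefined for those cuts. A check of the frame arithmetic, assuming the ((T - 7) // 2 + 1) // 2 formula used by icefall's zipformer recipes:

```python
def frames_after_subsampling(num_frames: int) -> int:
    # Assumed formula from icefall's zipformer recipes (subsampling_factor=4).
    return ((num_frames - 7) // 2 + 1) // 2

T = frames_after_subsampling(100)
print(T)        # 23, as in the warning
print(T < 24)   # True: fewer frames than tokens, so the cut is excluded
```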
], batch size: 56, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 12:14:56,862 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3054353.3333333335, ans=0.125 2023-11-27 12:15:10,887 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff2.min_abs, batch_count=3054420.0, ans=0.1 2023-11-27 12:15:21,005 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.926e+01 8.636e+01 9.164e+01 9.922e+01 1.354e+02, threshold=1.833e+02, percent-clipped=0.0 2023-11-27 12:15:44,085 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 458200 2023-11-27 12:15:50,097 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 1300, loss[loss=0.06114, simple_loss=0.08321, pruned_loss=0.01016, audio_tagging_loss=0.009368, over 16120.00 frames. ], tot_loss[loss=0.06687, simple_loss=0.09074, pruned_loss=0.01277, audio_tagging_loss=0.008728, over 3036802.39 frames. ], batch size: 61, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 12:16:12,624 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.95 vs. limit=15.0 2023-11-27 12:16:23,290 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3054886.6666666665, ans=0.0 2023-11-27 12:16:26,467 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3054886.6666666665, ans=0.125 2023-11-27 12:16:26,670 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3054886.6666666665, ans=0.0 2023-11-27 12:16:29,922 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3054886.6666666665, ans=0.0 2023-11-27 12:16:40,689 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 458250 2023-11-27 12:16:45,105 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3054953.3333333335, ans=0.2 2023-11-27 12:16:46,104 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3055020.0, ans=0.125 2023-11-27 12:16:46,974 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 1350, loss[loss=0.05853, simple_loss=0.06671, pruned_loss=0.01356, audio_tagging_loss=0.01162, over 13921.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.09026, pruned_loss=0.01258, audio_tagging_loss=0.008674, over 3036016.43 frames. ], batch size: 53, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 12:16:51,726 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3055020.0, ans=0.125 2023-11-27 12:16:54,091 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.61 vs. limit=15.0 2023-11-27 12:16:56,206 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3055020.0, ans=0.125 2023-11-27 12:17:13,428 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.067e+01 8.626e+01 9.162e+01 9.999e+01 1.247e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-27 12:17:18,392 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.15 vs. 
limit=10.0 2023-11-27 12:17:20,576 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.58 vs. limit=15.0 2023-11-27 12:17:31,797 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 12:17:32,403 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.00 vs. limit=22.5 2023-11-27 12:17:32,617 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.70 vs. limit=12.0 2023-11-27 12:17:38,525 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 458300 2023-11-27 12:17:43,923 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 1400, loss[loss=0.07379, simple_loss=0.09305, pruned_loss=0.01715, audio_tagging_loss=0.01011, over 14677.00 frames. ], tot_loss[loss=0.06696, simple_loss=0.09066, pruned_loss=0.01285, audio_tagging_loss=0.008776, over 3042286.45 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 12:18:01,577 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3055420.0, ans=0.1 2023-11-27 12:18:19,972 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=3055553.3333333335, ans=0.5 2023-11-27 12:18:20,104 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3055553.3333333335, ans=0.1 2023-11-27 12:18:23,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=3055553.3333333335, ans=0.025 2023-11-27 12:18:35,141 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 458350 2023-11-27 12:18:37,452 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3055620.0, ans=0.125 2023-11-27 12:18:40,526 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 1450, loss[loss=0.07839, simple_loss=0.1117, pruned_loss=0.01521, audio_tagging_loss=0.007344, over 15319.00 frames. ], tot_loss[loss=0.06699, simple_loss=0.09044, pruned_loss=0.01287, audio_tagging_loss=0.008894, over 3047342.74 frames. 
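[Annotation] The "Whitening: ... metric=X vs. limit=Y" lines come from zipformer's Whiten modules: each activation's (grouped) covariance is summarized by a scalar that equals 1.0 when the covariance is isotropic and grows with anisotropy, and a corrective gradient only activates once the metric exceeds the scheduled limit. A rough sketch of such a metric as the ratio of the mean squared eigenvalue to the squared mean eigenvalue of the covariance; this is an assumption about the exact definition (see scaling.py for the real one):

```python
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    """Rough sketch: 1.0 for perfectly 'white' features, larger otherwise.

    x: (num_frames, num_channels). The real module also supports num_groups.
    """
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)              # eigenvalues of the covariance
    return (eigs ** 2).mean() / eigs.mean() ** 2   # ratio >= 1.0 by Cauchy-Schwarz

# Under this reading, metric=21.00 vs. limit=22.5 (as in the log above) means the
# activations are fairly anisotropic but still below the penalty threshold.
```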
], batch size: 55, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 12:19:08,519 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.227e+01 8.606e+01 9.355e+01 1.005e+02 1.686e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-27 12:19:14,208 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3055886.6666666665, ans=0.2 2023-11-27 12:19:21,863 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3055886.6666666665, ans=0.125 2023-11-27 12:19:31,290 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 458400 2023-11-27 12:19:33,531 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.90 vs. limit=15.0 2023-11-27 12:19:37,791 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 1500, loss[loss=0.06929, simple_loss=0.09034, pruned_loss=0.0134, audio_tagging_loss=0.01072, over 15969.00 frames. ], tot_loss[loss=0.06716, simple_loss=0.09062, pruned_loss=0.01293, audio_tagging_loss=0.008921, over 3044625.59 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 12:20:00,106 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3056153.3333333335, ans=0.1 2023-11-27 12:20:01,098 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3056153.3333333335, ans=0.125 2023-11-27 12:20:02,449 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.03 vs. limit=15.0 2023-11-27 12:20:30,037 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 458450 2023-11-27 12:20:35,426 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 1550, loss[loss=0.08569, simple_loss=0.1107, pruned_loss=0.02383, audio_tagging_loss=0.006512, over 15177.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.08982, pruned_loss=0.01274, audio_tagging_loss=0.008944, over 3039811.98 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 12:20:59,772 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3056486.6666666665, ans=0.1 2023-11-27 12:21:01,720 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.051e+01 8.639e+01 9.120e+01 9.883e+01 1.538e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-27 12:21:26,560 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 458500 2023-11-27 12:21:26,776 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3056620.0, ans=0.2 2023-11-27 12:21:31,974 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 1600, loss[loss=0.06644, simple_loss=0.08362, pruned_loss=0.01415, audio_tagging_loss=0.01047, over 16102.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.08984, pruned_loss=0.01263, audio_tagging_loss=0.008979, over 3039749.16 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 12:21:34,922 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. 
limit=6.0 2023-11-27 12:21:46,494 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3056753.3333333335, ans=0.0 2023-11-27 12:22:07,059 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3056886.6666666665, ans=0.5 2023-11-27 12:22:14,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3056886.6666666665, ans=0.125 2023-11-27 12:22:23,409 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 458550 2023-11-27 12:22:26,803 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3056953.3333333335, ans=0.1 2023-11-27 12:22:28,723 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 1650, loss[loss=0.06557, simple_loss=0.08796, pruned_loss=0.01092, audio_tagging_loss=0.01067, over 14672.00 frames. ], tot_loss[loss=0.06682, simple_loss=0.0902, pruned_loss=0.01276, audio_tagging_loss=0.008963, over 3036582.56 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 12:22:34,202 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3057020.0, ans=0.1 2023-11-27 12:22:54,635 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=3057153.3333333335, ans=15.0 2023-11-27 12:22:57,416 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.436e+01 8.745e+01 9.326e+01 9.935e+01 1.288e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-27 12:23:14,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3057286.6666666665, ans=0.05 2023-11-27 12:23:22,286 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 458600 2023-11-27 12:23:29,015 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 1700, loss[loss=0.06039, simple_loss=0.08131, pruned_loss=0.01191, audio_tagging_loss=0.00782, over 15402.00 frames. ], tot_loss[loss=0.06694, simple_loss=0.0904, pruned_loss=0.01278, audio_tagging_loss=0.008964, over 3035303.18 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 12:23:47,799 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3057420.0, ans=0.125 2023-11-27 12:23:50,276 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.21 vs. limit=15.0 2023-11-27 12:23:56,471 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 12:24:20,656 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 458650 2023-11-27 12:24:26,150 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 1750, loss[loss=0.08161, simple_loss=0.1155, pruned_loss=0.01591, audio_tagging_loss=0.007922, over 15757.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.0901, pruned_loss=0.01275, audio_tagging_loss=0.008926, over 3041714.12 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 12:24:28,764 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.90 vs. 
limit=6.0 2023-11-27 12:24:33,454 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.69 vs. limit=15.0 2023-11-27 12:24:35,285 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3057686.6666666665, ans=0.125 2023-11-27 12:24:43,566 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3057753.3333333335, ans=0.0 2023-11-27 12:24:54,375 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.758e+01 8.466e+01 9.121e+01 9.743e+01 1.211e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-27 12:25:17,849 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 458700 2023-11-27 12:25:23,181 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 1800, loss[loss=0.07022, simple_loss=0.08795, pruned_loss=0.01599, audio_tagging_loss=0.01026, over 13598.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.09025, pruned_loss=0.01258, audio_tagging_loss=0.008796, over 3045123.48 frames. ], batch size: 54, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 12:25:42,431 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.97 vs. limit=6.0 2023-11-27 12:25:44,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3058086.6666666665, ans=0.1 2023-11-27 12:25:46,777 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3058153.3333333335, ans=0.09899494936611666 2023-11-27 12:26:07,640 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3058220.0, ans=0.125 2023-11-27 12:26:16,816 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 458750 2023-11-27 12:26:18,534 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.53 vs. limit=6.0 2023-11-27 12:26:22,261 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 1850, loss[loss=0.069, simple_loss=0.09052, pruned_loss=0.01221, audio_tagging_loss=0.01153, over 14623.00 frames. ], tot_loss[loss=0.06596, simple_loss=0.08941, pruned_loss=0.01243, audio_tagging_loss=0.008825, over 3038275.48 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 12:26:32,481 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.54 vs. 
limit=15.0 2023-11-27 12:26:50,668 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.344e+01 8.879e+01 9.441e+01 1.010e+02 1.422e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-27 12:27:01,936 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3058553.3333333335, ans=0.2 2023-11-27 12:27:13,959 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 458800 2023-11-27 12:27:19,698 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3058686.6666666665, ans=0.2 2023-11-27 12:27:20,585 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 1900, loss[loss=0.0652, simple_loss=0.08587, pruned_loss=0.01128, audio_tagging_loss=0.01099, over 16640.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.08997, pruned_loss=0.01253, audio_tagging_loss=0.008757, over 3037921.84 frames. ], batch size: 62, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 12:27:22,138 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.71 vs. limit=15.0 2023-11-27 12:27:23,059 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3058686.6666666665, ans=0.2 2023-11-27 12:27:26,167 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3058686.6666666665, ans=0.125 2023-11-27 12:27:29,620 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3058686.6666666665, ans=0.1 2023-11-27 12:27:48,961 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3058820.0, ans=0.125 2023-11-27 12:27:56,031 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.34 vs. limit=15.0 2023-11-27 12:28:02,609 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff3.min_abs, batch_count=3058886.6666666665, ans=0.2 2023-11-27 12:28:05,240 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.54 vs. limit=10.0 2023-11-27 12:28:12,458 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 458850 2023-11-27 12:28:17,861 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 1950, loss[loss=0.06594, simple_loss=0.09705, pruned_loss=0.01012, audio_tagging_loss=0.007299, over 15552.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.08981, pruned_loss=0.01244, audio_tagging_loss=0.00875, over 3037659.63 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 12:28:19,673 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.30 vs. limit=22.5 2023-11-27 12:28:43,427 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3059153.3333333335, ans=0.0 2023-11-27 12:28:44,715 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.77 vs. 
limit=15.0 2023-11-27 12:28:46,478 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.470e+01 8.494e+01 8.981e+01 9.864e+01 1.306e+02, threshold=1.796e+02, percent-clipped=0.0 2023-11-27 12:29:09,686 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3059286.6666666665, ans=0.125 2023-11-27 12:29:11,151 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 458900 2023-11-27 12:29:16,532 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 2000, loss[loss=0.06601, simple_loss=0.0958, pruned_loss=0.00949, audio_tagging_loss=0.008625, over 14922.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.09008, pruned_loss=0.01261, audio_tagging_loss=0.008729, over 3041404.26 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 12:29:19,114 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3059353.3333333335, ans=0.2 2023-11-27 12:30:08,163 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 458950 2023-11-27 12:30:11,539 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3059620.0, ans=0.1 2023-11-27 12:30:13,702 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 2050, loss[loss=0.06222, simple_loss=0.08791, pruned_loss=0.01002, audio_tagging_loss=0.008239, over 15207.00 frames. ], tot_loss[loss=0.0668, simple_loss=0.09091, pruned_loss=0.01269, audio_tagging_loss=0.008655, over 3041131.32 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 12:30:28,963 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3059753.3333333335, ans=0.125 2023-11-27 12:30:30,320 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.81 vs. limit=10.0 2023-11-27 12:30:35,659 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3059820.0, ans=0.125 2023-11-27 12:30:43,241 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.343e+01 8.669e+01 9.305e+01 9.867e+01 1.531e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-27 12:30:56,764 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3059886.6666666665, ans=0.0 2023-11-27 12:31:06,133 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 459000 2023-11-27 12:31:12,188 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 2100, loss[loss=0.06168, simple_loss=0.07488, pruned_loss=0.01448, audio_tagging_loss=0.009762, over 14444.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.09012, pruned_loss=0.0126, audio_tagging_loss=0.00866, over 3038674.82 frames. 
], batch size: 56, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 12:31:13,625 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3060020.0, ans=0.1 2023-11-27 12:31:35,907 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3060153.3333333335, ans=0.0 2023-11-27 12:31:39,177 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3060153.3333333335, ans=0.125 2023-11-27 12:31:44,733 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3060153.3333333335, ans=0.125 2023-11-27 12:32:04,019 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 459050 2023-11-27 12:32:10,923 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 2150, loss[loss=0.0539, simple_loss=0.0747, pruned_loss=0.006423, audio_tagging_loss=0.01013, over 15819.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.09004, pruned_loss=0.01268, audio_tagging_loss=0.008639, over 3036327.77 frames. ], batch size: 62, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 12:32:13,236 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3060353.3333333335, ans=0.0 2023-11-27 12:32:13,766 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.76 vs. limit=15.0 2023-11-27 12:32:20,029 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3060353.3333333335, ans=0.1 2023-11-27 12:32:25,581 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3060420.0, ans=0.1 2023-11-27 12:32:39,028 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.225e+01 8.486e+01 9.121e+01 9.969e+01 1.223e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-27 12:32:41,413 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3060486.6666666665, ans=0.04949747468305833 2023-11-27 12:32:47,746 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 12:32:54,610 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3060553.3333333335, ans=0.015 2023-11-27 12:33:02,295 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 459100 2023-11-27 12:33:07,663 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 2200, loss[loss=0.06309, simple_loss=0.08253, pruned_loss=0.01133, audio_tagging_loss=0.0105, over 14744.00 frames. ], tot_loss[loss=0.06706, simple_loss=0.09105, pruned_loss=0.01289, audio_tagging_loss=0.008644, over 3039964.43 frames. 
], batch size: 55, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 12:33:12,275 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3060686.6666666665, ans=0.1 2023-11-27 12:33:16,763 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3060686.6666666665, ans=0.125 2023-11-27 12:33:26,254 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.25 vs. limit=22.5 2023-11-27 12:33:34,888 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3060820.0, ans=0.2 2023-11-27 12:33:43,306 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3060886.6666666665, ans=0.125 2023-11-27 12:33:49,065 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.64 vs. limit=15.0 2023-11-27 12:33:57,571 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3060953.3333333335, ans=0.0 2023-11-27 12:33:59,424 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 459150 2023-11-27 12:33:59,626 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3060953.3333333335, ans=0.1 2023-11-27 12:34:05,056 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.77 vs. limit=15.0 2023-11-27 12:34:05,526 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 2250, loss[loss=0.06439, simple_loss=0.08759, pruned_loss=0.01302, audio_tagging_loss=0.007567, over 15672.00 frames. ], tot_loss[loss=0.06758, simple_loss=0.0918, pruned_loss=0.01303, audio_tagging_loss=0.008656, over 3036554.95 frames. ], batch size: 60, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 12:34:13,606 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3061020.0, ans=0.1 2023-11-27 12:34:15,780 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 12:34:19,243 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3061086.6666666665, ans=0.1 2023-11-27 12:34:36,085 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.046e+01 8.598e+01 9.189e+01 9.920e+01 1.225e+02, threshold=1.838e+02, percent-clipped=0.0 2023-11-27 12:34:46,246 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3061220.0, ans=0.2 2023-11-27 12:34:51,454 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3061286.6666666665, ans=0.0 2023-11-27 12:34:57,821 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 459200 2023-11-27 12:35:05,897 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 2300, loss[loss=0.08547, simple_loss=0.1167, pruned_loss=0.01895, audio_tagging_loss=0.008151, over 15512.00 frames. ], tot_loss[loss=0.06742, simple_loss=0.09153, pruned_loss=0.01287, audio_tagging_loss=0.008781, over 3037058.56 frames. 
], batch size: 57, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 12:35:34,577 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3061486.6666666665, ans=0.2 2023-11-27 12:35:37,488 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3061486.6666666665, ans=0.125 2023-11-27 12:35:57,643 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 459250 2023-11-27 12:35:58,181 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.63 vs. limit=10.0 2023-11-27 12:35:59,789 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 12:36:03,026 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 2350, loss[loss=0.05198, simple_loss=0.07044, pruned_loss=0.005457, audio_tagging_loss=0.0113, over 14498.00 frames. ], tot_loss[loss=0.06756, simple_loss=0.09192, pruned_loss=0.01282, audio_tagging_loss=0.00878, over 3042402.99 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 12:36:09,924 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3061686.6666666665, ans=0.125 2023-11-27 12:36:31,258 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.04 vs. limit=15.0 2023-11-27 12:36:34,192 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.011e+01 8.777e+01 9.429e+01 1.022e+02 1.253e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-27 12:36:37,902 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3061886.6666666665, ans=0.2 2023-11-27 12:36:47,753 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3061886.6666666665, ans=0.0 2023-11-27 12:36:55,302 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 459300 2023-11-27 12:36:55,929 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.65 vs. limit=15.0 2023-11-27 12:36:56,588 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3061953.3333333335, ans=0.0 2023-11-27 12:37:00,897 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 2400, loss[loss=0.06627, simple_loss=0.09186, pruned_loss=0.01219, audio_tagging_loss=0.00815, over 16349.00 frames. ], tot_loss[loss=0.06688, simple_loss=0.09044, pruned_loss=0.0127, audio_tagging_loss=0.008963, over 3042027.22 frames. 
], batch size: 64, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 12:37:10,751 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3062020.0, ans=0.0 2023-11-27 12:37:20,281 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3062086.6666666665, ans=0.125 2023-11-27 12:37:23,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3062086.6666666665, ans=0.125 2023-11-27 12:37:33,542 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3062153.3333333335, ans=0.0 2023-11-27 12:37:34,607 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3062153.3333333335, ans=0.1 2023-11-27 12:37:37,941 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3062220.0, ans=0.2 2023-11-27 12:37:41,170 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=3062220.0, ans=0.025 2023-11-27 12:37:45,613 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3062220.0, ans=0.0 2023-11-27 12:37:53,006 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 459350 2023-11-27 12:37:59,589 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 2450, loss[loss=0.05129, simple_loss=0.06616, pruned_loss=0.008651, audio_tagging_loss=0.009559, over 15451.00 frames. ], tot_loss[loss=0.06706, simple_loss=0.09061, pruned_loss=0.01276, audio_tagging_loss=0.008998, over 3047051.35 frames. ], batch size: 60, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 12:38:12,017 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.95 vs. 
limit=15.0 2023-11-27 12:38:28,624 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.819e+01 8.321e+01 9.410e+01 9.969e+01 1.274e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-27 12:38:29,931 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3062486.6666666665, ans=0.1 2023-11-27 12:38:36,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3062553.3333333335, ans=0.0 2023-11-27 12:38:40,667 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=3062553.3333333335, ans=15.0 2023-11-27 12:38:44,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3062620.0, ans=0.125 2023-11-27 12:38:49,145 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3062620.0, ans=0.125 2023-11-27 12:38:51,244 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 459400 2023-11-27 12:38:55,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3062620.0, ans=0.125 2023-11-27 12:38:57,351 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 2500, loss[loss=0.04685, simple_loss=0.05091, pruned_loss=0.006733, audio_tagging_loss=0.01466, over 13481.00 frames. ], tot_loss[loss=0.06746, simple_loss=0.09132, pruned_loss=0.0128, audio_tagging_loss=0.009001, over 3055863.54 frames. ], batch size: 53, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 12:39:00,932 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3062686.6666666665, ans=0.2 2023-11-27 12:39:06,441 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3062686.6666666665, ans=0.125 2023-11-27 12:39:15,158 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3062753.3333333335, ans=0.125 2023-11-27 12:39:24,667 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3062820.0, ans=0.95 2023-11-27 12:39:29,451 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.72 vs. limit=12.0 2023-11-27 12:39:31,249 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.69 vs. limit=15.0 2023-11-27 12:39:43,208 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.61 vs. limit=15.0 2023-11-27 12:39:48,338 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3062953.3333333335, ans=0.0 2023-11-27 12:39:49,362 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 459450 2023-11-27 12:39:54,751 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 2550, loss[loss=0.06502, simple_loss=0.08569, pruned_loss=0.01374, audio_tagging_loss=0.008444, over 15284.00 frames. ], tot_loss[loss=0.06701, simple_loss=0.09053, pruned_loss=0.01274, audio_tagging_loss=0.009004, over 3049333.69 frames. 
], batch size: 56, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 12:40:05,197 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.13 vs. limit=15.0 2023-11-27 12:40:15,117 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3063086.6666666665, ans=0.1 2023-11-27 12:40:23,721 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3063153.3333333335, ans=0.1 2023-11-27 12:40:26,723 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.350e+01 8.678e+01 9.247e+01 1.003e+02 1.223e+02, threshold=1.849e+02, percent-clipped=0.0 2023-11-27 12:40:46,540 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 459500 2023-11-27 12:40:51,926 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 2600, loss[loss=0.04434, simple_loss=0.0608, pruned_loss=0.00556, audio_tagging_loss=0.008385, over 15288.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.08964, pruned_loss=0.01258, audio_tagging_loss=0.008909, over 3054497.62 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 12:40:59,620 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3063353.3333333335, ans=0.0 2023-11-27 12:40:59,694 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3063353.3333333335, ans=0.125 2023-11-27 12:41:04,212 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.77 vs. limit=22.5 2023-11-27 12:41:33,838 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3063553.3333333335, ans=0.2 2023-11-27 12:41:42,347 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3063620.0, ans=0.1 2023-11-27 12:41:45,371 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 459550 2023-11-27 12:41:46,681 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3063620.0, ans=0.125 2023-11-27 12:41:50,870 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 2650, loss[loss=0.06367, simple_loss=0.08159, pruned_loss=0.01093, audio_tagging_loss=0.01195, over 14694.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.08898, pruned_loss=0.01243, audio_tagging_loss=0.008877, over 3052591.67 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 12:42:03,576 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.24 vs. limit=12.0 2023-11-27 12:42:15,320 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3063820.0, ans=0.125 2023-11-27 12:42:20,388 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.336e+01 8.348e+01 9.301e+01 1.026e+02 1.495e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-27 12:42:42,213 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 459600 2023-11-27 12:42:44,633 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.06 vs. 
limit=22.5 2023-11-27 12:42:48,208 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 2700, loss[loss=0.06374, simple_loss=0.08383, pruned_loss=0.0147, audio_tagging_loss=0.007125, over 15831.00 frames. ], tot_loss[loss=0.06589, simple_loss=0.08927, pruned_loss=0.01242, audio_tagging_loss=0.008834, over 3055407.89 frames. ], batch size: 61, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 12:42:53,793 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3064020.0, ans=0.0 2023-11-27 12:42:53,813 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3064020.0, ans=0.2 2023-11-27 12:43:23,762 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 12:43:29,096 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3064220.0, ans=0.125 2023-11-27 12:43:39,933 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 459650 2023-11-27 12:43:40,023 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3064286.6666666665, ans=0.05 2023-11-27 12:43:45,275 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 2750, loss[loss=0.06262, simple_loss=0.07933, pruned_loss=0.01266, audio_tagging_loss=0.0103, over 15803.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.08879, pruned_loss=0.01226, audio_tagging_loss=0.008824, over 3054573.93 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 12:43:47,746 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3064353.3333333335, ans=0.0 2023-11-27 12:44:05,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=3064420.0, ans=0.05 2023-11-27 12:44:17,403 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.288e+01 8.367e+01 8.947e+01 9.823e+01 1.478e+02, threshold=1.789e+02, percent-clipped=0.0 2023-11-27 12:44:30,134 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.49 vs. limit=15.0 2023-11-27 12:44:38,968 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 12:44:39,016 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 459700 2023-11-27 12:44:44,994 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 2800, loss[loss=0.06659, simple_loss=0.09218, pruned_loss=0.0101, audio_tagging_loss=0.01041, over 15443.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.08945, pruned_loss=0.01227, audio_tagging_loss=0.008793, over 3063223.76 frames. 
], batch size: 55, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 12:44:59,565 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3064753.3333333335, ans=0.125 2023-11-27 12:45:00,703 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3064753.3333333335, ans=0.025 2023-11-27 12:45:05,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3064820.0, ans=0.1 2023-11-27 12:45:09,320 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3064820.0, ans=0.125 2023-11-27 12:45:17,076 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3064886.6666666665, ans=0.125 2023-11-27 12:45:36,836 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 459750 2023-11-27 12:45:36,961 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3064953.3333333335, ans=0.0 2023-11-27 12:45:39,352 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.21 vs. limit=15.0 2023-11-27 12:45:40,233 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3064953.3333333335, ans=0.0 2023-11-27 12:45:41,325 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3065020.0, ans=0.125 2023-11-27 12:45:42,194 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 2850, loss[loss=0.04944, simple_loss=0.06633, pruned_loss=0.007619, audio_tagging_loss=0.008654, over 15245.00 frames. ], tot_loss[loss=0.0652, simple_loss=0.08846, pruned_loss=0.01204, audio_tagging_loss=0.008935, over 3052081.46 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 12:45:42,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3065020.0, ans=0.125 2023-11-27 12:46:12,594 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3065153.3333333335, ans=0.2 2023-11-27 12:46:14,415 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.057e+01 8.447e+01 9.117e+01 9.906e+01 1.324e+02, threshold=1.823e+02, percent-clipped=0.0 2023-11-27 12:46:20,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3065220.0, ans=0.0 2023-11-27 12:46:31,212 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3065286.6666666665, ans=0.2 2023-11-27 12:46:34,175 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 459800 2023-11-27 12:46:40,263 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 2900, loss[loss=0.05689, simple_loss=0.07825, pruned_loss=0.009313, audio_tagging_loss=0.008453, over 14633.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.08944, pruned_loss=0.01233, audio_tagging_loss=0.008777, over 3048405.21 frames. 
], batch size: 54, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 12:46:42,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3065353.3333333335, ans=0.125 2023-11-27 12:46:56,524 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3065420.0, ans=0.09899494936611666 2023-11-27 12:47:07,034 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3065486.6666666665, ans=0.2 2023-11-27 12:47:33,604 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 459850 2023-11-27 12:47:39,700 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 2950, loss[loss=0.08057, simple_loss=0.09495, pruned_loss=0.02311, audio_tagging_loss=0.009987, over 14650.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.09013, pruned_loss=0.01258, audio_tagging_loss=0.008779, over 3052493.17 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 12:47:46,123 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3065686.6666666665, ans=0.1 2023-11-27 12:47:51,655 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3065753.3333333335, ans=0.125 2023-11-27 12:47:52,604 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3065753.3333333335, ans=0.125 2023-11-27 12:48:04,789 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3065820.0, ans=0.0 2023-11-27 12:48:08,395 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.92 vs. limit=22.5 2023-11-27 12:48:10,988 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.284e+01 8.648e+01 9.263e+01 1.004e+02 1.641e+02, threshold=1.853e+02, percent-clipped=0.0 2023-11-27 12:48:16,261 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3065886.6666666665, ans=0.2 2023-11-27 12:48:27,529 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.95 vs. limit=15.0 2023-11-27 12:48:29,561 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.39 vs. limit=6.0 2023-11-27 12:48:32,121 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 459900 2023-11-27 12:48:37,509 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 3000, loss[loss=0.05831, simple_loss=0.0813, pruned_loss=0.008529, audio_tagging_loss=0.009131, over 15388.00 frames. ], tot_loss[loss=0.06649, simple_loss=0.09026, pruned_loss=0.01254, audio_tagging_loss=0.008814, over 3054369.81 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 12:48:37,512 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-27 12:49:11,861 INFO [train_asr.py:1267] (0/4) Epoch 39, validation: loss=0.05767, simple_loss=0.05074, pruned_loss=0.005233, audio_tagging_loss=0.02707, over 4681554.00 frames. 
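A note on the loss fields reported by train_asr.py in this log: the totals are consistent with a weighted sum of the components, loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss. This is inferred from the logged numbers themselves (e.g. the validation entry above: 0.5 * 0.05074 + 0.005233 + 0.02707 = 0.05767), not quoted from the training code. The same kind of arithmetic consistency holds for the optim.py clipping entries: the reported threshold equals clipping_scale times the median grad-norm quartile, e.g. 2.0 * 9.326e+01 = 1.865e+02 in the first clipping entry of this excerpt. A minimal sketch, assuming that weighting (combined_loss is a hypothetical helper, not icefall code):

def combined_loss(simple_loss, pruned_loss, audio_tagging_loss, simple_loss_scale=0.5):
    # Weighted total inferred from the logged values; only simple_loss
    # appears to be rescaled before summation.
    return simple_loss_scale * simple_loss + pruned_loss + audio_tagging_loss

# Reproduces the epoch-39 validation entry above to within rounding:
assert abs(combined_loss(0.05074, 0.005233, 0.02707) - 0.05767) < 1e-4
# ... and the epoch-39, batch-3000 tot_loss entry:
assert abs(combined_loss(0.09026, 0.01254, 0.008814) - 0.06649) < 1e-4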
2023-11-27 12:49:11,862 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-27 12:49:34,159 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.51 vs. limit=15.0 2023-11-27 12:49:51,557 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3066220.0, ans=0.0 2023-11-27 12:50:05,472 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 459950 2023-11-27 12:50:10,837 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 3050, loss[loss=0.06646, simple_loss=0.09907, pruned_loss=0.008509, audio_tagging_loss=0.008413, over 15583.00 frames. ], tot_loss[loss=0.06749, simple_loss=0.09152, pruned_loss=0.01289, audio_tagging_loss=0.008843, over 3059436.74 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 12:50:30,457 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3066420.0, ans=0.125 2023-11-27 12:50:37,320 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3066486.6666666665, ans=0.1 2023-11-27 12:50:42,418 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.394e+01 8.816e+01 9.327e+01 1.004e+02 1.225e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-27 12:50:45,515 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3066553.3333333335, ans=0.1 2023-11-27 12:50:47,061 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 12:50:48,910 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3066553.3333333335, ans=0.1 2023-11-27 12:51:03,730 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 460000 2023-11-27 12:51:05,360 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-460000.pt 2023-11-27 12:51:11,796 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 3100, loss[loss=0.07413, simple_loss=0.09871, pruned_loss=0.01627, audio_tagging_loss=0.008511, over 14919.00 frames. ], tot_loss[loss=0.0671, simple_loss=0.09063, pruned_loss=0.01276, audio_tagging_loss=0.009026, over 3054118.89 frames. 
], batch size: 54, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 12:51:57,523 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3066953.3333333335, ans=0.125 2023-11-27 12:52:00,906 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3066953.3333333335, ans=0.0 2023-11-27 12:52:04,082 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 460050 2023-11-27 12:52:09,585 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 3150, loss[loss=0.05421, simple_loss=0.06702, pruned_loss=0.01032, audio_tagging_loss=0.01038, over 15278.00 frames. ], tot_loss[loss=0.06695, simple_loss=0.0904, pruned_loss=0.01264, audio_tagging_loss=0.009115, over 3052251.33 frames. ], batch size: 60, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 12:52:21,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3067086.6666666665, ans=0.125 2023-11-27 12:52:42,631 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.505e+01 8.556e+01 9.152e+01 9.802e+01 1.189e+02, threshold=1.830e+02, percent-clipped=0.0 2023-11-27 12:53:02,626 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 460100 2023-11-27 12:53:08,851 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 3200, loss[loss=0.05968, simple_loss=0.07548, pruned_loss=0.01132, audio_tagging_loss=0.01062, over 15965.00 frames. ], tot_loss[loss=0.06756, simple_loss=0.09118, pruned_loss=0.0128, audio_tagging_loss=0.009169, over 3055241.16 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 12:53:10,263 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3067353.3333333335, ans=0.2 2023-11-27 12:53:20,173 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3067420.0, ans=0.2 2023-11-27 12:53:20,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3067420.0, ans=0.125 2023-11-27 12:53:22,084 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.89 vs. limit=6.0 2023-11-27 12:53:27,420 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3067420.0, ans=0.1 2023-11-27 12:53:28,774 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.36 vs. limit=22.5 2023-11-27 12:53:32,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3067486.6666666665, ans=0.125 2023-11-27 12:53:36,648 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.27 vs. limit=22.5 2023-11-27 12:53:55,184 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.97 vs. 
limit=15.0 2023-11-27 12:54:01,039 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 460150 2023-11-27 12:54:03,868 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.51 vs. limit=10.0 2023-11-27 12:54:06,399 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 3250, loss[loss=0.04406, simple_loss=0.0486, pruned_loss=0.004248, audio_tagging_loss=0.01551, over 15476.00 frames. ], tot_loss[loss=0.06697, simple_loss=0.09033, pruned_loss=0.01261, audio_tagging_loss=0.009192, over 3055171.27 frames. ], batch size: 62, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 12:54:09,604 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3067686.6666666665, ans=0.0 2023-11-27 12:54:25,940 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3067753.3333333335, ans=0.125 2023-11-27 12:54:39,783 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.649e+01 8.843e+01 9.410e+01 1.014e+02 1.334e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-27 12:54:58,437 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3067953.3333333335, ans=0.0 2023-11-27 12:54:59,307 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 460200 2023-11-27 12:55:02,034 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3067953.3333333335, ans=0.125 2023-11-27 12:55:05,139 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 3300, loss[loss=0.05789, simple_loss=0.07424, pruned_loss=0.01031, audio_tagging_loss=0.01046, over 15594.00 frames. ], tot_loss[loss=0.06724, simple_loss=0.09047, pruned_loss=0.01272, audio_tagging_loss=0.009278, over 3049117.50 frames. ], batch size: 60, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 12:55:13,664 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3068020.0, ans=0.0 2023-11-27 12:55:24,907 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3068086.6666666665, ans=0.125 2023-11-27 12:55:24,986 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3068086.6666666665, ans=0.0 2023-11-27 12:55:29,374 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3068153.3333333335, ans=0.125 2023-11-27 12:55:37,155 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3068153.3333333335, ans=0.125 2023-11-27 12:55:42,213 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3068220.0, ans=0.0 2023-11-27 12:55:50,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3068220.0, ans=0.05 2023-11-27 12:55:58,027 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 460250 2023-11-27 12:56:04,632 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 3350, loss[loss=0.07516, simple_loss=0.1049, pruned_loss=0.01221, audio_tagging_loss=0.0105, over 14806.00 frames. 
], tot_loss[loss=0.06686, simple_loss=0.09029, pruned_loss=0.01254, audio_tagging_loss=0.009178, over 3051319.42 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 12:56:05,198 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.00 vs. limit=15.0 2023-11-27 12:56:06,068 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3068353.3333333335, ans=0.125 2023-11-27 12:56:17,522 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.90 vs. limit=22.5 2023-11-27 12:56:36,191 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.408e+01 8.562e+01 9.285e+01 9.934e+01 1.474e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-27 12:56:40,735 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3068553.3333333335, ans=0.125 2023-11-27 12:56:56,889 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 460300 2023-11-27 12:57:02,299 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 3400, loss[loss=0.07302, simple_loss=0.101, pruned_loss=0.01694, audio_tagging_loss=0.005596, over 16110.00 frames. ], tot_loss[loss=0.06696, simple_loss=0.09076, pruned_loss=0.01249, audio_tagging_loss=0.009088, over 3051057.73 frames. ], batch size: 61, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 12:57:12,196 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3068686.6666666665, ans=0.125 2023-11-27 12:57:16,720 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.55 vs. limit=15.0 2023-11-27 12:57:25,737 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 12:57:41,503 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.89 vs. limit=22.5 2023-11-27 12:57:45,765 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.98 vs. limit=15.0 2023-11-27 12:57:51,967 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3068953.3333333335, ans=0.2 2023-11-27 12:57:54,078 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 460350 2023-11-27 12:58:00,438 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 3450, loss[loss=0.0619, simple_loss=0.08834, pruned_loss=0.01029, audio_tagging_loss=0.007442, over 15018.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.09023, pruned_loss=0.01239, audio_tagging_loss=0.00898, over 3046184.88 frames. 
], batch size: 54, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 12:58:07,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3069020.0, ans=0.125 2023-11-27 12:58:12,175 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3069086.6666666665, ans=0.95 2023-11-27 12:58:14,852 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3069086.6666666665, ans=0.2 2023-11-27 12:58:32,872 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.182e+01 8.564e+01 9.034e+01 9.987e+01 1.555e+02, threshold=1.807e+02, percent-clipped=0.0 2023-11-27 12:58:33,194 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3069153.3333333335, ans=0.0 2023-11-27 12:58:39,574 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3069220.0, ans=0.0 2023-11-27 12:58:46,952 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3069286.6666666665, ans=0.125 2023-11-27 12:58:49,118 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3069286.6666666665, ans=0.1 2023-11-27 12:58:52,146 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 460400 2023-11-27 12:58:58,881 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 3500, loss[loss=0.06532, simple_loss=0.08598, pruned_loss=0.01451, audio_tagging_loss=0.007821, over 14689.00 frames. ], tot_loss[loss=0.06677, simple_loss=0.09051, pruned_loss=0.01265, audio_tagging_loss=0.008866, over 3054613.70 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 12:59:30,608 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 12:59:51,334 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 460450 2023-11-27 12:59:56,784 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 3550, loss[loss=0.05291, simple_loss=0.06605, pruned_loss=0.01196, audio_tagging_loss=0.007923, over 14860.00 frames. ], tot_loss[loss=0.06678, simple_loss=0.0904, pruned_loss=0.01273, audio_tagging_loss=0.008851, over 3050997.81 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:00:08,273 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.36 vs. 
limit=15.0 2023-11-27 13:00:17,523 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3069753.3333333335, ans=0.125 2023-11-27 13:00:24,727 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3069820.0, ans=0.125 2023-11-27 13:00:30,929 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.317e+01 8.573e+01 9.051e+01 9.738e+01 1.451e+02, threshold=1.810e+02, percent-clipped=0.0 2023-11-27 13:00:38,861 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3069886.6666666665, ans=0.2 2023-11-27 13:00:45,887 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.39 vs. limit=6.0 2023-11-27 13:00:48,471 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 460500 2023-11-27 13:00:54,034 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 3600, loss[loss=0.05086, simple_loss=0.06509, pruned_loss=0.008808, audio_tagging_loss=0.009504, over 14034.00 frames. ], tot_loss[loss=0.06671, simple_loss=0.09021, pruned_loss=0.0128, audio_tagging_loss=0.008801, over 3043094.83 frames. ], batch size: 53, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 13:00:57,541 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.21 vs. limit=15.0 2023-11-27 13:00:59,457 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3070020.0, ans=0.2 2023-11-27 13:00:59,705 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.37 vs. limit=15.0 2023-11-27 13:01:19,024 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.92 vs. limit=6.0 2023-11-27 13:01:37,549 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.80 vs. limit=22.5 2023-11-27 13:01:46,335 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 460550 2023-11-27 13:01:52,375 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 3650, loss[loss=0.08437, simple_loss=0.1143, pruned_loss=0.01998, audio_tagging_loss=0.007223, over 14842.00 frames. ], tot_loss[loss=0.06697, simple_loss=0.09072, pruned_loss=0.01284, audio_tagging_loss=0.008765, over 3048517.52 frames. ], batch size: 54, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 13:02:07,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3070420.0, ans=0.125 2023-11-27 13:02:08,329 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3070420.0, ans=0.0 2023-11-27 13:02:09,667 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.19 vs. 
limit=22.5 2023-11-27 13:02:25,688 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.081e+01 8.646e+01 9.266e+01 1.020e+02 1.594e+02, threshold=1.853e+02, percent-clipped=0.0 2023-11-27 13:02:42,545 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3070620.0, ans=0.0 2023-11-27 13:02:45,659 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 460600 2023-11-27 13:02:49,463 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3070620.0, ans=0.125 2023-11-27 13:02:51,399 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 3700, loss[loss=0.05311, simple_loss=0.06593, pruned_loss=0.007863, audio_tagging_loss=0.01228, over 14323.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.0903, pruned_loss=0.01267, audio_tagging_loss=0.008768, over 3045350.34 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 13:03:12,054 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.25 vs. limit=15.0 2023-11-27 13:03:13,534 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3070820.0, ans=0.125 2023-11-27 13:03:43,363 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 460650 2023-11-27 13:03:45,702 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3070953.3333333335, ans=0.07 2023-11-27 13:03:48,426 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.52 vs. limit=6.0 2023-11-27 13:03:48,522 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.10 vs. limit=15.0 2023-11-27 13:03:48,834 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 3750, loss[loss=0.06853, simple_loss=0.09612, pruned_loss=0.01257, audio_tagging_loss=0.007903, over 15819.00 frames. ], tot_loss[loss=0.06694, simple_loss=0.09109, pruned_loss=0.0127, audio_tagging_loss=0.008697, over 3046340.29 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 13:04:22,329 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3071153.3333333335, ans=0.125 2023-11-27 13:04:23,263 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.351e+01 8.737e+01 9.313e+01 1.018e+02 1.236e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-27 13:04:31,166 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3071220.0, ans=0.0 2023-11-27 13:04:32,106 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 13:04:36,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3071286.6666666665, ans=0.0 2023-11-27 13:04:36,739 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3071286.6666666665, ans=0.125 2023-11-27 13:04:37,803 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3071286.6666666665, ans=0.125 2023-11-27 13:04:40,863 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 460700 2023-11-27 13:04:43,566 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.94 vs. limit=22.5 2023-11-27 13:04:46,904 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 3800, loss[loss=0.09137, simple_loss=0.1262, pruned_loss=0.02307, audio_tagging_loss=0.005176, over 15478.00 frames. ], tot_loss[loss=0.06676, simple_loss=0.09089, pruned_loss=0.01268, audio_tagging_loss=0.008636, over 3044490.99 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 13:04:47,134 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3071353.3333333335, ans=0.05 2023-11-27 13:04:53,567 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3071353.3333333335, ans=0.1 2023-11-27 13:05:15,132 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3071486.6666666665, ans=0.2 2023-11-27 13:05:21,753 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3071553.3333333335, ans=0.125 2023-11-27 13:05:40,411 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 460750 2023-11-27 13:05:42,683 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3071620.0, ans=0.125 2023-11-27 13:05:45,987 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 3850, loss[loss=0.07442, simple_loss=0.09913, pruned_loss=0.01925, audio_tagging_loss=0.00561, over 15155.00 frames. ], tot_loss[loss=0.06681, simple_loss=0.09056, pruned_loss=0.01278, audio_tagging_loss=0.008745, over 3046205.56 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 13:06:18,585 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.198e+01 8.582e+01 9.212e+01 9.899e+01 1.418e+02, threshold=1.842e+02, percent-clipped=0.0 2023-11-27 13:06:24,544 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.51 vs. limit=15.0 2023-11-27 13:06:25,422 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3071886.6666666665, ans=0.125 2023-11-27 13:06:37,405 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 460800 2023-11-27 13:06:43,159 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 3900, loss[loss=0.06354, simple_loss=0.08969, pruned_loss=0.008271, audio_tagging_loss=0.01043, over 16279.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.09013, pruned_loss=0.01274, audio_tagging_loss=0.008842, over 3042807.53 frames. 
], batch size: 59, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 13:07:29,750 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.10 vs. limit=12.0 2023-11-27 13:07:30,564 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3072286.6666666665, ans=0.0 2023-11-27 13:07:34,958 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 460850 2023-11-27 13:07:36,569 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=3072286.6666666665, ans=15.0 2023-11-27 13:07:40,307 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 3950, loss[loss=0.06759, simple_loss=0.09202, pruned_loss=0.01104, audio_tagging_loss=0.01055, over 15034.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.08921, pruned_loss=0.01254, audio_tagging_loss=0.008931, over 3037508.80 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 13:07:57,736 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.06 vs. limit=15.0 2023-11-27 13:08:00,600 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3072420.0, ans=0.0 2023-11-27 13:08:02,822 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3072420.0, ans=0.1 2023-11-27 13:08:15,680 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.157e+01 8.964e+01 9.425e+01 1.000e+02 1.456e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-27 13:08:33,508 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 460900 2023-11-27 13:08:39,769 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 4000, loss[loss=0.05961, simple_loss=0.08437, pruned_loss=0.008116, audio_tagging_loss=0.009312, over 16335.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.08965, pruned_loss=0.01262, audio_tagging_loss=0.008941, over 3039983.25 frames. ], batch size: 61, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 13:09:31,046 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3072953.3333333335, ans=0.0 2023-11-27 13:09:31,980 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 460950 2023-11-27 13:09:37,355 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 4050, loss[loss=0.05549, simple_loss=0.06677, pruned_loss=0.01105, audio_tagging_loss=0.01105, over 15914.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.09017, pruned_loss=0.0128, audio_tagging_loss=0.008969, over 3040958.80 frames. ], batch size: 60, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:09:43,855 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 13:09:51,780 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3073086.6666666665, ans=0.125 2023-11-27 13:09:55,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3073086.6666666665, ans=0.125 2023-11-27 13:09:55,128 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3073086.6666666665, ans=0.125 2023-11-27 13:10:08,063 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3073153.3333333335, ans=0.0 2023-11-27 13:10:08,133 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3073153.3333333335, ans=0.95 2023-11-27 13:10:09,219 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3073153.3333333335, ans=0.0 2023-11-27 13:10:13,378 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.583e+01 8.613e+01 9.165e+01 1.034e+02 1.408e+02, threshold=1.833e+02, percent-clipped=0.0 2023-11-27 13:10:19,127 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3073220.0, ans=0.1 2023-11-27 13:10:28,787 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 461000 2023-11-27 13:10:34,543 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 4100, loss[loss=0.09616, simple_loss=0.1358, pruned_loss=0.01847, audio_tagging_loss=0.009784, over 16187.00 frames. ], tot_loss[loss=0.06703, simple_loss=0.09063, pruned_loss=0.01276, audio_tagging_loss=0.008957, over 3046055.67 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:10:45,678 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3073420.0, ans=0.0 2023-11-27 13:11:08,234 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.28 vs. limit=15.0 2023-11-27 13:11:15,616 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3073553.3333333335, ans=0.1 2023-11-27 13:11:22,050 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3073620.0, ans=0.125 2023-11-27 13:11:26,793 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 461050 2023-11-27 13:11:32,090 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=3073686.6666666665, ans=22.5 2023-11-27 13:11:33,385 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 4150, loss[loss=0.06788, simple_loss=0.08646, pruned_loss=0.01303, audio_tagging_loss=0.01162, over 16133.00 frames. ], tot_loss[loss=0.06721, simple_loss=0.0911, pruned_loss=0.0128, audio_tagging_loss=0.008857, over 3054021.47 frames. 
], batch size: 60, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:11:51,946 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3073753.3333333335, ans=0.125 2023-11-27 13:11:54,103 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3073753.3333333335, ans=0.0 2023-11-27 13:11:54,148 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3073753.3333333335, ans=0.07 2023-11-27 13:11:56,264 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3073820.0, ans=0.125 2023-11-27 13:12:07,794 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.01 vs. limit=15.0 2023-11-27 13:12:08,196 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.828e+01 8.439e+01 9.039e+01 1.003e+02 1.334e+02, threshold=1.808e+02, percent-clipped=0.0 2023-11-27 13:12:08,570 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3073886.6666666665, ans=0.125 2023-11-27 13:12:18,767 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 13:12:25,955 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 461100 2023-11-27 13:12:31,245 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 4200, loss[loss=0.07501, simple_loss=0.1059, pruned_loss=0.01326, audio_tagging_loss=0.00878, over 15660.00 frames. ], tot_loss[loss=0.06695, simple_loss=0.09101, pruned_loss=0.0127, audio_tagging_loss=0.008749, over 3055060.05 frames. ], batch size: 60, lr: 1.74e-03, grad_scale: 8.0 2023-11-27 13:12:36,170 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.34 vs. limit=15.0 2023-11-27 13:12:56,181 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3074153.3333333335, ans=0.1 2023-11-27 13:12:58,881 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3074153.3333333335, ans=0.1 2023-11-27 13:13:06,044 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3074220.0, ans=0.1 2023-11-27 13:13:20,596 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.44 vs. 
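The WARNING above shows the length filter at work: a one-second AudioSet clip yields 100 feature frames, about 23 encoder frames after the roughly 4x subsampling, and that is shorter than its 24-token dummy transcript, so no transducer alignment exists and the cut is excluded. A minimal sketch of such a predicate; the subsampling arithmetic is an assumption chosen to reproduce the logged 100 -> 23, not necessarily the recipe's exact formula.

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # Encoder frames after the convolutional frontend; this assumed
        # formula reproduces the logged "100 frames -> 23 after subsampling".
        t = ((num_frames - 7) // 2 + 1) // 2
        # The transducer loss needs at least one encoder frame per token.
        return t >= num_tokens

    assert ((100 - 7) // 2 + 1) // 2 == 23
    assert not keep_cut(100, 24)  # the excluded dummy-text AudioSet cuts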
limit=10.0 2023-11-27 13:13:22,489 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3074286.6666666665, ans=0.0 2023-11-27 13:13:23,396 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 461150 2023-11-27 13:13:28,873 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 4250, loss[loss=0.08455, simple_loss=0.1094, pruned_loss=0.02145, audio_tagging_loss=0.008419, over 15069.00 frames. ], tot_loss[loss=0.067, simple_loss=0.09086, pruned_loss=0.01282, audio_tagging_loss=0.008758, over 3052213.38 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 8.0 2023-11-27 13:13:57,999 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3074486.6666666665, ans=0.125 2023-11-27 13:14:06,426 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.413e+01 8.652e+01 9.233e+01 9.893e+01 1.518e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-27 13:14:07,785 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3074553.3333333335, ans=0.125 2023-11-27 13:14:20,877 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 461200 2023-11-27 13:14:27,649 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 4300, loss[loss=0.06536, simple_loss=0.08812, pruned_loss=0.0124, audio_tagging_loss=0.008898, over 14412.00 frames. ], tot_loss[loss=0.06719, simple_loss=0.09122, pruned_loss=0.01292, audio_tagging_loss=0.008661, over 3053539.60 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 8.0 2023-11-27 13:14:34,140 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3074686.6666666665, ans=0.1 2023-11-27 13:14:43,440 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3074753.3333333335, ans=0.0 2023-11-27 13:14:44,584 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3074753.3333333335, ans=0.0 2023-11-27 13:15:19,951 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 461250 2023-11-27 13:15:20,095 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3074953.3333333335, ans=0.125 2023-11-27 13:15:25,870 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 4350, loss[loss=0.05212, simple_loss=0.06725, pruned_loss=0.008439, audio_tagging_loss=0.01005, over 15696.00 frames. ], tot_loss[loss=0.06757, simple_loss=0.09169, pruned_loss=0.01302, audio_tagging_loss=0.008706, over 3056991.45 frames. ], batch size: 60, lr: 1.74e-03, grad_scale: 8.0 2023-11-27 13:16:00,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3075220.0, ans=0.125 2023-11-27 13:16:02,974 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.093e+01 8.774e+01 9.357e+01 9.883e+01 1.317e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-27 13:16:12,880 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3075286.6666666665, ans=0.125 2023-11-27 13:16:13,123 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.79 vs. 
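In the optim.py:476 entries, the five numbers are quartiles (min, 25%, median, 75%, max) of recently observed gradient norms, and the threshold tracks Clipping_scale times the median: just above, 2.0 * 9.233e+01 = 1.847e+02, matching threshold=1.847e+02, with percent-clipped reporting how often the cap actually bites. A hedged sketch of that bookkeeping; ScaledAdam's real implementation differs in detail.

    import torch

    def clipping_report(recent_grad_norms, clipping_scale=2.0):
        # Quartiles of recent gradient norms and the derived clipping
        # threshold (clipping_scale * median), as in the log lines above.
        norms = torch.as_tensor(recent_grad_norms, dtype=torch.float32)
        quartiles = torch.quantile(
            norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * quartiles[2]
        percent_clipped = 100.0 * (norms > threshold).float().mean()
        return quartiles, threshold, percent_clipped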
limit=15.0 2023-11-27 13:16:18,144 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 461300 2023-11-27 13:16:23,504 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 4400, loss[loss=0.06932, simple_loss=0.0923, pruned_loss=0.01425, audio_tagging_loss=0.008924, over 15679.00 frames. ], tot_loss[loss=0.06712, simple_loss=0.09106, pruned_loss=0.01285, audio_tagging_loss=0.008743, over 3062445.94 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:16:25,952 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3075353.3333333335, ans=0.125 2023-11-27 13:16:53,135 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3075486.6666666665, ans=0.07 2023-11-27 13:17:05,704 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3075553.3333333335, ans=0.0 2023-11-27 13:17:12,495 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3075620.0, ans=0.0 2023-11-27 13:17:15,663 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 461350 2023-11-27 13:17:21,483 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 4450, loss[loss=0.06991, simple_loss=0.09456, pruned_loss=0.01309, audio_tagging_loss=0.009548, over 14381.00 frames. ], tot_loss[loss=0.06662, simple_loss=0.09049, pruned_loss=0.01265, audio_tagging_loss=0.00873, over 3059480.17 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:17:33,914 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3075753.3333333335, ans=0.125 2023-11-27 13:17:50,169 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3075820.0, ans=0.5 2023-11-27 13:17:58,565 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.068e+01 8.640e+01 9.339e+01 1.014e+02 1.202e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-27 13:17:59,873 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3075886.6666666665, ans=0.125 2023-11-27 13:18:14,591 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 461400 2023-11-27 13:18:19,466 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3076020.0, ans=0.0 2023-11-27 13:18:20,234 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 4500, loss[loss=0.06926, simple_loss=0.09856, pruned_loss=0.01246, audio_tagging_loss=0.007524, over 14856.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.09013, pruned_loss=0.01251, audio_tagging_loss=0.008711, over 3055966.38 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:18:44,610 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.04 vs. 
limit=22.5 2023-11-27 13:18:49,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3076153.3333333335, ans=0.0 2023-11-27 13:18:53,921 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3076220.0, ans=0.2 2023-11-27 13:19:04,168 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3076220.0, ans=0.0 2023-11-27 13:19:12,274 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 461450 2023-11-27 13:19:17,599 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 4550, loss[loss=0.05028, simple_loss=0.06184, pruned_loss=0.00897, audio_tagging_loss=0.01039, over 15271.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.08998, pruned_loss=0.01249, audio_tagging_loss=0.008734, over 3053059.10 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:19:23,298 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3076353.3333333335, ans=0.125 2023-11-27 13:19:24,935 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.66 vs. limit=15.0 2023-11-27 13:19:25,836 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.79 vs. limit=22.5 2023-11-27 13:19:54,262 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.721e+01 8.672e+01 9.431e+01 1.029e+02 1.211e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-27 13:19:57,920 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.25 vs. limit=22.5 2023-11-27 13:20:04,933 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 13:20:09,218 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 461500 2023-11-27 13:20:14,519 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 4600, loss[loss=0.04526, simple_loss=0.05528, pruned_loss=0.007589, audio_tagging_loss=0.01003, over 14255.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.08904, pruned_loss=0.01244, audio_tagging_loss=0.00882, over 3052819.38 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:20:20,361 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.69 vs. 
limit=6.0 2023-11-27 13:20:49,830 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3076886.6666666665, ans=0.2 2023-11-27 13:20:56,568 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3076886.6666666665, ans=0.125 2023-11-27 13:20:56,680 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3076886.6666666665, ans=0.1 2023-11-27 13:21:08,144 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 461550 2023-11-27 13:21:13,510 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 4650, loss[loss=0.09086, simple_loss=0.1296, pruned_loss=0.01741, audio_tagging_loss=0.008634, over 15842.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.08914, pruned_loss=0.01247, audio_tagging_loss=0.00886, over 3054813.35 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:21:16,932 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3077020.0, ans=0.0 2023-11-27 13:21:37,154 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3077153.3333333335, ans=0.125 2023-11-27 13:21:49,742 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.91 vs. limit=15.0 2023-11-27 13:21:49,995 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.494e+01 8.746e+01 9.217e+01 9.897e+01 1.196e+02, threshold=1.843e+02, percent-clipped=0.0 2023-11-27 13:21:55,681 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.69 vs. limit=6.0 2023-11-27 13:22:04,879 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 461600 2023-11-27 13:22:07,576 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3077286.6666666665, ans=0.125 2023-11-27 13:22:10,672 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 4700, loss[loss=0.07417, simple_loss=0.1078, pruned_loss=0.01388, audio_tagging_loss=0.006381, over 14771.00 frames. ], tot_loss[loss=0.06685, simple_loss=0.09049, pruned_loss=0.01276, audio_tagging_loss=0.008836, over 3053052.13 frames. ], batch size: 54, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:22:17,139 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3077353.3333333335, ans=0.125 2023-11-27 13:22:46,976 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3077553.3333333335, ans=0.125 2023-11-27 13:22:52,330 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3077553.3333333335, ans=0.125 2023-11-27 13:23:02,776 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 461650 2023-11-27 13:23:08,074 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 4750, loss[loss=0.05063, simple_loss=0.06299, pruned_loss=0.009294, audio_tagging_loss=0.009843, over 16069.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.08963, pruned_loss=0.01249, audio_tagging_loss=0.008984, over 3043636.03 frames. 
], batch size: 63, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:23:08,814 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.05 vs. limit=15.0 2023-11-27 13:23:12,187 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 13:23:36,283 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff3.min_abs, batch_count=3077820.0, ans=0.2 2023-11-27 13:23:37,227 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3077820.0, ans=0.2 2023-11-27 13:23:44,943 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.86 vs. limit=22.5 2023-11-27 13:23:45,249 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.848e+01 8.817e+01 9.459e+01 1.019e+02 1.212e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-27 13:23:47,799 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3077886.6666666665, ans=0.1 2023-11-27 13:24:00,660 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 461700 2023-11-27 13:24:06,649 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 4800, loss[loss=0.06249, simple_loss=0.08539, pruned_loss=0.01103, audio_tagging_loss=0.008764, over 16539.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.09004, pruned_loss=0.01264, audio_tagging_loss=0.009059, over 3052649.07 frames. ], batch size: 62, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 13:24:08,959 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3078020.0, ans=0.125 2023-11-27 13:24:12,373 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3078020.0, ans=0.125 2023-11-27 13:24:18,862 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3078086.6666666665, ans=0.125 2023-11-27 13:24:24,348 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3078086.6666666665, ans=0.125 2023-11-27 13:24:30,417 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3078153.3333333335, ans=0.2 2023-11-27 13:24:57,872 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 461750 2023-11-27 13:24:58,286 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.31 vs. limit=15.0 2023-11-27 13:25:03,194 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 4850, loss[loss=0.06789, simple_loss=0.08718, pruned_loss=0.01379, audio_tagging_loss=0.01051, over 15221.00 frames. ], tot_loss[loss=0.067, simple_loss=0.09032, pruned_loss=0.01268, audio_tagging_loss=0.009156, over 3053495.80 frames. 
], batch size: 57, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 13:25:04,485 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3078353.3333333335, ans=0.2 2023-11-27 13:25:11,092 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3078353.3333333335, ans=0.125 2023-11-27 13:25:26,811 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3078486.6666666665, ans=0.125 2023-11-27 13:25:31,912 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3078486.6666666665, ans=0.125 2023-11-27 13:25:40,274 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.038e+01 8.650e+01 9.327e+01 1.023e+02 1.195e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-27 13:25:54,479 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 461800 2023-11-27 13:26:00,792 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 4900, loss[loss=0.0606, simple_loss=0.08336, pruned_loss=0.01094, audio_tagging_loss=0.007987, over 14273.00 frames. ], tot_loss[loss=0.06696, simple_loss=0.09022, pruned_loss=0.01274, audio_tagging_loss=0.009104, over 3051975.68 frames. ], batch size: 53, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 13:26:17,312 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.54 vs. limit=15.0 2023-11-27 13:26:18,995 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3078753.3333333335, ans=0.125 2023-11-27 13:26:44,086 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.10 vs. limit=15.0 2023-11-27 13:26:46,000 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.78 vs. limit=22.5 2023-11-27 13:26:52,510 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 461850 2023-11-27 13:26:53,918 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3078953.3333333335, ans=0.125 2023-11-27 13:26:58,536 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 4950, loss[loss=0.05081, simple_loss=0.06714, pruned_loss=0.009236, audio_tagging_loss=0.008006, over 14155.00 frames. ], tot_loss[loss=0.06693, simple_loss=0.09062, pruned_loss=0.01269, audio_tagging_loss=0.008934, over 3046637.59 frames. 
], batch size: 55, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 13:27:02,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3079020.0, ans=0.0 2023-11-27 13:27:10,110 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3079086.6666666665, ans=0.125 2023-11-27 13:27:16,650 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 13:27:18,907 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3079086.6666666665, ans=0.125 2023-11-27 13:27:28,513 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3079153.3333333335, ans=0.1 2023-11-27 13:27:34,978 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.277e+01 8.478e+01 9.080e+01 9.742e+01 1.240e+02, threshold=1.816e+02, percent-clipped=0.0 2023-11-27 13:27:40,174 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3079220.0, ans=0.0 2023-11-27 13:27:48,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3079286.6666666665, ans=0.125 2023-11-27 13:27:50,467 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 461900 2023-11-27 13:27:55,869 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 5000, loss[loss=0.05638, simple_loss=0.06978, pruned_loss=0.01002, audio_tagging_loss=0.01146, over 14719.00 frames. ], tot_loss[loss=0.0669, simple_loss=0.09071, pruned_loss=0.01274, audio_tagging_loss=0.008805, over 3048788.29 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 13:27:59,493 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3079353.3333333335, ans=0.07 2023-11-27 13:28:03,754 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 13:28:33,483 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3079553.3333333335, ans=0.0 2023-11-27 13:28:47,596 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 461950 2023-11-27 13:28:52,930 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 5050, loss[loss=0.05229, simple_loss=0.07186, pruned_loss=0.01017, audio_tagging_loss=0.006191, over 14796.00 frames. ], tot_loss[loss=0.06709, simple_loss=0.0911, pruned_loss=0.01283, audio_tagging_loss=0.008711, over 3041769.24 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 13:29:01,777 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.93 vs. limit=22.5 2023-11-27 13:29:11,692 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.99 vs. 
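The scaling.py:1022 Whitening entries compare a covariance statistic of a module's activations against a limit (e.g. metric=10.93 vs. limit=22.5 just above); only when the metric exceeds the limit does a corrective gradient push the activations back toward a whiter spectrum. The exact statistic is defined in scaling.py; the function below is only one plausible whiteness measure of the same flavour, for orientation.

    import torch

    def whiteness_metric(x: torch.Tensor) -> torch.Tensor:
        # x: (num_frames, num_channels). Ratio of the largest covariance
        # eigenvalue to the mean eigenvalue: 1.0 for perfectly white
        # activations, larger as energy concentrates in few directions.
        # Illustrative only; not scaling.py's actual formula.
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.T @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        return eigs.max() / eigs.mean()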
limit=22.5 2023-11-27 13:29:14,194 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3079753.3333333335, ans=0.1 2023-11-27 13:29:23,076 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3079820.0, ans=0.07 2023-11-27 13:29:29,241 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.036e+01 8.853e+01 9.456e+01 1.016e+02 1.610e+02, threshold=1.891e+02, percent-clipped=0.0 2023-11-27 13:29:43,976 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 462000 2023-11-27 13:29:50,858 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 5100, loss[loss=0.06189, simple_loss=0.08296, pruned_loss=0.009012, audio_tagging_loss=0.01139, over 15681.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.09025, pruned_loss=0.01262, audio_tagging_loss=0.008788, over 3047426.01 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:29:57,058 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3080020.0, ans=0.125 2023-11-27 13:30:00,447 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3080020.0, ans=0.1 2023-11-27 13:30:19,781 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3080153.3333333335, ans=0.125 2023-11-27 13:30:30,806 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=3080220.0, ans=10.0 2023-11-27 13:30:42,804 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 462050 2023-11-27 13:30:45,139 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3080286.6666666665, ans=0.2 2023-11-27 13:30:48,254 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 5150, loss[loss=0.07237, simple_loss=0.08711, pruned_loss=0.01759, audio_tagging_loss=0.01123, over 15385.00 frames. ], tot_loss[loss=0.0667, simple_loss=0.09056, pruned_loss=0.01266, audio_tagging_loss=0.008756, over 3043350.30 frames. 
], batch size: 59, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:31:09,216 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3080486.6666666665, ans=0.0 2023-11-27 13:31:22,682 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3080553.3333333335, ans=0.125 2023-11-27 13:31:24,790 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3080553.3333333335, ans=0.0 2023-11-27 13:31:24,921 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3080553.3333333335, ans=0.125 2023-11-27 13:31:26,840 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.025e+01 8.502e+01 9.223e+01 9.833e+01 1.321e+02, threshold=1.845e+02, percent-clipped=0.0 2023-11-27 13:31:39,966 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 462100 2023-11-27 13:31:40,053 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3080620.0, ans=0.2 2023-11-27 13:31:44,403 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3080686.6666666665, ans=0.0 2023-11-27 13:31:45,312 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 5200, loss[loss=0.05527, simple_loss=0.07199, pruned_loss=0.01133, audio_tagging_loss=0.007939, over 15850.00 frames. ], tot_loss[loss=0.06716, simple_loss=0.09158, pruned_loss=0.0127, audio_tagging_loss=0.00867, over 3048786.20 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 13:31:54,696 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.72 vs. limit=6.0 2023-11-27 13:31:57,067 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3080753.3333333335, ans=0.125 2023-11-27 13:32:01,947 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3080753.3333333335, ans=0.125 2023-11-27 13:32:11,291 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3080820.0, ans=0.0 2023-11-27 13:32:36,145 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 462150 2023-11-27 13:32:42,056 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 5250, loss[loss=0.05299, simple_loss=0.06384, pruned_loss=0.00833, audio_tagging_loss=0.01274, over 15034.00 frames. ], tot_loss[loss=0.06747, simple_loss=0.0919, pruned_loss=0.01281, audio_tagging_loss=0.008709, over 3042952.67 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:32:43,508 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3081020.0, ans=0.125 2023-11-27 13:32:53,411 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3081086.6666666665, ans=0.0 2023-11-27 13:33:07,935 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.29 vs. 
limit=15.0 2023-11-27 13:33:20,208 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.573e+01 8.787e+01 9.211e+01 1.001e+02 1.224e+02, threshold=1.842e+02, percent-clipped=0.0 2023-11-27 13:33:34,938 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 462200 2023-11-27 13:33:40,179 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.54 vs. limit=22.5 2023-11-27 13:33:40,674 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 5300, loss[loss=0.06812, simple_loss=0.09605, pruned_loss=0.01336, audio_tagging_loss=0.006738, over 14892.00 frames. ], tot_loss[loss=0.06747, simple_loss=0.0921, pruned_loss=0.01277, audio_tagging_loss=0.008644, over 3041735.17 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:34:03,886 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3081486.6666666665, ans=0.0 2023-11-27 13:34:16,690 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3081553.3333333335, ans=0.0 2023-11-27 13:34:23,104 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.80 vs. limit=10.0 2023-11-27 13:34:29,329 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3081620.0, ans=0.125 2023-11-27 13:34:29,359 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3081620.0, ans=0.0 2023-11-27 13:34:32,362 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 462250 2023-11-27 13:34:37,739 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 5350, loss[loss=0.06639, simple_loss=0.08545, pruned_loss=0.01143, audio_tagging_loss=0.01225, over 15329.00 frames. ], tot_loss[loss=0.06772, simple_loss=0.09258, pruned_loss=0.0128, audio_tagging_loss=0.008627, over 3047473.57 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:34:52,772 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3081753.3333333335, ans=0.0 2023-11-27 13:34:57,518 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3081753.3333333335, ans=0.125 2023-11-27 13:35:11,756 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3081886.6666666665, ans=0.125 2023-11-27 13:35:11,872 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3081886.6666666665, ans=0.0 2023-11-27 13:35:16,881 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.312e+01 8.714e+01 9.357e+01 9.937e+01 1.176e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-27 13:35:23,812 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3081953.3333333335, ans=0.125 2023-11-27 13:35:24,218 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.50 vs. limit=22.5 2023-11-27 13:35:26,565 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.40 vs. 
limit=22.5 2023-11-27 13:35:29,105 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 462300 2023-11-27 13:35:34,974 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 5400, loss[loss=0.06813, simple_loss=0.08973, pruned_loss=0.0147, audio_tagging_loss=0.008559, over 14860.00 frames. ], tot_loss[loss=0.06837, simple_loss=0.09347, pruned_loss=0.01303, audio_tagging_loss=0.008601, over 3047759.19 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:35:38,572 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3082020.0, ans=0.125 2023-11-27 13:35:40,219 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3082020.0, ans=0.0 2023-11-27 13:35:54,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3082086.6666666665, ans=0.125 2023-11-27 13:36:12,533 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.68 vs. limit=15.0 2023-11-27 13:36:27,740 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 462350 2023-11-27 13:36:33,146 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 5450, loss[loss=0.05363, simple_loss=0.0651, pruned_loss=0.008969, audio_tagging_loss=0.01211, over 15155.00 frames. ], tot_loss[loss=0.06792, simple_loss=0.09251, pruned_loss=0.01296, audio_tagging_loss=0.008706, over 3047110.04 frames. ], batch size: 60, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:36:36,413 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3082353.3333333335, ans=0.0 2023-11-27 13:36:41,159 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.00 vs. limit=15.0 2023-11-27 13:36:53,230 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.08 vs. limit=12.0 2023-11-27 13:37:10,917 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3082553.3333333335, ans=0.125 2023-11-27 13:37:11,909 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3082553.3333333335, ans=0.125 2023-11-27 13:37:12,767 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.418e+01 8.623e+01 9.313e+01 1.025e+02 1.327e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-27 13:37:15,131 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3082553.3333333335, ans=0.125 2023-11-27 13:37:24,611 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3082620.0, ans=0.125 2023-11-27 13:37:25,516 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 462400 2023-11-27 13:37:31,091 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 5500, loss[loss=0.06496, simple_loss=0.09402, pruned_loss=0.01134, audio_tagging_loss=0.006606, over 14911.00 frames. ], tot_loss[loss=0.06764, simple_loss=0.09198, pruned_loss=0.01289, audio_tagging_loss=0.008762, over 3048090.17 frames. 
], batch size: 57, lr: 1.74e-03, grad_scale: 8.0 2023-11-27 13:37:40,034 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3082686.6666666665, ans=0.125 2023-11-27 13:37:47,620 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.54 vs. limit=22.5 2023-11-27 13:37:50,877 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.46 vs. limit=10.0 2023-11-27 13:38:19,852 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3082953.3333333335, ans=0.0 2023-11-27 13:38:20,979 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3082953.3333333335, ans=0.125 2023-11-27 13:38:23,000 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 462450 2023-11-27 13:38:28,408 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 5550, loss[loss=0.0873, simple_loss=0.1115, pruned_loss=0.02154, audio_tagging_loss=0.01003, over 14802.00 frames. ], tot_loss[loss=0.06746, simple_loss=0.09138, pruned_loss=0.01289, audio_tagging_loss=0.008879, over 3046838.72 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 8.0 2023-11-27 13:38:32,351 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3083020.0, ans=0.125 2023-11-27 13:38:53,056 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3083153.3333333335, ans=0.0 2023-11-27 13:39:04,321 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.66 vs. limit=15.0 2023-11-27 13:39:09,187 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.500e+01 8.682e+01 9.087e+01 1.004e+02 1.719e+02, threshold=1.817e+02, percent-clipped=0.0 2023-11-27 13:39:16,508 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3083286.6666666665, ans=0.05 2023-11-27 13:39:21,238 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 462500 2023-11-27 13:39:27,223 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 5600, loss[loss=0.06673, simple_loss=0.0972, pruned_loss=0.0099, audio_tagging_loss=0.008227, over 14544.00 frames. ], tot_loss[loss=0.06758, simple_loss=0.09152, pruned_loss=0.01291, audio_tagging_loss=0.008907, over 3039472.50 frames. ], batch size: 52, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:39:43,673 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3083420.0, ans=0.0 2023-11-27 13:39:43,930 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.36 vs. limit=15.0 2023-11-27 13:39:44,726 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3083420.0, ans=0.0 2023-11-27 13:40:12,342 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
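Each scaling.py:213 entry reports a ScheduledFloat: a scalar hyperparameter (balancer probability, skip rate, dropout_p, scale_min and so on) interpolated against batch_count instead of being fixed. At this stage (batch_count around 3.08e6) every value printed sits on its schedule's final flat segment, hence the unchanging ans=0.125, 0.2, 0.1, 0.0 readings. A sketch with illustrative breakpoints; the breakpoints are assumptions, not the recipe's actual schedules.

    def scheduled_float(batch_count, points=((0.0, 0.3), (8000.0, 0.125))):
        # Piecewise-linear schedule over batch_count, clamped at both
        # ends; the two breakpoints here are purely illustrative.
        (x0, y0), (x1, y1) = points[0], points[-1]
        if batch_count <= x0:
            return y0
        if batch_count >= x1:
            return y1
        frac = (batch_count - x0) / (x1 - x0)
        return y0 + frac * (y1 - y0)

    # Far past the last breakpoint the value is constant, matching the
    # steady ans=0.125 readings at batch_count ~ 3.08e6 above.
    assert scheduled_float(3.08e6) == 0.125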
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 13:40:19,109 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 462550 2023-11-27 13:40:25,097 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 5650, loss[loss=0.06629, simple_loss=0.08634, pruned_loss=0.01282, audio_tagging_loss=0.0103, over 15081.00 frames. ], tot_loss[loss=0.06747, simple_loss=0.09117, pruned_loss=0.01294, audio_tagging_loss=0.008944, over 3045818.58 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:40:31,941 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3083686.6666666665, ans=0.125 2023-11-27 13:40:56,931 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 13:41:01,138 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3083886.6666666665, ans=0.0 2023-11-27 13:41:05,323 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.900e+01 8.523e+01 8.985e+01 9.871e+01 1.258e+02, threshold=1.797e+02, percent-clipped=0.0 2023-11-27 13:41:07,038 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.17 vs. limit=22.5 2023-11-27 13:41:16,864 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 462600 2023-11-27 13:41:18,674 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3083953.3333333335, ans=0.1 2023-11-27 13:41:22,700 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 5700, loss[loss=0.07126, simple_loss=0.09357, pruned_loss=0.01475, audio_tagging_loss=0.00973, over 15142.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.08921, pruned_loss=0.01267, audio_tagging_loss=0.009059, over 3040611.35 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:41:45,408 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3084153.3333333335, ans=0.0 2023-11-27 13:41:50,177 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.59 vs. limit=15.0 2023-11-27 13:42:09,212 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3084286.6666666665, ans=0.0 2023-11-27 13:42:10,241 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3084286.6666666665, ans=0.1 2023-11-27 13:42:14,971 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 462650 2023-11-27 13:42:21,546 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 5750, loss[loss=0.05429, simple_loss=0.07678, pruned_loss=0.008233, audio_tagging_loss=0.007669, over 15536.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08823, pruned_loss=0.01252, audio_tagging_loss=0.009005, over 3043912.13 frames. 
], batch size: 57, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:42:24,884 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3084353.3333333335, ans=0.125 2023-11-27 13:43:01,847 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.577e+01 8.392e+01 9.189e+01 1.014e+02 1.266e+02, threshold=1.838e+02, percent-clipped=0.0 2023-11-27 13:43:11,248 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3084620.0, ans=0.125 2023-11-27 13:43:13,299 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 462700 2023-11-27 13:43:18,809 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 5800, loss[loss=0.08204, simple_loss=0.1112, pruned_loss=0.01582, audio_tagging_loss=0.0106, over 14310.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.0885, pruned_loss=0.01255, audio_tagging_loss=0.008907, over 3039488.71 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 13:43:19,555 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.16 vs. limit=22.5 2023-11-27 13:43:21,622 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.07 vs. limit=12.0 2023-11-27 13:43:26,201 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3084686.6666666665, ans=0.0 2023-11-27 13:43:35,123 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3084753.3333333335, ans=0.1 2023-11-27 13:43:58,633 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3084886.6666666665, ans=0.125 2023-11-27 13:44:06,166 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3084953.3333333335, ans=0.0 2023-11-27 13:44:07,845 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.27 vs. limit=22.5 2023-11-27 13:44:08,617 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3084953.3333333335, ans=0.125 2023-11-27 13:44:09,606 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3084953.3333333335, ans=0.0 2023-11-27 13:44:11,190 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 462750 2023-11-27 13:44:11,265 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3084953.3333333335, ans=0.0 2023-11-27 13:44:16,470 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 5850, loss[loss=0.05193, simple_loss=0.0662, pruned_loss=0.007908, audio_tagging_loss=0.01093, over 15687.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.08932, pruned_loss=0.01272, audio_tagging_loss=0.008827, over 3040092.77 frames. 
], batch size: 60, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 13:44:22,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3085020.0, ans=0.0 2023-11-27 13:44:25,321 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.54 vs. limit=22.5 2023-11-27 13:44:57,008 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.148e+01 8.603e+01 9.114e+01 9.888e+01 1.396e+02, threshold=1.823e+02, percent-clipped=0.0 2023-11-27 13:44:57,325 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3085220.0, ans=0.1 2023-11-27 13:45:08,514 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 462800 2023-11-27 13:45:14,806 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 5900, loss[loss=0.1033, simple_loss=0.1571, pruned_loss=0.0204, audio_tagging_loss=0.004334, over 15918.00 frames. ], tot_loss[loss=0.06681, simple_loss=0.09025, pruned_loss=0.01293, audio_tagging_loss=0.008749, over 3044324.39 frames. ], batch size: 54, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 13:45:23,605 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3085353.3333333335, ans=0.125 2023-11-27 13:46:05,378 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3085620.0, ans=0.125 2023-11-27 13:46:07,466 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 462850 2023-11-27 13:46:07,703 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3085620.0, ans=0.2 2023-11-27 13:46:12,871 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 5950, loss[loss=0.07137, simple_loss=0.1003, pruned_loss=0.01149, audio_tagging_loss=0.0097, over 15069.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.08997, pruned_loss=0.01284, audio_tagging_loss=0.008753, over 3047518.21 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 13:46:18,636 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3085686.6666666665, ans=0.125 2023-11-27 13:46:18,912 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.10 vs. limit=12.0 2023-11-27 13:46:30,696 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.85 vs. limit=15.0 2023-11-27 13:46:51,988 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3085886.6666666665, ans=0.1 2023-11-27 13:46:53,852 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.241e+01 8.517e+01 9.163e+01 1.018e+02 1.224e+02, threshold=1.833e+02, percent-clipped=0.0 2023-11-27 13:46:58,587 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.26 vs. limit=12.0 2023-11-27 13:47:04,821 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 462900 2023-11-27 13:47:10,186 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 6000, loss[loss=0.0696, simple_loss=0.09313, pruned_loss=0.01344, audio_tagging_loss=0.0096, over 14216.00 frames. 
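The optim.py lines above ("Clipping_scale=2.0, grad-norm quartiles ... threshold=..., percent-clipped=0.0") summarize recent gradient norms by quartile and report how often the clipping threshold fired; the logged thresholds sit near twice the median, consistent with Clipping_scale=2.0. A rough sketch of such bookkeeping, with the names and threshold rule invented for illustration (icefall's ScaledAdam does this inside the optimizer, per parameter group):

import torch

class GradNormClipper:
    def __init__(self, window: int = 128, clipping_scale: float = 2.0) -> None:
        self.norms: list[float] = []
        self.window = window
        self.clipping_scale = clipping_scale

    def clip_(self, parameters) -> float:
        params = [p for p in parameters if p.grad is not None]
        norm = torch.norm(torch.stack([p.grad.norm() for p in params])).item()
        self.norms = (self.norms + [norm])[-self.window:]
        q = torch.quantile(torch.tensor(self.norms),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * q[2].item()  # e.g. 2x the recent median
        if norm > threshold:
            for p in params:
                p.grad.mul_(threshold / norm)  # scale grads back to the threshold
        return norm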
], tot_loss[loss=0.0668, simple_loss=0.09043, pruned_loss=0.01284, audio_tagging_loss=0.008739, over 3043144.61 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 13:47:10,189 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-27 13:47:44,805 INFO [train_asr.py:1267] (0/4) Epoch 39, validation: loss=0.05766, simple_loss=0.05076, pruned_loss=0.005225, audio_tagging_loss=0.02706, over 4681554.00 frames. 2023-11-27 13:47:44,806 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-27 13:47:53,957 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3086020.0, ans=0.0 2023-11-27 13:48:05,249 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.00 vs. limit=15.0 2023-11-27 13:48:10,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3086153.3333333335, ans=0.125 2023-11-27 13:48:24,949 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.39 vs. limit=15.0 2023-11-27 13:48:26,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3086220.0, ans=0.125 2023-11-27 13:48:29,947 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 13:48:36,748 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 462950 2023-11-27 13:48:40,340 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 13:48:42,319 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 6050, loss[loss=0.05619, simple_loss=0.07368, pruned_loss=0.008907, audio_tagging_loss=0.01045, over 15649.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.09032, pruned_loss=0.01276, audio_tagging_loss=0.008688, over 3046617.53 frames. ], batch size: 60, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 13:48:44,707 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3086353.3333333335, ans=0.0 2023-11-27 13:48:54,950 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.86 vs. 
limit=15.0 2023-11-27 13:49:18,924 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3086553.3333333335, ans=0.1 2023-11-27 13:49:24,234 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.244e+01 8.673e+01 9.372e+01 1.019e+02 1.327e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-27 13:49:25,535 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 13:49:26,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3086553.3333333335, ans=0.2 2023-11-27 13:49:34,266 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 463000 2023-11-27 13:49:35,958 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3086620.0, ans=0.2 2023-11-27 13:49:38,396 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.20 vs. limit=12.0 2023-11-27 13:49:39,319 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3086686.6666666665, ans=0.0 2023-11-27 13:49:40,139 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 6100, loss[loss=0.06903, simple_loss=0.08149, pruned_loss=0.01708, audio_tagging_loss=0.01121, over 14275.00 frames. ], tot_loss[loss=0.06725, simple_loss=0.09146, pruned_loss=0.01291, audio_tagging_loss=0.008614, over 3045524.51 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 13:49:40,365 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3086686.6666666665, ans=0.0 2023-11-27 13:49:43,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3086686.6666666665, ans=0.125 2023-11-27 13:49:46,537 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3086686.6666666665, ans=0.0 2023-11-27 13:49:51,361 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff3.min_abs, batch_count=3086753.3333333335, ans=0.2 2023-11-27 13:50:22,108 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3086886.6666666665, ans=0.2 2023-11-27 13:50:32,456 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 463050 2023-11-27 13:50:32,860 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.13 vs. limit=15.0 2023-11-27 13:50:38,832 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 6150, loss[loss=0.05302, simple_loss=0.07107, pruned_loss=0.007162, audio_tagging_loss=0.01032, over 15153.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.09047, pruned_loss=0.01268, audio_tagging_loss=0.008694, over 3049886.45 frames. 
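At batch 6000 above, the loop pauses to run validation ("Computing validation loss", then a single Epoch 39 validation line and a peak-memory report). A hedged sketch of that pass; compute_loss is a placeholder argument standing in for whatever returns a summed loss and frame count per batch, not icefall's actual signature:

import torch

def validate(model, valid_loader, compute_loss, device) -> float:
    """compute_loss(model, batch, device) -> (summed loss tensor, num_frames)."""
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    with torch.no_grad():
        for batch in valid_loader:
            loss, num_frames = compute_loss(model, batch, device)
            tot_loss += loss.item()
            tot_frames += num_frames
    model.train()
    peak_mb = torch.cuda.max_memory_allocated(device) // 2**20
    print(f"validation: loss={tot_loss / tot_frames:.5f}; "
          f"Maximum memory allocated so far is {peak_mb}MB")
    return tot_loss / tot_frames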
], batch size: 59, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 13:50:40,101 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3087020.0, ans=0.125 2023-11-27 13:50:49,582 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3087086.6666666665, ans=0.5 2023-11-27 13:50:58,553 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3087086.6666666665, ans=0.0 2023-11-27 13:51:07,220 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3087153.3333333335, ans=0.2 2023-11-27 13:51:11,718 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3087220.0, ans=0.1 2023-11-27 13:51:20,159 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.153e+01 8.580e+01 9.298e+01 1.013e+02 1.298e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-27 13:51:21,570 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3087220.0, ans=0.125 2023-11-27 13:51:29,702 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.38 vs. limit=6.0 2023-11-27 13:51:31,259 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 463100 2023-11-27 13:51:32,601 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3087286.6666666665, ans=0.0 2023-11-27 13:51:36,706 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 6200, loss[loss=0.05905, simple_loss=0.08395, pruned_loss=0.009924, audio_tagging_loss=0.007154, over 16484.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.09015, pruned_loss=0.01269, audio_tagging_loss=0.008782, over 3051296.55 frames. ], batch size: 64, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 13:52:02,545 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3087486.6666666665, ans=0.1 2023-11-27 13:52:05,537 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3087486.6666666665, ans=0.0 2023-11-27 13:52:05,862 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.30 vs. limit=10.0 2023-11-27 13:52:20,116 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3087553.3333333335, ans=0.125 2023-11-27 13:52:28,829 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 463150 2023-11-27 13:52:33,526 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.45 vs. limit=12.0 2023-11-27 13:52:34,177 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 6250, loss[loss=0.06223, simple_loss=0.08044, pruned_loss=0.01117, audio_tagging_loss=0.01084, over 15829.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.08975, pruned_loss=0.01267, audio_tagging_loss=0.008877, over 3050252.74 frames. 
], batch size: 60, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 13:52:55,362 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3087753.3333333335, ans=0.125 2023-11-27 13:52:55,471 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3087753.3333333335, ans=0.0 2023-11-27 13:52:56,693 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3087753.3333333335, ans=0.125 2023-11-27 13:53:00,093 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.64 vs. limit=15.0 2023-11-27 13:53:09,677 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3087886.6666666665, ans=0.125 2023-11-27 13:53:14,597 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.86 vs. limit=15.0 2023-11-27 13:53:16,193 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.844e+01 8.648e+01 9.152e+01 1.003e+02 1.287e+02, threshold=1.830e+02, percent-clipped=0.0 2023-11-27 13:53:19,852 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=3087953.3333333335, ans=0.05 2023-11-27 13:53:20,019 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.64 vs. limit=15.0 2023-11-27 13:53:26,239 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 463200 2023-11-27 13:53:32,902 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 6300, loss[loss=0.07468, simple_loss=0.09456, pruned_loss=0.01721, audio_tagging_loss=0.01019, over 15143.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.08961, pruned_loss=0.01266, audio_tagging_loss=0.009041, over 3049048.40 frames. 
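The Whitening lines above compare a per-module statistic against a limit (e.g. "metric=22.17 vs. limit=22.5"); when the metric exceeds the limit, the Whiten module in icefall's scaling.py applies a small corrective gradient that pushes the activations toward a whiter covariance. One plausible way to define such a metric, shown purely as an illustration, is the eigenvalue spread of the feature covariance, which approaches 1.0 when the covariance is perfectly white:

import torch

def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels) activations from one module
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)  # eigenvalues of the covariance
    return (eigs.pow(2).mean() / eigs.mean().pow(2)).item()

print(whitening_metric(torch.randn(10000, 64)))                        # ~1.01, nearly white
print(whitening_metric(torch.randn(10000, 64) @ torch.randn(64, 64)))  # >> 1, correlated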
], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 13:53:33,053 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 13:53:38,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3088020.0, ans=0.125 2023-11-27 13:53:47,696 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3088086.6666666665, ans=0.125 2023-11-27 13:54:03,053 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3088153.3333333335, ans=0.125 2023-11-27 13:54:05,121 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3088153.3333333335, ans=0.1 2023-11-27 13:54:10,723 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3088220.0, ans=0.125 2023-11-27 13:54:15,733 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3088220.0, ans=0.125 2023-11-27 13:54:25,978 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 463250 2023-11-27 13:54:29,518 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3088286.6666666665, ans=0.09899494936611666 2023-11-27 13:54:31,603 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 6350, loss[loss=0.07744, simple_loss=0.1057, pruned_loss=0.01629, audio_tagging_loss=0.008292, over 15694.00 frames. ], tot_loss[loss=0.0668, simple_loss=0.09009, pruned_loss=0.01268, audio_tagging_loss=0.009081, over 3048771.30 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 13:54:39,461 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3088353.3333333335, ans=0.2 2023-11-27 13:54:43,816 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3088420.0, ans=0.125 2023-11-27 13:54:47,553 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.96 vs. limit=15.0 2023-11-27 13:54:59,840 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 13:55:04,366 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.74 vs. limit=22.5 2023-11-27 13:55:13,240 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.449e+01 8.528e+01 9.081e+01 9.909e+01 1.480e+02, threshold=1.816e+02, percent-clipped=0.0 2023-11-27 13:55:17,926 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3088620.0, ans=0.1 2023-11-27 13:55:18,305 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.62 vs. 
limit=15.0 2023-11-27 13:55:21,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3088620.0, ans=0.125 2023-11-27 13:55:23,366 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 463300 2023-11-27 13:55:28,796 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 6400, loss[loss=0.0607, simple_loss=0.08371, pruned_loss=0.009246, audio_tagging_loss=0.009598, over 14687.00 frames. ], tot_loss[loss=0.06674, simple_loss=0.08988, pruned_loss=0.01266, audio_tagging_loss=0.009133, over 3042830.52 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 13:55:29,071 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3088686.6666666665, ans=0.0 2023-11-27 13:55:30,435 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.74 vs. limit=6.0 2023-11-27 13:55:35,942 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=7.39 vs. limit=12.0 2023-11-27 13:55:53,165 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.08 vs. limit=22.5 2023-11-27 13:55:59,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3088820.0, ans=0.0 2023-11-27 13:56:13,776 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3088953.3333333335, ans=0.0 2023-11-27 13:56:20,297 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 463350 2023-11-27 13:56:25,832 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 6450, loss[loss=0.07069, simple_loss=0.09023, pruned_loss=0.01537, audio_tagging_loss=0.01021, over 15968.00 frames. ], tot_loss[loss=0.06718, simple_loss=0.09051, pruned_loss=0.01271, audio_tagging_loss=0.00922, over 3040226.93 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 13:56:30,392 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.61 vs. limit=15.0 2023-11-27 13:56:43,909 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.90 vs. limit=12.0 2023-11-27 13:56:57,476 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.47 vs. limit=22.5 2023-11-27 13:57:07,845 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.300e+01 8.614e+01 9.257e+01 9.847e+01 1.533e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-27 13:57:19,463 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 463400 2023-11-27 13:57:23,781 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3089286.6666666665, ans=0.0 2023-11-27 13:57:25,802 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 6500, loss[loss=0.07725, simple_loss=0.1035, pruned_loss=0.01763, audio_tagging_loss=0.007888, over 15123.00 frames. ], tot_loss[loss=0.06678, simple_loss=0.08999, pruned_loss=0.0126, audio_tagging_loss=0.009192, over 3045521.26 frames. 
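grad_scale in the batch lines above moves between 16.0 and 32.0, the signature of mixed-precision loss scaling: the scale grows after a run of overflow-free steps and is halved when gradients overflow. The standard torch.cuda.amp pattern follows, as a sketch assuming a CUDA device; the model and optimizer here are toys, not this recipe's:

import torch

model = torch.nn.Linear(80, 500).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler(init_scale=16.0)

for step in range(10):
    x = torch.randn(8, 80, device="cuda")
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = model(x).square().mean()
    optimizer.zero_grad()
    scaler.scale(loss).backward()
    scaler.step(optimizer)  # silently skips the step if grads overflowed
    scaler.update()         # grows the scale, or halves it after an overflow
    print(step, scaler.get_scale())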
], batch size: 57, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 13:57:40,952 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.26 vs. limit=15.0 2023-11-27 13:57:59,542 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=3089553.3333333335, ans=0.95 2023-11-27 13:58:01,020 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.55 vs. limit=15.0 2023-11-27 13:58:04,736 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3089553.3333333335, ans=0.0 2023-11-27 13:58:05,831 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3089553.3333333335, ans=0.125 2023-11-27 13:58:05,835 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3089553.3333333335, ans=0.125 2023-11-27 13:58:17,205 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 463450 2023-11-27 13:58:19,572 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3089620.0, ans=0.0 2023-11-27 13:58:22,710 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 6550, loss[loss=0.07965, simple_loss=0.1035, pruned_loss=0.01778, audio_tagging_loss=0.01012, over 14946.00 frames. ], tot_loss[loss=0.06708, simple_loss=0.09073, pruned_loss=0.01265, audio_tagging_loss=0.009066, over 3050751.65 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 13:58:23,069 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3089686.6666666665, ans=0.0 2023-11-27 13:58:25,538 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.27 vs. limit=22.5 2023-11-27 13:58:57,477 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.67 vs. limit=12.0 2023-11-27 13:59:04,350 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.040e+01 8.565e+01 9.161e+01 9.963e+01 1.311e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-27 13:59:10,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3089953.3333333335, ans=0.035 2023-11-27 13:59:12,665 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.11 vs. limit=15.0 2023-11-27 13:59:14,310 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 463500 2023-11-27 13:59:19,699 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 6600, loss[loss=0.04448, simple_loss=0.06035, pruned_loss=0.006571, audio_tagging_loss=0.00773, over 15275.00 frames. ], tot_loss[loss=0.0671, simple_loss=0.09063, pruned_loss=0.01283, audio_tagging_loss=0.008953, over 3044813.25 frames. 
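The recurring "WithLoss: name=...self_attn_weights, loss-sum=0.000e+00" entries report an auxiliary loss attached to individual attention-weight tensors, which in this stretch sums to zero. A logging-only toy in that spirit; icefall's actual WithLoss differs, in particular in how the auxiliary term enters the backward pass:

import torch

class WithLoss(torch.autograd.Function):
    """Pass x through unchanged while logging an attached auxiliary loss."""

    @staticmethod
    def forward(ctx, x: torch.Tensor, aux_loss: torch.Tensor, name: str):
        print(f"WithLoss: name={name}, loss-sum={aux_loss.sum().item():.3e}")
        return x.clone()  # pass-through (cloned so autograd sees a new output)

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out, None, None  # gradient flows through x untouched

y = WithLoss.apply(torch.randn(4, 8), torch.zeros(4), "demo.self_attn_weights")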
], batch size: 60, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 13:59:21,979 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3090020.0, ans=0.015 2023-11-27 13:59:30,302 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.87 vs. limit=15.0 2023-11-27 13:59:47,990 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3090153.3333333335, ans=0.125 2023-11-27 13:59:54,691 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3090220.0, ans=0.0 2023-11-27 13:59:55,787 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3090220.0, ans=0.0 2023-11-27 14:00:02,599 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3090220.0, ans=0.125 2023-11-27 14:00:11,635 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 463550 2023-11-27 14:00:17,671 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 6650, loss[loss=0.07751, simple_loss=0.104, pruned_loss=0.01861, audio_tagging_loss=0.006898, over 15607.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.08958, pruned_loss=0.01278, audio_tagging_loss=0.008938, over 3041541.88 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:00:28,591 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.83 vs. limit=15.0 2023-11-27 14:00:43,830 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3090486.6666666665, ans=0.125 2023-11-27 14:00:52,203 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.71 vs. limit=15.0 2023-11-27 14:00:58,785 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.998e+01 8.791e+01 9.442e+01 1.009e+02 1.378e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-27 14:01:09,417 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 463600 2023-11-27 14:01:15,149 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 6700, loss[loss=0.0653, simple_loss=0.08057, pruned_loss=0.01113, audio_tagging_loss=0.01388, over 15791.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.08971, pruned_loss=0.01273, audio_tagging_loss=0.008889, over 3042420.22 frames. 
], batch size: 58, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:01:16,599 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3090686.6666666665, ans=0.1 2023-11-27 14:01:27,504 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3090753.3333333335, ans=0.125 2023-11-27 14:01:28,574 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3090753.3333333335, ans=0.1 2023-11-27 14:01:38,947 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff2.min_abs, batch_count=3090820.0, ans=0.1 2023-11-27 14:01:50,076 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.41 vs. limit=12.0 2023-11-27 14:02:06,791 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 463650 2023-11-27 14:02:09,175 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3090953.3333333335, ans=0.125 2023-11-27 14:02:12,106 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 6750, loss[loss=0.06779, simple_loss=0.08979, pruned_loss=0.01377, audio_tagging_loss=0.009128, over 14847.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.09029, pruned_loss=0.01271, audio_tagging_loss=0.008741, over 3045097.67 frames. ], batch size: 54, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:02:34,738 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3091153.3333333335, ans=0.0 2023-11-27 14:02:35,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3091153.3333333335, ans=0.1 2023-11-27 14:02:44,959 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3091153.3333333335, ans=0.125 2023-11-27 14:02:44,990 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3091153.3333333335, ans=0.125 2023-11-27 14:02:45,031 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3091153.3333333335, ans=0.0 2023-11-27 14:02:53,465 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.955e+01 8.433e+01 9.032e+01 9.783e+01 1.125e+02, threshold=1.806e+02, percent-clipped=0.0 2023-11-27 14:03:03,932 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 463700 2023-11-27 14:03:08,238 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3091286.6666666665, ans=0.0 2023-11-27 14:03:10,156 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 6800, loss[loss=0.07696, simple_loss=0.1083, pruned_loss=0.01317, audio_tagging_loss=0.009635, over 15194.00 frames. ], tot_loss[loss=0.06692, simple_loss=0.09081, pruned_loss=0.01281, audio_tagging_loss=0.008701, over 3043378.59 frames. 
], batch size: 57, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:03:16,950 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3091353.3333333335, ans=0.125 2023-11-27 14:03:35,114 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3091486.6666666665, ans=0.125 2023-11-27 14:03:44,163 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.whiten.whitening_limit, batch_count=3091553.3333333335, ans=12.0 2023-11-27 14:04:00,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3091620.0, ans=0.0 2023-11-27 14:04:01,343 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 463750 2023-11-27 14:04:06,896 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 6850, loss[loss=0.07022, simple_loss=0.09019, pruned_loss=0.01499, audio_tagging_loss=0.01013, over 15326.00 frames. ], tot_loss[loss=0.06712, simple_loss=0.09115, pruned_loss=0.01287, audio_tagging_loss=0.008677, over 3038950.18 frames. ], batch size: 58, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:04:10,021 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3091686.6666666665, ans=0.0 2023-11-27 14:04:14,812 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.69 vs. limit=15.0 2023-11-27 14:04:22,166 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3091753.3333333335, ans=0.0 2023-11-27 14:04:24,231 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3091753.3333333335, ans=0.125 2023-11-27 14:04:49,937 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.285e+01 8.738e+01 9.106e+01 9.965e+01 1.501e+02, threshold=1.821e+02, percent-clipped=0.0 2023-11-27 14:04:56,274 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3091953.3333333335, ans=0.0 2023-11-27 14:04:59,345 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 463800 2023-11-27 14:05:05,186 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 6900, loss[loss=0.0642, simple_loss=0.08829, pruned_loss=0.01363, audio_tagging_loss=0.006422, over 16825.00 frames. ], tot_loss[loss=0.06682, simple_loss=0.09089, pruned_loss=0.01265, audio_tagging_loss=0.008735, over 3039093.70 frames. ], batch size: 64, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:05:07,826 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.77 vs. limit=15.0 2023-11-27 14:05:10,152 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.19 vs. 
limit=6.0 2023-11-27 14:05:22,294 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3092086.6666666665, ans=0.125 2023-11-27 14:05:33,587 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3092153.3333333335, ans=0.0 2023-11-27 14:05:45,335 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.29 vs. limit=15.0 2023-11-27 14:05:47,288 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3092220.0, ans=0.125 2023-11-27 14:05:53,820 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 14:05:57,211 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 463850 2023-11-27 14:06:03,939 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 6950, loss[loss=0.09053, simple_loss=0.1209, pruned_loss=0.0215, audio_tagging_loss=0.008559, over 16033.00 frames. ], tot_loss[loss=0.06692, simple_loss=0.09098, pruned_loss=0.01272, audio_tagging_loss=0.008705, over 3040778.36 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:06:11,524 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.92 vs. limit=12.0 2023-11-27 14:06:16,612 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3092420.0, ans=0.2 2023-11-27 14:06:25,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3092420.0, ans=0.2 2023-11-27 14:06:37,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3092553.3333333335, ans=0.1 2023-11-27 14:06:46,185 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.220e+01 8.400e+01 9.204e+01 9.755e+01 1.289e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-27 14:06:54,808 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3092620.0, ans=0.1 2023-11-27 14:06:55,688 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 463900 2023-11-27 14:06:57,929 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=3092620.0, ans=0.5 2023-11-27 14:07:01,084 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 7000, loss[loss=0.07038, simple_loss=0.09418, pruned_loss=0.01618, audio_tagging_loss=0.00711, over 14611.00 frames. ], tot_loss[loss=0.06676, simple_loss=0.09076, pruned_loss=0.01259, audio_tagging_loss=0.008784, over 3034896.07 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:07:12,235 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.64 vs. 
limit=15.0 2023-11-27 14:07:45,305 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 14:07:52,714 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 463950 2023-11-27 14:07:58,834 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 7050, loss[loss=0.06527, simple_loss=0.08316, pruned_loss=0.01203, audio_tagging_loss=0.01166, over 14548.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.09046, pruned_loss=0.01255, audio_tagging_loss=0.0088, over 3036487.12 frames. ], batch size: 54, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:08:06,672 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3093020.0, ans=0.125 2023-11-27 14:08:10,141 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.50 vs. limit=15.0 2023-11-27 14:08:23,519 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.69 vs. limit=15.0 2023-11-27 14:08:24,606 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.06 vs. limit=15.0 2023-11-27 14:08:24,712 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.95 vs. limit=6.0 2023-11-27 14:08:33,411 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.62 vs. limit=10.0 2023-11-27 14:08:41,271 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.394e+01 8.528e+01 9.043e+01 9.552e+01 1.279e+02, threshold=1.809e+02, percent-clipped=0.0 2023-11-27 14:08:50,132 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 464000 2023-11-27 14:08:51,541 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-464000.pt 2023-11-27 14:08:58,745 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 7100, loss[loss=0.08835, simple_loss=0.1241, pruned_loss=0.01843, audio_tagging_loss=0.007862, over 16313.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.08984, pruned_loss=0.01239, audio_tagging_loss=0.008897, over 3038374.52 frames. ], batch size: 60, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:09:11,510 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 14:09:14,853 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3093420.0, ans=0.1 2023-11-27 14:09:29,101 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.96 vs. limit=22.5 2023-11-27 14:09:50,873 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 464050 2023-11-27 14:09:56,368 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 7150, loss[loss=0.06949, simple_loss=0.09606, pruned_loss=0.01323, audio_tagging_loss=0.008239, over 14934.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.08962, pruned_loss=0.01239, audio_tagging_loss=0.008879, over 3037175.62 frames. 
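checkpoint-464000.pt above is written the moment the global batch index crosses a round multiple, i.e. checkpoints are keyed to batch count rather than epoch. A minimal sketch, with the parameter names illustrative; icefall's save_checkpoint also stores the sampler, scheduler, and grad-scaler state that the restart at the top of this log reloads:

from pathlib import Path
import torch

def maybe_save_checkpoint(model, optimizer, batch_idx: int, exp_dir: Path,
                          save_every_n: int = 4000) -> None:
    if batch_idx % save_every_n != 0:
        return
    torch.save(
        {
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "batch_idx_train": batch_idx,
        },
        exp_dir / f"checkpoint-{batch_idx}.pt",
    )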
], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:10:37,809 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3093886.6666666665, ans=0.125 2023-11-27 14:10:39,822 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.194e+01 8.709e+01 9.080e+01 1.002e+02 1.169e+02, threshold=1.816e+02, percent-clipped=0.0 2023-11-27 14:10:46,632 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3093953.3333333335, ans=0.125 2023-11-27 14:10:47,558 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 464100 2023-11-27 14:10:47,812 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3093953.3333333335, ans=0.2 2023-11-27 14:10:53,056 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 7200, loss[loss=0.06862, simple_loss=0.09239, pruned_loss=0.01349, audio_tagging_loss=0.008932, over 14877.00 frames. ], tot_loss[loss=0.06692, simple_loss=0.09065, pruned_loss=0.01261, audio_tagging_loss=0.008988, over 3044263.97 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:11:01,762 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff2.min_abs, batch_count=3094020.0, ans=0.1 2023-11-27 14:11:26,373 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.74 vs. limit=22.5 2023-11-27 14:11:30,852 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.03 vs. limit=22.5 2023-11-27 14:11:45,205 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 464150 2023-11-27 14:11:45,363 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3094286.6666666665, ans=0.2 2023-11-27 14:11:50,688 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 7250, loss[loss=0.07106, simple_loss=0.09623, pruned_loss=0.01431, audio_tagging_loss=0.008641, over 15126.00 frames. ], tot_loss[loss=0.06744, simple_loss=0.09132, pruned_loss=0.01277, audio_tagging_loss=0.009008, over 3042492.86 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:11:57,931 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.66 vs. 
limit=15.0 2023-11-27 14:12:02,889 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3094420.0, ans=0.0 2023-11-27 14:12:05,173 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3094420.0, ans=0.125 2023-11-27 14:12:16,344 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3094486.6666666665, ans=0.0 2023-11-27 14:12:24,050 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3094553.3333333335, ans=0.05 2023-11-27 14:12:31,258 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3094553.3333333335, ans=0.2 2023-11-27 14:12:34,348 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.079e+01 8.560e+01 9.107e+01 9.786e+01 1.290e+02, threshold=1.821e+02, percent-clipped=0.0 2023-11-27 14:12:37,296 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3094620.0, ans=0.125 2023-11-27 14:12:43,287 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 464200 2023-11-27 14:12:49,029 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 7300, loss[loss=0.04463, simple_loss=0.05419, pruned_loss=0.008958, audio_tagging_loss=0.008581, over 13977.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.09039, pruned_loss=0.0127, audio_tagging_loss=0.008961, over 3042436.85 frames. ], batch size: 53, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:13:00,428 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3094753.3333333335, ans=0.0 2023-11-27 14:13:15,206 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3094820.0, ans=0.0 2023-11-27 14:13:40,352 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 464250 2023-11-27 14:13:42,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3094953.3333333335, ans=0.0 2023-11-27 14:13:45,031 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3095020.0, ans=0.09899494936611666 2023-11-27 14:13:45,845 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 7350, loss[loss=0.05813, simple_loss=0.07774, pruned_loss=0.009629, audio_tagging_loss=0.009627, over 14764.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.09032, pruned_loss=0.01272, audio_tagging_loss=0.008774, over 3047380.91 frames. 
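The WARNING lines earlier in this section drop cuts (evidently AudioSet clips carrying a dummy transcript) whose token sequence is longer than the acoustic sequence: a 100-frame cut keeps only 23 frames after convolutional subsampling, fewer than its 24 BPE tokens, so the transducer loss would have no valid alignment. A sketch of such a filter; the subsampling formula below reproduces the logged 100 -> 23 but is illustrative, not the recipe's exact arithmetic:

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    frames_after = ((num_frames - 7) // 2 + 1) // 2  # approx. conv subsampling
    return frames_after >= num_tokens

print(keep_cut(100, 24))   # False -> excluded, as in the WARNING lines
print(keep_cut(1500, 24))  # True  -> kept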
], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:13:52,854 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3095020.0, ans=0.0 2023-11-27 14:13:53,966 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3095020.0, ans=0.125 2023-11-27 14:14:00,079 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3095086.6666666665, ans=0.125 2023-11-27 14:14:03,987 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3095086.6666666665, ans=0.125 2023-11-27 14:14:05,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3095086.6666666665, ans=0.125 2023-11-27 14:14:27,907 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3095220.0, ans=0.125 2023-11-27 14:14:29,931 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.342e+01 8.691e+01 9.417e+01 9.998e+01 1.354e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-27 14:14:37,802 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 464300 2023-11-27 14:14:43,434 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.82 vs. limit=12.0 2023-11-27 14:14:43,848 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 7400, loss[loss=0.07026, simple_loss=0.08983, pruned_loss=0.0159, audio_tagging_loss=0.009439, over 14060.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.08987, pruned_loss=0.01268, audio_tagging_loss=0.00873, over 3044388.56 frames. ], batch size: 53, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:14:56,473 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3095420.0, ans=0.125 2023-11-27 14:15:02,973 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3095420.0, ans=0.07 2023-11-27 14:15:12,895 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3095486.6666666665, ans=0.125 2023-11-27 14:15:22,218 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.29 vs. limit=15.0 2023-11-27 14:15:29,158 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3095620.0, ans=0.2 2023-11-27 14:15:33,070 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3095620.0, ans=0.1 2023-11-27 14:15:36,781 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 464350 2023-11-27 14:15:37,009 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3095620.0, ans=0.125 2023-11-27 14:15:42,173 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 7450, loss[loss=0.06231, simple_loss=0.08987, pruned_loss=0.01172, audio_tagging_loss=0.005658, over 15418.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.09037, pruned_loss=0.0128, audio_tagging_loss=0.008654, over 3049274.17 frames. 
], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:15:57,800 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3095753.3333333335, ans=0.125 2023-11-27 14:16:13,452 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.54 vs. limit=15.0 2023-11-27 14:16:20,657 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3095886.6666666665, ans=0.125 2023-11-27 14:16:22,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3095886.6666666665, ans=0.125 2023-11-27 14:16:25,874 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.708e+01 8.639e+01 9.279e+01 9.819e+01 1.205e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-27 14:16:32,721 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3095953.3333333335, ans=0.09899494936611666 2023-11-27 14:16:33,607 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 464400 2023-11-27 14:16:35,269 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3095953.3333333335, ans=0.0 2023-11-27 14:16:39,348 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 7500, loss[loss=0.06911, simple_loss=0.09103, pruned_loss=0.01293, audio_tagging_loss=0.01067, over 16714.00 frames. ], tot_loss[loss=0.06698, simple_loss=0.09104, pruned_loss=0.01285, audio_tagging_loss=0.008607, over 3057256.62 frames. ], batch size: 62, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:16:45,162 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3096020.0, ans=0.1 2023-11-27 14:16:47,315 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3096020.0, ans=0.125 2023-11-27 14:16:49,970 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.37 vs. limit=15.0 2023-11-27 14:16:53,118 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3096086.6666666665, ans=0.0 2023-11-27 14:17:31,938 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 464450 2023-11-27 14:17:37,457 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 7550, loss[loss=0.07079, simple_loss=0.1038, pruned_loss=0.01389, audio_tagging_loss=0.005004, over 15384.00 frames. ], tot_loss[loss=0.0668, simple_loss=0.09095, pruned_loss=0.01269, audio_tagging_loss=0.008632, over 3056302.71 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:17:38,753 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3096353.3333333335, ans=0.125 2023-11-27 14:17:39,975 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3096353.3333333335, ans=0.0 2023-11-27 14:17:58,310 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.65 vs. 
limit=15.0 2023-11-27 14:18:11,121 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3096486.6666666665, ans=0.125 2023-11-27 14:18:15,775 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.56 vs. limit=15.0 2023-11-27 14:18:17,961 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.62 vs. limit=15.0 2023-11-27 14:18:22,773 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.479e+01 8.787e+01 9.439e+01 1.010e+02 1.313e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-27 14:18:24,266 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3096620.0, ans=0.125 2023-11-27 14:18:28,722 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3096620.0, ans=0.0 2023-11-27 14:18:29,745 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3096620.0, ans=0.1 2023-11-27 14:18:31,265 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 464500 2023-11-27 14:18:37,291 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 7600, loss[loss=0.04032, simple_loss=0.04821, pruned_loss=0.006357, audio_tagging_loss=0.009855, over 14323.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.09086, pruned_loss=0.01276, audio_tagging_loss=0.008666, over 3053731.82 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:18:50,658 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 14:18:58,527 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3096820.0, ans=0.025 2023-11-27 14:19:12,272 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3096886.6666666665, ans=0.125 2023-11-27 14:19:28,826 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 464550 2023-11-27 14:19:34,218 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 7650, loss[loss=0.06404, simple_loss=0.08588, pruned_loss=0.01397, audio_tagging_loss=0.007127, over 16256.00 frames. ], tot_loss[loss=0.06689, simple_loss=0.09083, pruned_loss=0.01279, audio_tagging_loss=0.008679, over 3055527.74 frames. ], batch size: 62, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:19:40,300 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3097020.0, ans=0.1 2023-11-27 14:19:41,266 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3097020.0, ans=0.0 2023-11-27 14:19:42,514 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3097020.0, ans=0.1 2023-11-27 14:19:44,723 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3097086.6666666665, ans=0.125 2023-11-27 14:19:53,704 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.26 vs. 
limit=12.0 2023-11-27 14:19:55,035 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3097086.6666666665, ans=0.125 2023-11-27 14:20:06,995 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3097153.3333333335, ans=0.1 2023-11-27 14:20:15,767 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3097220.0, ans=0.125 2023-11-27 14:20:18,855 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.232e+01 8.470e+01 8.990e+01 9.726e+01 1.372e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-27 14:20:21,236 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3097286.6666666665, ans=0.125 2023-11-27 14:20:25,465 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 464600 2023-11-27 14:20:29,773 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.50 vs. limit=15.0 2023-11-27 14:20:31,205 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 7700, loss[loss=0.07683, simple_loss=0.1037, pruned_loss=0.01449, audio_tagging_loss=0.01049, over 15422.00 frames. ], tot_loss[loss=0.06696, simple_loss=0.09099, pruned_loss=0.0128, audio_tagging_loss=0.008661, over 3052636.57 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:20:31,443 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3097353.3333333335, ans=0.0 2023-11-27 14:20:53,666 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3097420.0, ans=0.025 2023-11-27 14:20:53,697 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3097420.0, ans=0.2 2023-11-27 14:20:56,996 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3097486.6666666665, ans=0.0 2023-11-27 14:21:10,960 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3097553.3333333335, ans=0.125 2023-11-27 14:21:17,541 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3097620.0, ans=0.2 2023-11-27 14:21:23,390 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 464650 2023-11-27 14:21:30,602 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 7750, loss[loss=0.07791, simple_loss=0.1131, pruned_loss=0.01442, audio_tagging_loss=0.006934, over 15285.00 frames. ], tot_loss[loss=0.06697, simple_loss=0.09088, pruned_loss=0.0128, audio_tagging_loss=0.008732, over 3045859.05 frames. 
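The learning rate printed with every batch decays very slowly (1.74e-03 down to 1.73e-03 across this stretch) because icefall's Eden scheduler discounts the base LR by both batch count and epoch. A sketch of the formula as commonly written for Eden; plugging in this run's schedule constants lands near the logged value:

def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float, lr_epochs: float) -> float:
    return (
        base_lr
        * ((batch**2 + lr_batches**2) / lr_batches**2) ** -0.25
        * ((epoch**2 + lr_epochs**2) / lr_epochs**2) ** -0.25
    )

print(eden_lr(0.045, 464_000, 39.0, 7_500.0, 3.5))  # ~1.71e-03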
2023-11-27 14:21:55,748 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3097820.0, ans=0.125
2023-11-27 14:21:56,967 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-27 14:22:01,237 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3097820.0, ans=0.2
2023-11-27 14:22:13,347 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.75 vs. limit=6.0
2023-11-27 14:22:15,498 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.225e+01 8.645e+01 9.369e+01 1.003e+02 1.399e+02, threshold=1.874e+02, percent-clipped=0.0
2023-11-27 14:22:16,069 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.94 vs. limit=10.0
2023-11-27 14:22:21,237 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-27 14:22:22,181 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 464700
2023-11-27 14:22:24,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3097953.3333333335, ans=0.125
2023-11-27 14:22:27,494 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 7800, loss[loss=0.07263, simple_loss=0.09691, pruned_loss=0.01604, audio_tagging_loss=0.008129, over 16431.00 frames. ], tot_loss[loss=0.06679, simple_loss=0.09053, pruned_loss=0.01274, audio_tagging_loss=0.008781, over 3047164.59 frames. ], batch size: 61, lr: 1.73e-03, grad_scale: 16.0
2023-11-27 14:22:27,988 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.17 vs. limit=6.0
2023-11-27 14:22:33,076 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3098020.0, ans=0.0
2023-11-27 14:22:39,863 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3098086.6666666665, ans=0.125
2023-11-27 14:22:47,340 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3098086.6666666665, ans=0.0
2023-11-27 14:22:49,218 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.71 vs. limit=15.0
2023-11-27 14:23:07,454 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3098220.0, ans=0.125
2023-11-27 14:23:08,899 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.96 vs. limit=15.0
2023-11-27 14:23:11,160 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.34 vs. limit=15.0
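The scaling.py:1022 Whitening lines compare a per-module statistic against a limit (metric=11.34 vs. limit=15.0 just above). The metric measures how anisotropic the module's output covariance is, equalling 1.0 for perfectly "white" features, and the whitening penalty only activates once it exceeds the limit. A sketch of one such metric, assuming the form num_channels * sum(eig^2) / sum(eig)^2; the exact statistic in scaling.py may differ in detail:

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        # x: (num_frames, num_channels). Returns 1.0 when the covariance is
        # isotropic ("white") and grows with the spread of its eigenvalues.
        # Assumed form, not necessarily icefall's exact formula.
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.T @ x) / x.shape[0]            # (C, C) covariance estimate
        eigs = torch.linalg.eigvalsh(cov)       # real eigenvalues, ascending
        return x.shape[1] * (eigs ** 2).sum() / eigs.sum() ** 2

Under this reading, a module logging metric=4.56 vs. limit=15.0 is currently unconstrained; only entries where the metric approaches its limit (e.g. 20.86 vs. 22.5 further down) are near the point where the penalty engages.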
2023-11-27 14:23:13,000 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3098286.6666666665, ans=0.0
2023-11-27 14:23:17,324 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3098286.6666666665, ans=0.1
2023-11-27 14:23:19,333 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 464750
2023-11-27 14:23:24,833 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 7850, loss[loss=0.05991, simple_loss=0.08351, pruned_loss=0.01165, audio_tagging_loss=0.006498, over 15073.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.09, pruned_loss=0.01263, audio_tagging_loss=0.008927, over 3049161.58 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0
2023-11-27 14:23:28,420 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3098353.3333333335, ans=0.1
2023-11-27 14:23:35,503 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3098420.0, ans=0.0
2023-11-27 14:23:57,425 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.86 vs. limit=22.5
2023-11-27 14:24:10,076 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.081e+01 8.600e+01 9.119e+01 9.772e+01 1.362e+02, threshold=1.824e+02, percent-clipped=0.0
2023-11-27 14:24:17,275 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 464800
2023-11-27 14:24:24,278 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 7900, loss[loss=0.07019, simple_loss=0.08897, pruned_loss=0.01638, audio_tagging_loss=0.009327, over 14759.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.08997, pruned_loss=0.01258, audio_tagging_loss=0.009014, over 3052403.45 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 16.0
2023-11-27 14:24:24,462 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3098686.6666666665, ans=0.0
2023-11-27 14:24:27,678 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3098686.6666666665, ans=0.0
2023-11-27 14:24:31,824 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3098686.6666666665, ans=0.04949747468305833
2023-11-27 14:24:38,487 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3098753.3333333335, ans=0.0
2023-11-27 14:24:43,133 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3098753.3333333335, ans=0.2
2023-11-27 14:24:49,562 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3098820.0, ans=0.125
2023-11-27 14:25:15,377 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3098953.3333333335, ans=0.0
2023-11-27 14:25:16,368 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 464850
2023-11-27 14:25:22,383 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 7950, loss[loss=0.05728, simple_loss=0.06619, pruned_loss=0.01172, audio_tagging_loss=0.01247, over 14585.00 frames.
], tot_loss[loss=0.06615, simple_loss=0.08903, pruned_loss=0.01248, audio_tagging_loss=0.009152, over 3046457.71 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:25:23,617 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 14:25:34,705 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3099086.6666666665, ans=0.125 2023-11-27 14:25:38,884 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 14:25:59,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3099220.0, ans=0.2 2023-11-27 14:26:01,676 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3099220.0, ans=0.0 2023-11-27 14:26:07,811 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.468e+01 8.621e+01 8.980e+01 9.722e+01 1.502e+02, threshold=1.796e+02, percent-clipped=0.0 2023-11-27 14:26:14,619 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 464900 2023-11-27 14:26:20,179 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 8000, loss[loss=0.04015, simple_loss=0.04339, pruned_loss=0.006875, audio_tagging_loss=0.01158, over 14734.00 frames. ], tot_loss[loss=0.06589, simple_loss=0.0884, pruned_loss=0.01239, audio_tagging_loss=0.009301, over 3037632.32 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:26:25,836 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3099353.3333333335, ans=0.1 2023-11-27 14:26:35,739 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3099420.0, ans=0.125 2023-11-27 14:26:55,688 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3099553.3333333335, ans=0.125 2023-11-27 14:27:12,130 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 464950 2023-11-27 14:27:12,280 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3099620.0, ans=0.125 2023-11-27 14:27:18,055 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 8050, loss[loss=0.06996, simple_loss=0.0931, pruned_loss=0.01485, audio_tagging_loss=0.008564, over 15201.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.08864, pruned_loss=0.01248, audio_tagging_loss=0.009326, over 3043616.91 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:27:18,247 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3099686.6666666665, ans=0.125 2023-11-27 14:27:27,649 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.16 vs. 
limit=15.0 2023-11-27 14:28:00,400 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3099886.6666666665, ans=0.0 2023-11-27 14:28:04,071 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.865e+01 8.542e+01 9.133e+01 9.654e+01 1.190e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-27 14:28:07,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=3099953.3333333335, ans=10.0 2023-11-27 14:28:11,413 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 465000 2023-11-27 14:28:13,050 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3099953.3333333335, ans=0.125 2023-11-27 14:28:15,696 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.47 vs. limit=12.0 2023-11-27 14:28:17,221 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 8100, loss[loss=0.08425, simple_loss=0.1167, pruned_loss=0.01935, audio_tagging_loss=0.006562, over 15122.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.0893, pruned_loss=0.01249, audio_tagging_loss=0.009117, over 3039363.24 frames. ], batch size: 54, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:28:23,756 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3100020.0, ans=0.125 2023-11-27 14:29:01,822 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3100220.0, ans=0.2 2023-11-27 14:29:09,998 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 465050 2023-11-27 14:29:15,407 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 8150, loss[loss=0.07238, simple_loss=0.1067, pruned_loss=0.01186, audio_tagging_loss=0.007149, over 16315.00 frames. ], tot_loss[loss=0.06696, simple_loss=0.09098, pruned_loss=0.01262, audio_tagging_loss=0.008846, over 3050680.83 frames. 
], batch size: 60, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:29:19,944 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3100353.3333333335, ans=0.0 2023-11-27 14:29:31,547 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3100420.0, ans=0.0 2023-11-27 14:29:53,008 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3100553.3333333335, ans=0.125 2023-11-27 14:29:59,387 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3100553.3333333335, ans=0.125 2023-11-27 14:30:01,438 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.581e+01 8.694e+01 9.359e+01 9.958e+01 1.190e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-27 14:30:01,698 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3100620.0, ans=0.2 2023-11-27 14:30:07,011 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 465100 2023-11-27 14:30:08,226 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3100620.0, ans=0.2 2023-11-27 14:30:12,884 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 8200, loss[loss=0.06905, simple_loss=0.1018, pruned_loss=0.01098, audio_tagging_loss=0.007151, over 14918.00 frames. ], tot_loss[loss=0.06716, simple_loss=0.09113, pruned_loss=0.0128, audio_tagging_loss=0.008793, over 3047118.30 frames. ], batch size: 53, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:30:17,821 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 14:30:21,950 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3100686.6666666665, ans=0.125 2023-11-27 14:30:32,989 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3100753.3333333335, ans=0.125 2023-11-27 14:30:36,115 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3100820.0, ans=0.1 2023-11-27 14:30:40,124 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3100820.0, ans=0.0 2023-11-27 14:30:45,568 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3100820.0, ans=0.0 2023-11-27 14:30:52,074 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3100886.6666666665, ans=0.125 2023-11-27 14:31:05,969 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 465150 2023-11-27 14:31:10,432 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3101020.0, ans=0.125 2023-11-27 14:31:11,292 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 8250, loss[loss=0.07395, simple_loss=0.09947, pruned_loss=0.01523, audio_tagging_loss=0.00899, over 15304.00 frames. ], tot_loss[loss=0.06721, simple_loss=0.0916, pruned_loss=0.01271, audio_tagging_loss=0.008703, over 3050856.46 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:31:29,197 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.73 vs. limit=22.5 2023-11-27 14:31:57,583 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.825e+01 8.477e+01 9.033e+01 1.006e+02 1.389e+02, threshold=1.807e+02, percent-clipped=0.0 2023-11-27 14:32:03,125 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 465200 2023-11-27 14:32:06,939 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3101286.6666666665, ans=0.05 2023-11-27 14:32:09,409 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 8300, loss[loss=0.06393, simple_loss=0.07324, pruned_loss=0.01501, audio_tagging_loss=0.01229, over 15243.00 frames. ], tot_loss[loss=0.0672, simple_loss=0.0915, pruned_loss=0.01273, audio_tagging_loss=0.008721, over 3049623.33 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:33:01,392 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 465250 2023-11-27 14:33:01,539 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3101620.0, ans=0.125 2023-11-27 14:33:06,777 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 8350, loss[loss=0.07255, simple_loss=0.1038, pruned_loss=0.01351, audio_tagging_loss=0.00714, over 15199.00 frames. ], tot_loss[loss=0.06745, simple_loss=0.09189, pruned_loss=0.01289, audio_tagging_loss=0.008617, over 3046963.75 frames. 
], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:33:18,463 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3101753.3333333335, ans=0.125 2023-11-27 14:33:48,086 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 14:33:53,576 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.236e+01 8.419e+01 8.984e+01 9.870e+01 1.325e+02, threshold=1.797e+02, percent-clipped=0.0 2023-11-27 14:33:58,702 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3101953.3333333335, ans=0.125 2023-11-27 14:33:59,684 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 465300 2023-11-27 14:34:05,816 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 8400, loss[loss=0.06188, simple_loss=0.08299, pruned_loss=0.01272, audio_tagging_loss=0.00767, over 15165.00 frames. ], tot_loss[loss=0.06677, simple_loss=0.09065, pruned_loss=0.01273, audio_tagging_loss=0.008712, over 3050732.64 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:34:22,583 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3102086.6666666665, ans=0.125 2023-11-27 14:34:26,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3102086.6666666665, ans=0.2 2023-11-27 14:34:41,995 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.14 vs. limit=6.0 2023-11-27 14:34:57,687 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 465350 2023-11-27 14:35:03,084 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 8450, loss[loss=0.0633, simple_loss=0.08029, pruned_loss=0.01351, audio_tagging_loss=0.009642, over 14364.00 frames. ], tot_loss[loss=0.06688, simple_loss=0.09093, pruned_loss=0.01271, audio_tagging_loss=0.008705, over 3051292.94 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:35:22,710 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.77 vs. limit=15.0 2023-11-27 14:35:25,895 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3102486.6666666665, ans=0.0 2023-11-27 14:35:49,754 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.831e+01 8.838e+01 9.408e+01 1.009e+02 1.207e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-27 14:35:55,434 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 465400 2023-11-27 14:36:00,259 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.28 vs. limit=22.5 2023-11-27 14:36:01,559 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.76 vs. limit=10.0 2023-11-27 14:36:02,003 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 8500, loss[loss=0.0581, simple_loss=0.08209, pruned_loss=0.008132, audio_tagging_loss=0.008924, over 15264.00 frames. ], tot_loss[loss=0.06735, simple_loss=0.09121, pruned_loss=0.01289, audio_tagging_loss=0.008855, over 3055888.78 frames. 
], batch size: 56, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:36:02,221 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 14:36:12,000 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3102686.6666666665, ans=0.0 2023-11-27 14:36:53,825 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 465450 2023-11-27 14:37:00,393 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 8550, loss[loss=0.0468, simple_loss=0.06404, pruned_loss=0.005031, audio_tagging_loss=0.009747, over 15951.00 frames. ], tot_loss[loss=0.06731, simple_loss=0.09108, pruned_loss=0.01292, audio_tagging_loss=0.008851, over 3053782.25 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:37:10,708 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3103086.6666666665, ans=0.125 2023-11-27 14:37:25,854 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3103153.3333333335, ans=0.125 2023-11-27 14:37:39,039 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3103220.0, ans=0.5 2023-11-27 14:37:47,516 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.420e+01 8.683e+01 9.146e+01 9.913e+01 1.274e+02, threshold=1.829e+02, percent-clipped=0.0 2023-11-27 14:37:52,119 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 465500 2023-11-27 14:37:57,540 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 8600, loss[loss=0.05534, simple_loss=0.0787, pruned_loss=0.008439, audio_tagging_loss=0.00755, over 15219.00 frames. ], tot_loss[loss=0.06694, simple_loss=0.09051, pruned_loss=0.01276, audio_tagging_loss=0.008934, over 3045706.04 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:38:11,172 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3103420.0, ans=0.1 2023-11-27 14:38:17,361 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3103420.0, ans=0.125 2023-11-27 14:38:21,217 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3103486.6666666665, ans=0.09899494936611666 2023-11-27 14:38:22,209 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3103486.6666666665, ans=0.0 2023-11-27 14:38:24,811 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3103486.6666666665, ans=0.2 2023-11-27 14:38:28,142 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.63 vs. limit=15.0 2023-11-27 14:38:43,680 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.62 vs. limit=10.0 2023-11-27 14:38:49,646 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 465550 2023-11-27 14:38:50,125 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.39 vs. 
limit=10.0 2023-11-27 14:38:55,049 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 8650, loss[loss=0.04714, simple_loss=0.06566, pruned_loss=0.007219, audio_tagging_loss=0.007089, over 14658.00 frames. ], tot_loss[loss=0.06727, simple_loss=0.09102, pruned_loss=0.01275, audio_tagging_loss=0.009005, over 3043612.97 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:39:15,375 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 14:39:24,443 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.87 vs. limit=22.5 2023-11-27 14:39:29,571 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3103886.6666666665, ans=0.1 2023-11-27 14:39:39,449 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3103886.6666666665, ans=0.0 2023-11-27 14:39:42,548 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.197e+01 8.459e+01 9.176e+01 1.006e+02 1.194e+02, threshold=1.835e+02, percent-clipped=0.0 2023-11-27 14:39:45,803 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3103953.3333333335, ans=0.09899494936611666 2023-11-27 14:39:48,286 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 465600 2023-11-27 14:39:53,787 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3104020.0, ans=0.125 2023-11-27 14:39:55,214 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 8700, loss[loss=0.07703, simple_loss=0.09581, pruned_loss=0.01658, audio_tagging_loss=0.01255, over 15774.00 frames. ], tot_loss[loss=0.06717, simple_loss=0.09106, pruned_loss=0.01267, audio_tagging_loss=0.008973, over 3045908.78 frames. ], batch size: 60, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:40:03,123 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3104020.0, ans=0.125 2023-11-27 14:40:09,746 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3104086.6666666665, ans=0.125 2023-11-27 14:40:11,060 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=3104086.6666666665, ans=15.0 2023-11-27 14:40:22,966 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3104153.3333333335, ans=0.125 2023-11-27 14:40:45,722 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 14:40:46,558 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 465650 2023-11-27 14:40:48,997 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3104286.6666666665, ans=0.2 2023-11-27 14:40:51,976 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 8750, loss[loss=0.07634, simple_loss=0.1067, pruned_loss=0.01667, audio_tagging_loss=0.006323, over 15990.00 frames. ], tot_loss[loss=0.06762, simple_loss=0.09154, pruned_loss=0.0128, audio_tagging_loss=0.009053, over 3047612.54 frames. 
], batch size: 61, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:40:54,934 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn2.whiten.whitening_limit, batch_count=3104353.3333333335, ans=22.5 2023-11-27 14:41:00,952 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3104353.3333333335, ans=0.1 2023-11-27 14:41:03,258 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3104420.0, ans=0.025 2023-11-27 14:41:04,199 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3104420.0, ans=0.0 2023-11-27 14:41:28,689 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3104553.3333333335, ans=0.0 2023-11-27 14:41:36,347 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3104553.3333333335, ans=0.125 2023-11-27 14:41:39,399 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.863e+01 8.832e+01 9.228e+01 1.004e+02 1.241e+02, threshold=1.846e+02, percent-clipped=0.0 2023-11-27 14:41:40,808 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3104620.0, ans=0.1 2023-11-27 14:41:43,943 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 465700 2023-11-27 14:41:45,086 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3104620.0, ans=0.125 2023-11-27 14:41:49,417 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 8800, loss[loss=0.06596, simple_loss=0.08485, pruned_loss=0.01248, audio_tagging_loss=0.01105, over 15664.00 frames. ], tot_loss[loss=0.06885, simple_loss=0.09314, pruned_loss=0.01323, audio_tagging_loss=0.009047, over 3051083.89 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:42:02,988 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3104753.3333333335, ans=0.1 2023-11-27 14:42:22,834 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3104820.0, ans=10.0 2023-11-27 14:42:23,972 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3104886.6666666665, ans=0.0 2023-11-27 14:42:26,612 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.70 vs. limit=15.0 2023-11-27 14:42:41,325 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 465750 2023-11-27 14:42:47,784 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 8850, loss[loss=0.06195, simple_loss=0.08201, pruned_loss=0.009742, audio_tagging_loss=0.0112, over 15488.00 frames. ], tot_loss[loss=0.06839, simple_loss=0.09261, pruned_loss=0.01306, audio_tagging_loss=0.009029, over 3055559.65 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:42:53,831 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.76 vs. 
limit=6.0
2023-11-27 14:43:03,124 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 14:43:06,754 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3105086.6666666665, ans=0.0
2023-11-27 14:43:16,562 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3105153.3333333335, ans=0.125
2023-11-27 14:43:20,988 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3105220.0, ans=0.0
2023-11-27 14:43:26,840 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3105220.0, ans=0.125
2023-11-27 14:43:33,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3105286.6666666665, ans=0.0
2023-11-27 14:43:35,669 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.380e+01 8.774e+01 9.216e+01 1.007e+02 1.244e+02, threshold=1.843e+02, percent-clipped=0.0
2023-11-27 14:43:38,523 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.27 vs. limit=6.0
2023-11-27 14:43:40,112 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 465800
2023-11-27 14:43:40,251 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3105286.6666666665, ans=0.125
2023-11-27 14:43:43,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3105286.6666666665, ans=0.1
2023-11-27 14:43:44,270 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.61 vs. limit=6.0
2023-11-27 14:43:45,771 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 8900, loss[loss=0.06826, simple_loss=0.0947, pruned_loss=0.0116, audio_tagging_loss=0.0093, over 14102.00 frames. ], tot_loss[loss=0.06797, simple_loss=0.09227, pruned_loss=0.01295, audio_tagging_loss=0.008884, over 3048716.20 frames. ], batch size: 54, lr: 1.73e-03, grad_scale: 32.0
2023-11-27 14:43:48,282 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3105353.3333333335, ans=0.05
2023-11-27 14:44:06,829 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3105486.6666666665, ans=0.0
2023-11-27 14:44:07,297 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.20 vs. limit=12.0
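The WARNING above documents the recipe's length filter: AudioSet cuts carry a dummy transcript of 24 BPE tokens, but after subsampling a 100-frame (1 s) cut keeps only 23 frames, and a transducer cannot emit more symbols than it has frames, so the cut is excluded. A sketch of the predicate; the names are illustrative, and the 100 -> 23 mapping is taken from the log (it is consistent with (T - 8) // 4, though the real subsampling module may compute it differently):

    def keep_cut(num_frames_before_subsampling: int, num_tokens: int) -> bool:
        # Assumed frame arithmetic: 100 frames -> (100 - 8) // 4 = 23, as logged.
        num_frames_after = (num_frames_before_subsampling - 8) // 4
        # The transducer needs at least one frame per output token.
        return num_frames_after >= num_tokens

    assert keep_cut(100, 24) is False   # the unbalanced/*.wav cuts in this log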
2023-11-27 14:44:18,311 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3105486.6666666665, ans=0.2
2023-11-27 14:44:36,797 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 465850
2023-11-27 14:44:42,240 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 8950, loss[loss=0.04576, simple_loss=0.06375, pruned_loss=0.006211, audio_tagging_loss=0.007673, over 15778.00 frames. ], tot_loss[loss=0.06743, simple_loss=0.09149, pruned_loss=0.01285, audio_tagging_loss=0.008833, over 3052241.75 frames. ], batch size: 60, lr: 1.73e-03, grad_scale: 32.0
2023-11-27 14:44:50,382 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3105686.6666666665, ans=0.125
2023-11-27 14:44:51,404 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3105686.6666666665, ans=0.0
2023-11-27 14:45:14,422 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3105820.0, ans=0.1
2023-11-27 14:45:19,974 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3105886.6666666665, ans=0.2
2023-11-27 14:45:29,489 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.308e+01 8.633e+01 9.411e+01 1.036e+02 1.341e+02, threshold=1.882e+02, percent-clipped=0.0
2023-11-27 14:45:34,028 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 465900
2023-11-27 14:45:39,984 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 9000, loss[loss=0.05409, simple_loss=0.07015, pruned_loss=0.01005, audio_tagging_loss=0.008971, over 15074.00 frames. ], tot_loss[loss=0.06745, simple_loss=0.09165, pruned_loss=0.01295, audio_tagging_loss=0.008676, over 3053627.39 frames. ], batch size: 60, lr: 1.73e-03, grad_scale: 32.0
2023-11-27 14:45:39,986 INFO [train_asr.py:1258] (0/4) Computing validation loss
2023-11-27 14:46:15,058 INFO [train_asr.py:1267] (0/4) Epoch 39, validation: loss=0.05878, simple_loss=0.0507, pruned_loss=0.005237, audio_tagging_loss=0.02819, over 4681554.00 frames.
2023-11-27 14:46:15,059 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB
2023-11-27 14:46:24,093 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3106020.0, ans=0.125
2023-11-27 14:46:57,161 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3106220.0, ans=10.0
2023-11-27 14:47:06,655 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 465950
2023-11-27 14:47:07,317 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.32 vs. limit=6.0
2023-11-27 14:47:11,963 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 9050, loss[loss=0.06241, simple_loss=0.08163, pruned_loss=0.01307, audio_tagging_loss=0.008529, over 13989.00 frames. ], tot_loss[loss=0.06728, simple_loss=0.09132, pruned_loss=0.01299, audio_tagging_loss=0.008628, over 3048758.96 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 32.0
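At batch 9000 the trainer pauses for a validation pass (train_asr.py:1258-1268 above): the reported dev losses are frame-weighted averages over the full 4,681,554-frame validation set, followed by the peak device allocation. A sketch of that reporting, with illustrative names rather than icefall's actual helpers:

    import torch

    def report_validation(loss_sums: dict, tot_frames: float) -> None:
        # Sketch: each loss is accumulated as a frame-weighted sum during the
        # validation loop, then normalized by the total frame count.
        parts = ", ".join(f"{k}={v / tot_frames:.4}" for k, v in loss_sums.items())
        print(f"Epoch 39, validation: {parts}, over {tot_frames:.2f} frames.")
        peak_mb = torch.cuda.max_memory_allocated() // (1024 * 1024)
        print(f"Maximum memory allocated so far is {peak_mb}MB")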
2023-11-27 14:47:47,948 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3106553.3333333335, ans=0.0
2023-11-27 14:47:53,165 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3106553.3333333335, ans=0.035
2023-11-27 14:47:59,631 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.081e+01 8.800e+01 9.356e+01 9.893e+01 1.212e+02, threshold=1.871e+02, percent-clipped=0.0
2023-11-27 14:47:59,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3106620.0, ans=0.0
2023-11-27 14:48:03,640 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.45 vs. limit=22.5
2023-11-27 14:48:04,101 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 466000
2023-11-27 14:48:04,340 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3106620.0, ans=0.0
2023-11-27 14:48:10,431 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 9100, loss[loss=0.09028, simple_loss=0.1171, pruned_loss=0.02423, audio_tagging_loss=0.007527, over 15455.00 frames. ], tot_loss[loss=0.06706, simple_loss=0.09116, pruned_loss=0.01289, audio_tagging_loss=0.008588, over 3057057.99 frames. ], batch size: 54, lr: 1.73e-03, grad_scale: 32.0
2023-11-27 14:48:22,066 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.72 vs. limit=15.0
2023-11-27 14:48:36,876 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3106820.0, ans=0.04949747468305833
2023-11-27 14:49:00,955 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.93 vs. limit=15.0
2023-11-27 14:49:03,727 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 466050
2023-11-27 14:49:09,119 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 9150, loss[loss=0.06671, simple_loss=0.08512, pruned_loss=0.01369, audio_tagging_loss=0.01047, over 14306.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.09002, pruned_loss=0.01254, audio_tagging_loss=0.008642, over 3050554.58 frames. ], batch size: 54, lr: 1.73e-03, grad_scale: 16.0
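The grad_scale field in the train_asr.py lines tracks fp16 dynamic loss scaling: it doubles from 16.0 to 32.0 at batch 8000 and is back to 16.0 by batch 9150 (just above), the standard halve-on-overflow, double-when-stable policy. A sketch using PyTorch's own GradScaler; the hyperparameters shown are PyTorch defaults, not necessarily what this recipe uses:

    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=16.0,       # first grad_scale seen in this log
        growth_factor=2.0,     # 16.0 -> 32.0 after enough overflow-free steps
        backoff_factor=0.5,    # 32.0 -> 16.0 when inf/nan gradients appear
        growth_interval=2000,  # PyTorch's default growth cadence
    )
    # Per batch: scaler.scale(loss).backward(); scaler.step(optimizer);
    # scaler.update()   # <- where the scale is grown or backed off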
2023-11-27 14:49:14,294 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3107020.0, ans=0.125
2023-11-27 14:49:20,917 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3107086.6666666665, ans=0.125
2023-11-27 14:49:36,286 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3107153.3333333335, ans=0.2
2023-11-27 14:49:48,586 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3107220.0, ans=0.0
2023-11-27 14:49:57,785 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.401e+01 8.571e+01 9.032e+01 9.849e+01 1.366e+02, threshold=1.806e+02, percent-clipped=0.0
2023-11-27 14:50:01,198 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 466100
2023-11-27 14:50:06,687 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 9200, loss[loss=0.05448, simple_loss=0.0697, pruned_loss=0.01036, audio_tagging_loss=0.009277, over 14369.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.08967, pruned_loss=0.01253, audio_tagging_loss=0.008559, over 3050578.70 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 32.0
2023-11-27 14:50:08,123 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3107353.3333333335, ans=0.125
2023-11-27 14:50:20,740 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3107420.0, ans=0.125
2023-11-27 14:50:30,579 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3107486.6666666665, ans=0.0
2023-11-27 14:50:52,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3107620.0, ans=0.0
2023-11-27 14:50:58,615 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 466150
2023-11-27 14:51:04,612 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 9250, loss[loss=0.07905, simple_loss=0.1086, pruned_loss=0.0145, audio_tagging_loss=0.01025, over 15237.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.08978, pruned_loss=0.0125, audio_tagging_loss=0.008646, over 3052261.57 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 16.0
2023-11-27 14:51:10,586 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.65 vs. limit=22.5
2023-11-27 14:51:18,518 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.01 vs. limit=15.0
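The scaling.py:213 lines record ScheduledFloat values: hyperparameters such as dropout_p, the *_skip_rate entries and balancer probabilities are not constants but schedules evaluated at the current batch_count. A minimal stand-in, assuming piecewise-linear interpolation between (batch_count, value) breakpoints; the real class in icefall's scaling.py carries more machinery, and the breakpoints below are assumed for illustration:

    class ScheduledFloat:
        """Sketch: a float that depends on batch_count via piecewise-linear
        interpolation, e.g. ScheduledFloat((0.0, 0.3), (20000.0, 0.1)) for a
        dropout rate that anneals from 0.3 to 0.1."""

        def __init__(self, *points):
            self.points = sorted(points)

        def value(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            if batch_count >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if batch_count <= x1:
                    t = (batch_count - x0) / (x1 - x0)
                    return y0 + t * (y1 - y0)

    # Far past the last breakpoint the schedule is flat, which is why the
    # dropout_p lines above all report ans=0.1 at batch_count ~3.1e6.
    print(ScheduledFloat((0.0, 0.3), (20000.0, 0.1)).value(3107153.0))  # 0.1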
2023-11-27 14:51:21,474 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3107753.3333333335, ans=0.2
2023-11-27 14:51:36,236 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3107820.0, ans=0.125
2023-11-27 14:51:55,289 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.572e+01 8.711e+01 9.233e+01 9.983e+01 1.314e+02, threshold=1.847e+02, percent-clipped=0.0
2023-11-27 14:51:57,603 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 466200
2023-11-27 14:52:03,305 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 9300, loss[loss=0.07312, simple_loss=0.1039, pruned_loss=0.01329, audio_tagging_loss=0.007886, over 15693.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.09004, pruned_loss=0.01261, audio_tagging_loss=0.008713, over 3064188.20 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0
2023-11-27 14:52:15,481 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.34 vs. limit=10.0
2023-11-27 14:52:17,318 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3108086.6666666665, ans=0.1
2023-11-27 14:52:18,557 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3108086.6666666665, ans=0.0
2023-11-27 14:52:28,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3108153.3333333335, ans=0.125
2023-11-27 14:52:32,176 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3108153.3333333335, ans=0.125
2023-11-27 14:52:34,457 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3108153.3333333335, ans=0.125
2023-11-27 14:52:48,589 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-27 14:52:54,990 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 466250
2023-11-27 14:53:00,963 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 9350, loss[loss=0.0738, simple_loss=0.1071, pruned_loss=0.01298, audio_tagging_loss=0.007255, over 15293.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.09006, pruned_loss=0.01257, audio_tagging_loss=0.008786, over 3060412.77 frames.
], batch size: 55, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:53:04,570 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3108353.3333333335, ans=0.07 2023-11-27 14:53:11,143 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3108420.0, ans=0.0 2023-11-27 14:53:19,047 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3108420.0, ans=0.125 2023-11-27 14:53:27,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3108486.6666666665, ans=0.1 2023-11-27 14:53:43,690 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3108553.3333333335, ans=0.0 2023-11-27 14:53:49,985 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.258e+01 8.694e+01 9.307e+01 9.983e+01 1.185e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-27 14:53:52,267 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 466300 2023-11-27 14:53:58,192 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 9400, loss[loss=0.06956, simple_loss=0.09031, pruned_loss=0.01754, audio_tagging_loss=0.006857, over 14700.00 frames. ], tot_loss[loss=0.06676, simple_loss=0.09053, pruned_loss=0.01265, audio_tagging_loss=0.00884, over 3067242.31 frames. ], batch size: 54, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:54:07,336 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3108686.6666666665, ans=0.0 2023-11-27 14:54:09,541 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3108753.3333333335, ans=0.2 2023-11-27 14:54:15,440 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.90 vs. limit=6.0 2023-11-27 14:54:16,370 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.39 vs. limit=12.0 2023-11-27 14:54:51,340 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 466350 2023-11-27 14:54:56,782 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 9450, loss[loss=0.07085, simple_loss=0.1004, pruned_loss=0.01137, audio_tagging_loss=0.0093, over 15392.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.08996, pruned_loss=0.01261, audio_tagging_loss=0.008908, over 3061075.03 frames. ], batch size: 58, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:55:00,140 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 14:55:01,491 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3109020.0, ans=0.125 2023-11-27 14:55:11,504 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.71 vs. 
limit=15.0 2023-11-27 14:55:29,339 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3109153.3333333335, ans=0.2 2023-11-27 14:55:46,758 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.487e+01 8.803e+01 9.221e+01 9.903e+01 1.293e+02, threshold=1.844e+02, percent-clipped=0.0 2023-11-27 14:55:49,066 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 466400 2023-11-27 14:55:54,762 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 9500, loss[loss=0.06166, simple_loss=0.08364, pruned_loss=0.01065, audio_tagging_loss=0.009193, over 14900.00 frames. ], tot_loss[loss=0.06692, simple_loss=0.09063, pruned_loss=0.01268, audio_tagging_loss=0.008923, over 3057752.01 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:56:47,021 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 466450 2023-11-27 14:56:47,179 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3109620.0, ans=0.125 2023-11-27 14:56:52,523 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 9550, loss[loss=0.07827, simple_loss=0.1164, pruned_loss=0.01415, audio_tagging_loss=0.005946, over 15307.00 frames. ], tot_loss[loss=0.06747, simple_loss=0.09121, pruned_loss=0.01293, audio_tagging_loss=0.008942, over 3059057.13 frames. ], batch size: 54, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:56:52,852 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3109686.6666666665, ans=0.0 2023-11-27 14:57:00,940 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3109686.6666666665, ans=0.125 2023-11-27 14:57:01,311 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.71 vs. limit=15.0 2023-11-27 14:57:09,625 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3109753.3333333335, ans=10.0 2023-11-27 14:57:38,398 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3109953.3333333335, ans=0.07 2023-11-27 14:57:42,556 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.757e+01 8.709e+01 9.251e+01 1.020e+02 1.249e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-27 14:57:45,238 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 466500 2023-11-27 14:57:51,339 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 9600, loss[loss=0.04144, simple_loss=0.05457, pruned_loss=0.004954, audio_tagging_loss=0.009199, over 14109.00 frames. ], tot_loss[loss=0.0673, simple_loss=0.09101, pruned_loss=0.01281, audio_tagging_loss=0.008982, over 3053028.51 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:57:59,620 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.54 vs. 
limit=15.0 2023-11-27 14:58:00,344 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 14:58:16,190 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3110153.3333333335, ans=0.125 2023-11-27 14:58:23,199 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3110153.3333333335, ans=0.2 2023-11-27 14:58:42,784 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 466550 2023-11-27 14:58:48,148 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 9650, loss[loss=0.06544, simple_loss=0.08711, pruned_loss=0.0133, audio_tagging_loss=0.008581, over 15647.00 frames. ], tot_loss[loss=0.06718, simple_loss=0.09085, pruned_loss=0.01282, audio_tagging_loss=0.008938, over 3050080.08 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:58:48,398 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3110353.3333333335, ans=0.1 2023-11-27 14:58:50,836 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.64 vs. limit=15.0 2023-11-27 14:58:51,552 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3110353.3333333335, ans=0.0 2023-11-27 14:58:52,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3110353.3333333335, ans=0.0 2023-11-27 14:58:59,929 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3110420.0, ans=0.1 2023-11-27 14:59:07,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3110420.0, ans=0.125 2023-11-27 14:59:08,063 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3110420.0, ans=0.0 2023-11-27 14:59:34,367 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3110620.0, ans=0.2 2023-11-27 14:59:37,455 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.159e+01 8.987e+01 9.663e+01 1.056e+02 1.477e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-27 14:59:39,706 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 466600 2023-11-27 14:59:46,071 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 9700, loss[loss=0.06875, simple_loss=0.09271, pruned_loss=0.01267, audio_tagging_loss=0.009734, over 14710.00 frames. ], tot_loss[loss=0.06723, simple_loss=0.09118, pruned_loss=0.01277, audio_tagging_loss=0.008877, over 3056822.03 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 15:00:13,029 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3110820.0, ans=0.125 2023-11-27 15:00:28,359 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.28 vs. 
limit=6.0 2023-11-27 15:00:29,069 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3110886.6666666665, ans=0.0 2023-11-27 15:00:30,352 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3110886.6666666665, ans=0.0 2023-11-27 15:00:38,197 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 466650 2023-11-27 15:00:42,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3110953.3333333335, ans=0.125 2023-11-27 15:00:44,711 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 9750, loss[loss=0.04835, simple_loss=0.06702, pruned_loss=0.005569, audio_tagging_loss=0.009269, over 16327.00 frames. ], tot_loss[loss=0.06671, simple_loss=0.09069, pruned_loss=0.01256, audio_tagging_loss=0.008808, over 3053544.51 frames. ], batch size: 64, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 15:00:49,168 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3111020.0, ans=0.015 2023-11-27 15:01:03,845 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3111086.6666666665, ans=0.04949747468305833 2023-11-27 15:01:07,147 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3111153.3333333335, ans=0.0 2023-11-27 15:01:20,295 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3111220.0, ans=0.125 2023-11-27 15:01:30,272 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=3111286.6666666665, ans=10.0 2023-11-27 15:01:35,381 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.839e+01 8.575e+01 9.224e+01 9.953e+01 1.254e+02, threshold=1.845e+02, percent-clipped=0.0 2023-11-27 15:01:36,585 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 466700 2023-11-27 15:01:37,926 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3111286.6666666665, ans=0.125 2023-11-27 15:01:41,860 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 9800, loss[loss=0.05812, simple_loss=0.07152, pruned_loss=0.009813, audio_tagging_loss=0.01255, over 15250.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.08999, pruned_loss=0.01251, audio_tagging_loss=0.008657, over 3049964.69 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 15:01:55,639 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.33 vs. 
limit=22.5 2023-11-27 15:01:57,240 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3111420.0, ans=0.125 2023-11-27 15:01:58,950 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3111420.0, ans=0.0 2023-11-27 15:02:01,748 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3111420.0, ans=0.0 2023-11-27 15:02:07,489 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3111486.6666666665, ans=0.125 2023-11-27 15:02:07,536 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 15:02:08,641 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3111486.6666666665, ans=0.125 2023-11-27 15:02:22,209 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3111553.3333333335, ans=0.125 2023-11-27 15:02:33,003 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 466750 2023-11-27 15:02:36,216 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 15:02:38,367 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 9850, loss[loss=0.1028, simple_loss=0.1402, pruned_loss=0.02671, audio_tagging_loss=0.005945, over 14891.00 frames. ], tot_loss[loss=0.06687, simple_loss=0.09116, pruned_loss=0.01269, audio_tagging_loss=0.008602, over 3050267.97 frames. ], batch size: 54, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 15:02:57,796 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3111753.3333333335, ans=0.125 2023-11-27 15:03:02,120 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3111820.0, ans=0.05 2023-11-27 15:03:11,083 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3111820.0, ans=0.125 2023-11-27 15:03:29,430 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.309e+01 8.745e+01 9.208e+01 1.002e+02 1.325e+02, threshold=1.842e+02, percent-clipped=0.0 2023-11-27 15:03:30,695 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 466800 2023-11-27 15:03:34,882 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3111953.3333333335, ans=0.125 2023-11-27 15:03:36,890 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 9900, loss[loss=0.03978, simple_loss=0.04724, pruned_loss=0.005647, audio_tagging_loss=0.01051, over 15592.00 frames. ], tot_loss[loss=0.06749, simple_loss=0.09184, pruned_loss=0.01292, audio_tagging_loss=0.008656, over 3048416.26 frames. 
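Note on the loss fields: the per-batch records from train_asr.py:1235 are consistent with the total loss being a fixed weighted sum of its components, with simple_loss down-weighted by 0.5 and the pruned and audio-tagging terms at weight 1.0. The weights below are inferred from the logged values rather than read out of the training script, so treat them as an assumption; a quick check against the batch 9850 record above:

    # Values copied from the "Epoch 39, batch 9850" record; weights inferred.
    simple_loss = 0.1402
    pruned_loss = 0.02671
    audio_tagging_loss = 0.005945

    loss = 0.5 * simple_loss + 1.0 * pruned_loss + 1.0 * audio_tagging_loss
    print(f"{loss:.4g}")  # -> 0.1028, matching the logged loss=0.1028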
], batch size: 61, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 15:03:43,672 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.37 vs. limit=15.0 2023-11-27 15:03:54,416 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3112086.6666666665, ans=0.0 2023-11-27 15:04:03,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3112153.3333333335, ans=0.125 2023-11-27 15:04:05,151 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3112153.3333333335, ans=0.1 2023-11-27 15:04:23,795 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 15:04:25,988 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3112286.6666666665, ans=0.125 2023-11-27 15:04:29,182 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 466850 2023-11-27 15:04:29,435 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3112286.6666666665, ans=0.125 2023-11-27 15:04:34,672 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 9950, loss[loss=0.0546, simple_loss=0.07469, pruned_loss=0.01022, audio_tagging_loss=0.007028, over 15283.00 frames. ], tot_loss[loss=0.0679, simple_loss=0.09192, pruned_loss=0.01323, audio_tagging_loss=0.008706, over 3061113.44 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 15:04:47,889 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3112420.0, ans=0.125 2023-11-27 15:04:50,041 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3112420.0, ans=0.125 2023-11-27 15:04:51,113 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3112420.0, ans=0.0 2023-11-27 15:04:59,749 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3112486.6666666665, ans=0.0 2023-11-27 15:05:13,038 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3112553.3333333335, ans=0.1 2023-11-27 15:05:24,777 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.611e+01 8.583e+01 9.133e+01 9.835e+01 1.182e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-27 15:05:25,948 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 466900 2023-11-27 15:05:29,389 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3112620.0, ans=0.1 2023-11-27 15:05:31,341 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 10000, loss[loss=0.08928, simple_loss=0.1214, pruned_loss=0.02209, audio_tagging_loss=0.006473, over 16181.00 frames. ], tot_loss[loss=0.06676, simple_loss=0.09011, pruned_loss=0.01295, audio_tagging_loss=0.008749, over 3058200.76 frames. 
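Note on the scaling.py:213 lines: each reports the current value of a ScheduledFloat hyperparameter, a float that varies with the global batch count (hence the batch_count field in every record). A minimal sketch of the idea, assuming a piecewise-linear schedule; the class name, constructor, and example numbers are illustrative, not icefall's actual implementation:

    class ScheduledFloatSketch:
        """Piecewise-linear schedule: the value depends on the batch count."""

        def __init__(self, *points):
            # points: (batch_count, value) pairs.
            self.points = sorted(points)

        def value(self, batch_count: float) -> float:
            (x0, y0) = self.points[0]
            if batch_count <= x0:
                return y0
            for (x1, y1) in self.points[1:]:
                if batch_count <= x1:
                    # Linear interpolation between the surrounding points.
                    t = (batch_count - x0) / (x1 - x0)
                    return y0 + t * (y1 - y0)
                (x0, y0) = (x1, y1)
            return self.points[-1][1]

    # E.g. a dropout rate decaying from 0.3 to 0.1 over the first 20k batches
    # (made-up numbers; the log only shows the current values):
    drop = ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.1))
    print(drop.value(3110353.3))  # long past the schedule's end -> 0.1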
], batch size: 56, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 15:05:32,802 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=3112686.6666666665, ans=0.05 2023-11-27 15:05:40,071 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.18 vs. limit=15.0 2023-11-27 15:05:41,921 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3112753.3333333335, ans=0.125 2023-11-27 15:05:53,465 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3112753.3333333335, ans=0.0 2023-11-27 15:05:58,044 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3112820.0, ans=0.125 2023-11-27 15:06:05,916 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.65 vs. limit=15.0 2023-11-27 15:06:12,228 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 15:06:23,646 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 466950 2023-11-27 15:06:29,081 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 10050, loss[loss=0.06302, simple_loss=0.08431, pruned_loss=0.01239, audio_tagging_loss=0.008469, over 14803.00 frames. ], tot_loss[loss=0.06663, simple_loss=0.09018, pruned_loss=0.01285, audio_tagging_loss=0.008692, over 3049619.65 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 15:06:39,425 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3113020.0, ans=0.2 2023-11-27 15:06:41,515 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3113086.6666666665, ans=0.04949747468305833 2023-11-27 15:06:42,667 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3113086.6666666665, ans=0.125 2023-11-27 15:07:07,000 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3113220.0, ans=0.0 2023-11-27 15:07:20,300 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.270e+01 8.460e+01 9.017e+01 9.705e+01 1.338e+02, threshold=1.803e+02, percent-clipped=0.0 2023-11-27 15:07:21,483 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 467000 2023-11-27 15:07:27,158 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 10100, loss[loss=0.05025, simple_loss=0.07215, pruned_loss=0.004248, audio_tagging_loss=0.009933, over 14391.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.08915, pruned_loss=0.0127, audio_tagging_loss=0.008831, over 3047172.85 frames. ], batch size: 53, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 15:07:39,818 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3113420.0, ans=0.125 2023-11-27 15:08:10,450 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.99 vs. 
limit=10.0 2023-11-27 15:08:13,254 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3113620.0, ans=0.125 2023-11-27 15:08:17,394 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 15:08:18,590 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 467050 2023-11-27 15:08:21,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3113620.0, ans=0.125 2023-11-27 15:08:23,954 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 10150, loss[loss=0.05314, simple_loss=0.06707, pruned_loss=0.0096, audio_tagging_loss=0.009999, over 14116.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.08866, pruned_loss=0.01251, audio_tagging_loss=0.008949, over 3042611.83 frames. ], batch size: 54, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 15:08:43,195 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.85 vs. limit=6.0 2023-11-27 15:08:55,798 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 15:08:57,060 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3113820.0, ans=0.125 2023-11-27 15:09:01,974 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.13 vs. limit=15.0 2023-11-27 15:09:05,807 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3113886.6666666665, ans=0.1 2023-11-27 15:09:14,318 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.276e+01 8.492e+01 9.148e+01 9.869e+01 1.257e+02, threshold=1.830e+02, percent-clipped=0.0 2023-11-27 15:09:15,494 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 467100 2023-11-27 15:09:18,307 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3113953.3333333335, ans=0.2 2023-11-27 15:09:20,707 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3114020.0, ans=0.125 2023-11-27 15:09:21,476 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 10200, loss[loss=0.05499, simple_loss=0.07998, pruned_loss=0.009013, audio_tagging_loss=0.005992, over 14941.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.08884, pruned_loss=0.01245, audio_tagging_loss=0.008981, over 3049351.73 frames. 
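Note on the train_asr.py:1481 warnings: a cut is excluded when its frame count after the encoder's subsampling falls below its token count, since the transducer loss needs at least one output frame per emitted token. A sketch of that predicate follows; the exact subsampling formula is an assumption, though it does reproduce the logged 100 frames before subsampling -> 23 after:

    # Sketch of the filter behind the "Exclude cut ..." warnings above.
    def frames_after_subsampling(num_frames: int) -> int:
        # Conv front-end: roughly two halvings with some edge loss (assumed).
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        return frames_after_subsampling(num_frames) >= num_tokens

    print(frames_after_subsampling(100))  # 23, as in the warnings
    print(keep_cut(100, 24))              # False -> cut is excluded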
], batch size: 59, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 15:09:25,402 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3114020.0, ans=0.125 2023-11-27 15:09:33,490 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3114086.6666666665, ans=0.125 2023-11-27 15:09:39,001 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3114086.6666666665, ans=0.1 2023-11-27 15:09:42,156 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3114086.6666666665, ans=0.1 2023-11-27 15:09:48,643 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 15:10:02,451 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3114220.0, ans=0.125 2023-11-27 15:10:06,816 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3114286.6666666665, ans=0.0 2023-11-27 15:10:10,676 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3114286.6666666665, ans=0.1 2023-11-27 15:10:14,286 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 467150 2023-11-27 15:10:20,423 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 10250, loss[loss=0.07478, simple_loss=0.0965, pruned_loss=0.01925, audio_tagging_loss=0.007282, over 15191.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.08975, pruned_loss=0.01279, audio_tagging_loss=0.009054, over 3050917.95 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 15:10:29,556 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3114353.3333333335, ans=10.0 2023-11-27 15:10:34,828 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3114420.0, ans=0.125 2023-11-27 15:10:41,503 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3114486.6666666665, ans=0.0 2023-11-27 15:10:43,768 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3114486.6666666665, ans=0.125 2023-11-27 15:10:49,835 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.18 vs. 
limit=15.0 2023-11-27 15:10:50,601 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3114486.6666666665, ans=0.1 2023-11-27 15:11:03,286 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3114553.3333333335, ans=0.125 2023-11-27 15:11:04,551 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.66 vs. limit=6.0 2023-11-27 15:11:11,577 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.304e+01 8.883e+01 9.540e+01 1.004e+02 1.419e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-27 15:11:11,674 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 467200 2023-11-27 15:11:17,258 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 10300, loss[loss=0.05974, simple_loss=0.07628, pruned_loss=0.01079, audio_tagging_loss=0.0108, over 15294.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.08981, pruned_loss=0.01261, audio_tagging_loss=0.009082, over 3050355.79 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 15:11:31,060 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3114753.3333333335, ans=0.1 2023-11-27 15:11:33,425 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3114753.3333333335, ans=0.125 2023-11-27 15:11:42,055 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3114820.0, ans=0.2 2023-11-27 15:11:45,198 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3114820.0, ans=0.0 2023-11-27 15:11:56,977 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=3114886.6666666665, ans=0.95 2023-11-27 15:12:00,209 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3114886.6666666665, ans=0.125 2023-11-27 15:12:08,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3114953.3333333335, ans=0.125 2023-11-27 15:12:09,024 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 467250 2023-11-27 15:12:11,267 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3114953.3333333335, ans=0.125 2023-11-27 15:12:14,911 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 10350, loss[loss=0.06526, simple_loss=0.08241, pruned_loss=0.0137, audio_tagging_loss=0.01035, over 14702.00 frames. ], tot_loss[loss=0.067, simple_loss=0.09023, pruned_loss=0.01273, audio_tagging_loss=0.009154, over 3049745.08 frames. 
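Note on the scaling.py:1022 Whitening lines: the logged metric is a scale-invariant statistic of the feature covariance that is 1.0 when channels are uncorrelated with equal power and grows as they become correlated or unevenly scaled; the module only intervenes when the metric exceeds the limit. The sketch below is illustrative only; icefall's Whiten module differs in detail:

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        # x: (num_frames, num_channels)
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.T @ x) / x.shape[0]            # (C, C) covariance
        mean_diag = torch.diagonal(cov).mean()  # average channel power
        mean_sq = (cov ** 2).sum() / cov.shape[0]
        return (mean_sq / (mean_diag ** 2 + 1e-20)).item()

    torch.manual_seed(0)
    white = torch.randn(1000, 384)
    print(whitening_metric(white))         # close to 1.0
    print(whitening_metric(white * 10.0))  # unchanged: scale-invariant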
], batch size: 59, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 15:12:53,167 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3115220.0, ans=0.2 2023-11-27 15:13:03,617 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3115286.6666666665, ans=0.125 2023-11-27 15:13:05,288 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3115286.6666666665, ans=0.1 2023-11-27 15:13:07,650 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.393e+01 8.698e+01 9.408e+01 1.024e+02 1.336e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-27 15:13:07,751 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 467300 2023-11-27 15:13:13,068 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 10400, loss[loss=0.05644, simple_loss=0.07724, pruned_loss=0.009965, audio_tagging_loss=0.007854, over 14413.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.08922, pruned_loss=0.01255, audio_tagging_loss=0.009194, over 3048692.50 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 15:13:22,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3115353.3333333335, ans=0.125 2023-11-27 15:13:36,976 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3115486.6666666665, ans=0.0 2023-11-27 15:13:40,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3115486.6666666665, ans=0.125 2023-11-27 15:13:40,472 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.77 vs. limit=15.0 2023-11-27 15:13:51,417 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=3115553.3333333335, ans=0.025 2023-11-27 15:13:58,009 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=3115620.0, ans=0.5 2023-11-27 15:14:05,107 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 467350 2023-11-27 15:14:07,515 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3115620.0, ans=0.125 2023-11-27 15:14:10,454 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 10450, loss[loss=0.06866, simple_loss=0.09176, pruned_loss=0.01443, audio_tagging_loss=0.008344, over 15654.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.08905, pruned_loss=0.0126, audio_tagging_loss=0.009132, over 3049245.89 frames. 
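Note on the optim.py:476 lines: the five numbers are the min/25%/50%/75%/max of recently observed gradient norms, the threshold tracks Clipping_scale times the running median (2.0 x 9.408e+01 = 1.882e+02 in the record above), and percent-clipped is the share of recent batches whose norm exceeded it. The sketch below reproduces just the reported quantities; ScaledAdam's actual bookkeeping is more involved:

    import torch

    def grad_norm_report(recent_norms: torch.Tensor,
                         clipping_scale: float = 2.0):
        q = torch.quantile(recent_norms,
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * q[2]  # scale times the median
        pct_clipped = 100.0 * (recent_norms > threshold).float().mean()
        return q, threshold, pct_clipped

    norms = torch.tensor([73.93, 86.98, 94.08, 102.4, 133.6])
    q, thr, pct = grad_norm_report(norms)
    print(float(thr), float(pct))  # 188.16 (= 2 x median), 0.0% clipped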
], batch size: 60, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 15:14:11,885 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3115686.6666666665, ans=0.0 2023-11-27 15:14:25,633 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3115753.3333333335, ans=0.125 2023-11-27 15:14:46,508 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3115886.6666666665, ans=0.0 2023-11-27 15:14:48,618 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.99 vs. limit=22.5 2023-11-27 15:14:52,463 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3115886.6666666665, ans=0.125 2023-11-27 15:15:02,072 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.347e+01 8.776e+01 9.506e+01 1.066e+02 3.679e+02, threshold=1.901e+02, percent-clipped=1.0 2023-11-27 15:15:02,186 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 467400 2023-11-27 15:15:08,015 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 10500, loss[loss=0.0648, simple_loss=0.08941, pruned_loss=0.01175, audio_tagging_loss=0.008343, over 14317.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.08873, pruned_loss=0.01245, audio_tagging_loss=0.009006, over 3058901.82 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 15:15:25,368 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.19 vs. limit=15.0 2023-11-27 15:15:37,768 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3116153.3333333335, ans=0.125 2023-11-27 15:15:43,203 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3116220.0, ans=0.0 2023-11-27 15:15:43,665 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.46 vs. limit=15.0 2023-11-27 15:15:49,715 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-27 15:16:00,292 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 467450 2023-11-27 15:16:06,294 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 10550, loss[loss=0.06246, simple_loss=0.07645, pruned_loss=0.01532, audio_tagging_loss=0.008919, over 15252.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.08845, pruned_loss=0.01244, audio_tagging_loss=0.008934, over 3050059.06 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 15:16:14,121 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3116353.3333333335, ans=0.125 2023-11-27 15:16:22,344 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3116420.0, ans=0.125 2023-11-27 15:16:27,232 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.15 vs. 
limit=15.0 2023-11-27 15:16:44,448 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.73 vs. limit=10.0 2023-11-27 15:16:57,561 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 467500 2023-11-27 15:16:57,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3116620.0, ans=0.125 2023-11-27 15:16:57,662 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3116620.0, ans=0.125 2023-11-27 15:16:58,576 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.477e+01 8.619e+01 9.191e+01 9.903e+01 2.574e+02, threshold=1.838e+02, percent-clipped=2.0 2023-11-27 15:16:58,940 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3116620.0, ans=0.1 2023-11-27 15:17:01,053 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3116620.0, ans=0.07 2023-11-27 15:17:01,339 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.76 vs. limit=15.0 2023-11-27 15:17:03,604 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 10600, loss[loss=0.04316, simple_loss=0.05406, pruned_loss=0.006438, audio_tagging_loss=0.009688, over 14208.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08876, pruned_loss=0.0125, audio_tagging_loss=0.008893, over 3047611.03 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 15:17:43,968 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3116886.6666666665, ans=0.0 2023-11-27 15:17:48,976 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3116953.3333333335, ans=0.125 2023-11-27 15:17:55,367 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 467550 2023-11-27 15:17:58,824 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3116953.3333333335, ans=0.125 2023-11-27 15:17:58,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3116953.3333333335, ans=0.2 2023-11-27 15:18:00,714 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 10650, loss[loss=0.0507, simple_loss=0.06684, pruned_loss=0.009035, audio_tagging_loss=0.008243, over 15378.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.08895, pruned_loss=0.01252, audio_tagging_loss=0.008763, over 3047488.81 frames. 
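Note on the grad_scale field: its trajectory across these batches (32.0 -> 16.0 -> 8.0 and back up) is standard fp16 dynamic loss scaling: the scale is halved when scaled gradients overflow and grown again after a streak of clean steps. The standard PyTorch pattern is sketched below; the init_scale and growth_interval values are assumptions, and a CUDA device is assumed as in this run:

    import torch

    model = torch.nn.Linear(10, 10).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    scaler = torch.cuda.amp.GradScaler(
        init_scale=32.0, growth_factor=2.0,
        backoff_factor=0.5, growth_interval=2000,
    )

    x = torch.randn(4, 10, device="cuda")
    with torch.cuda.amp.autocast():
        loss = model(x).square().mean()
    scaler.scale(loss).backward()  # backprop through the scaled loss
    scaler.step(optimizer)         # unscales; skips the step on inf/nan grads
    scaler.update()                # halves on overflow, grows after clean runs
    print(scaler.get_scale())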
], batch size: 62, lr: 1.73e-03, grad_scale: 8.0 2023-11-27 15:18:03,174 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-27 15:18:07,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3117020.0, ans=0.125 2023-11-27 15:18:10,274 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3117020.0, ans=0.125 2023-11-27 15:18:11,767 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3117086.6666666665, ans=0.0 2023-11-27 15:18:26,531 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3117153.3333333335, ans=0.2 2023-11-27 15:18:36,083 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=3117220.0, ans=10.0 2023-11-27 15:18:36,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3117220.0, ans=0.125 2023-11-27 15:18:52,876 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 467600 2023-11-27 15:18:55,395 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.532e+01 8.520e+01 9.175e+01 9.888e+01 1.340e+02, threshold=1.835e+02, percent-clipped=0.0 2023-11-27 15:18:55,741 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3117286.6666666665, ans=0.0 2023-11-27 15:18:59,328 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 10700, loss[loss=0.08042, simple_loss=0.1097, pruned_loss=0.01928, audio_tagging_loss=0.006306, over 14742.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08877, pruned_loss=0.01257, audio_tagging_loss=0.008781, over 3038939.06 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 8.0 2023-11-27 15:19:05,429 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.36 vs. limit=15.0 2023-11-27 15:19:11,090 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.91 vs. limit=22.5 2023-11-27 15:19:12,872 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3117420.0, ans=0.2 2023-11-27 15:19:12,957 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3117420.0, ans=0.125 2023-11-27 15:19:21,250 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3117486.6666666665, ans=0.2 2023-11-27 15:19:30,786 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3117486.6666666665, ans=0.125 2023-11-27 15:19:31,381 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.36 vs. limit=22.5 2023-11-27 15:19:37,035 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.93 vs. 
limit=15.0 2023-11-27 15:19:42,398 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 15:19:50,931 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 467650 2023-11-27 15:19:56,286 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 10750, loss[loss=0.04939, simple_loss=0.06077, pruned_loss=0.00948, audio_tagging_loss=0.009529, over 14024.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.08911, pruned_loss=0.01254, audio_tagging_loss=0.008713, over 3041219.72 frames. ], batch size: 54, lr: 1.73e-03, grad_scale: 8.0 2023-11-27 15:20:30,500 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.98 vs. limit=6.0 2023-11-27 15:20:45,279 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3117953.3333333335, ans=0.1 2023-11-27 15:20:47,827 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 467700 2023-11-27 15:20:49,904 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.438e+01 8.439e+01 9.244e+01 9.878e+01 1.512e+02, threshold=1.849e+02, percent-clipped=0.0 2023-11-27 15:20:53,264 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 10800, loss[loss=0.05516, simple_loss=0.07913, pruned_loss=0.007086, audio_tagging_loss=0.008503, over 14215.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.08944, pruned_loss=0.01269, audio_tagging_loss=0.00873, over 3038963.02 frames. ], batch size: 54, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 15:21:02,576 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3118020.0, ans=10.0 2023-11-27 15:21:09,195 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3118086.6666666665, ans=0.125 2023-11-27 15:21:27,783 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3118220.0, ans=0.1 2023-11-27 15:21:39,294 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3118286.6666666665, ans=0.125 2023-11-27 15:21:40,324 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3118286.6666666665, ans=0.125 2023-11-27 15:21:40,753 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.18 vs. limit=15.0 2023-11-27 15:21:44,976 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 467750 2023-11-27 15:21:50,812 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 10850, loss[loss=0.05408, simple_loss=0.06004, pruned_loss=0.008886, audio_tagging_loss=0.01518, over 15237.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.08898, pruned_loss=0.01253, audio_tagging_loss=0.008855, over 3043865.52 frames. ], batch size: 61, lr: 1.73e-03, grad_scale: 8.0 2023-11-27 15:21:53,946 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3118353.3333333335, ans=0.1 2023-11-27 15:22:19,218 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.36 vs. 
limit=15.0 2023-11-27 15:22:43,101 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 467800 2023-11-27 15:22:46,565 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.389e+01 8.986e+01 9.693e+01 1.013e+02 1.433e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-27 15:22:48,723 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 10900, loss[loss=0.07272, simple_loss=0.09653, pruned_loss=0.01529, audio_tagging_loss=0.009167, over 14417.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.08956, pruned_loss=0.01271, audio_tagging_loss=0.00888, over 3045380.57 frames. ], batch size: 53, lr: 1.73e-03, grad_scale: 8.0 2023-11-27 15:22:49,824 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 15:22:59,918 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3118753.3333333335, ans=0.125 2023-11-27 15:23:06,972 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3118753.3333333335, ans=0.125 2023-11-27 15:23:09,304 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3118753.3333333335, ans=0.2 2023-11-27 15:23:27,192 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3118886.6666666665, ans=0.09899494936611666 2023-11-27 15:23:29,679 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.97 vs. limit=15.0 2023-11-27 15:23:34,088 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.69 vs. limit=6.0 2023-11-27 15:23:38,448 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.74 vs. limit=22.5 2023-11-27 15:23:40,201 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 467850 2023-11-27 15:23:45,537 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 10950, loss[loss=0.06501, simple_loss=0.07907, pruned_loss=0.01476, audio_tagging_loss=0.01072, over 14581.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.08945, pruned_loss=0.0127, audio_tagging_loss=0.008852, over 3041233.54 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 8.0 2023-11-27 15:24:03,645 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3119086.6666666665, ans=0.1 2023-11-27 15:24:03,717 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3119086.6666666665, ans=0.0 2023-11-27 15:24:05,261 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.03 vs. 
limit=12.0 2023-11-27 15:24:21,534 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3119220.0, ans=0.125 2023-11-27 15:24:37,585 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 467900 2023-11-27 15:24:40,678 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.413e+01 8.952e+01 9.286e+01 1.000e+02 1.370e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-27 15:24:42,834 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 11000, loss[loss=0.05718, simple_loss=0.07779, pruned_loss=0.009293, audio_tagging_loss=0.008988, over 16433.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.0893, pruned_loss=0.01252, audio_tagging_loss=0.008849, over 3048954.07 frames. ], batch size: 62, lr: 1.73e-03, grad_scale: 8.0 2023-11-27 15:24:48,000 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3119353.3333333335, ans=0.125 2023-11-27 15:24:57,064 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 15:25:00,828 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.80 vs. limit=15.0 2023-11-27 15:25:10,473 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3119486.6666666665, ans=0.09899494936611666 2023-11-27 15:25:21,079 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3119553.3333333335, ans=0.2 2023-11-27 15:25:23,951 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.11 vs. limit=15.0 2023-11-27 15:25:24,762 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 15:25:27,055 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3119553.3333333335, ans=0.09899494936611666 2023-11-27 15:25:32,192 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.29 vs. limit=15.0 2023-11-27 15:25:35,472 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 467950 2023-11-27 15:25:40,968 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 11050, loss[loss=0.05982, simple_loss=0.07822, pruned_loss=0.009325, audio_tagging_loss=0.01139, over 15346.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.08945, pruned_loss=0.01273, audio_tagging_loss=0.008994, over 3049910.94 frames. 
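Note on the balancer fields (min_positive, max_positive, min_abs, max_abs, prob): they appear to parameterize soft constraints on per-channel activation statistics, namely the fraction of positive values and the mean absolute value, enforced through a gradient correction applied with probability prob on each forward pass. The sketch below only measures the constrained statistics; the enforcement itself is a custom autograd function not shown here, so treat this as descriptive:

    import torch

    def channel_stats(x: torch.Tensor):
        # x: (num_frames, num_channels)
        frac_positive = (x > 0).float().mean(dim=0)  # vs. min/max_positive
        mean_abs = x.abs().mean(dim=0)               # vs. min_abs/max_abs
        return frac_positive, mean_abs

    x = torch.randn(100, 8)
    frac_positive, mean_abs = channel_stats(x)
    print(frac_positive.mean().item(), mean_abs.mean().item())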
], batch size: 57, lr: 1.73e-03, grad_scale: 8.0 2023-11-27 15:25:46,636 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3119686.6666666665, ans=0.0 2023-11-27 15:26:19,851 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3119886.6666666665, ans=0.1 2023-11-27 15:26:22,112 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 15:26:31,584 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 468000 2023-11-27 15:26:32,928 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-468000.pt 2023-11-27 15:26:37,136 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.310e+01 8.786e+01 9.414e+01 9.890e+01 1.526e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-27 15:26:39,332 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 11100, loss[loss=0.05405, simple_loss=0.06382, pruned_loss=0.01077, audio_tagging_loss=0.01137, over 15975.00 frames. ], tot_loss[loss=0.06682, simple_loss=0.08974, pruned_loss=0.01282, audio_tagging_loss=0.009121, over 3058512.71 frames. ], batch size: 62, lr: 1.73e-03, grad_scale: 8.0 2023-11-27 15:26:53,759 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3120086.6666666665, ans=0.125 2023-11-27 15:27:24,001 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3120286.6666666665, ans=0.125 2023-11-27 15:27:26,204 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3120286.6666666665, ans=0.125 2023-11-27 15:27:30,487 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 468050 2023-11-27 15:27:36,997 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 11150, loss[loss=0.07193, simple_loss=0.0949, pruned_loss=0.01569, audio_tagging_loss=0.008782, over 15304.00 frames. ], tot_loss[loss=0.06706, simple_loss=0.09007, pruned_loss=0.01285, audio_tagging_loss=0.009171, over 3057598.27 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 8.0 2023-11-27 15:27:54,386 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3120420.0, ans=0.125 2023-11-27 15:28:25,353 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3120620.0, ans=0.125 2023-11-27 15:28:29,081 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 468100 2023-11-27 15:28:32,274 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.636e+01 8.619e+01 9.079e+01 9.995e+01 1.250e+02, threshold=1.816e+02, percent-clipped=0.0 2023-11-27 15:28:34,468 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 11200, loss[loss=0.05315, simple_loss=0.0732, pruned_loss=0.008869, audio_tagging_loss=0.007678, over 16112.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.08896, pruned_loss=0.01264, audio_tagging_loss=0.009171, over 3054349.95 frames. 
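Note on the checkpoint.py:75 line: checkpoint-468000.pt is written on a global-batch-count schedule (468000 is a multiple of 4000). A minimal sketch of that save-every-N pattern; the helper name, the interval, and the exact contents of the saved dict are assumptions:

    import torch

    def maybe_save(model, optimizer, batch_idx: int, exp_dir: str,
                   every_n: int = 4000):
        if batch_idx % every_n == 0:
            torch.save(
                {
                    "model": model.state_dict(),
                    "optimizer": optimizer.state_dict(),
                    "batch_idx_train": batch_idx,
                },
                f"{exp_dir}/checkpoint-{batch_idx}.pt",
            )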
], batch size: 62, lr: 1.72e-03, grad_scale: 16.0 2023-11-27 15:28:36,958 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3120686.6666666665, ans=0.125 2023-11-27 15:28:39,112 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3120686.6666666665, ans=0.125 2023-11-27 15:28:51,124 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3120753.3333333335, ans=0.125 2023-11-27 15:28:53,235 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3120753.3333333335, ans=0.125 2023-11-27 15:29:02,319 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3120820.0, ans=0.2 2023-11-27 15:29:02,324 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3120820.0, ans=0.125 2023-11-27 15:29:05,590 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.39 vs. limit=15.0 2023-11-27 15:29:18,821 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.56 vs. limit=10.0 2023-11-27 15:29:23,445 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.72 vs. limit=15.0 2023-11-27 15:29:25,882 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 468150 2023-11-27 15:29:31,263 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 11250, loss[loss=0.0505, simple_loss=0.06357, pruned_loss=0.008548, audio_tagging_loss=0.01017, over 14672.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.08896, pruned_loss=0.0127, audio_tagging_loss=0.009183, over 3053115.63 frames. ], batch size: 57, lr: 1.72e-03, grad_scale: 8.0 2023-11-27 15:29:31,559 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3121020.0, ans=0.0 2023-11-27 15:29:32,633 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3121020.0, ans=0.2 2023-11-27 15:29:52,798 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3121086.6666666665, ans=0.0 2023-11-27 15:30:01,996 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3121153.3333333335, ans=0.125 2023-11-27 15:30:13,456 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.51 vs. 
limit=12.0 2023-11-27 15:30:14,162 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3121220.0, ans=0.125 2023-11-27 15:30:22,716 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 468200 2023-11-27 15:30:27,260 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.434e+01 8.697e+01 9.472e+01 1.045e+02 1.319e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-27 15:30:28,855 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 11300, loss[loss=0.06943, simple_loss=0.09211, pruned_loss=0.01507, audio_tagging_loss=0.008303, over 15897.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.08881, pruned_loss=0.01267, audio_tagging_loss=0.009119, over 3045133.09 frames. ], batch size: 60, lr: 1.72e-03, grad_scale: 8.0 2023-11-27 15:30:32,479 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3121353.3333333335, ans=0.125 2023-11-27 15:30:43,602 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3121420.0, ans=0.0 2023-11-27 15:30:51,047 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3121486.6666666665, ans=0.1 2023-11-27 15:30:54,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3121486.6666666665, ans=0.1 2023-11-27 15:30:59,612 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3121486.6666666665, ans=0.1 2023-11-27 15:31:20,436 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 468250 2023-11-27 15:31:26,457 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 11350, loss[loss=0.06821, simple_loss=0.08626, pruned_loss=0.01594, audio_tagging_loss=0.009137, over 14438.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.08924, pruned_loss=0.01267, audio_tagging_loss=0.008917, over 3047357.00 frames. ], batch size: 56, lr: 1.72e-03, grad_scale: 8.0 2023-11-27 15:31:34,392 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.95 vs. limit=15.0 2023-11-27 15:32:17,344 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 468300 2023-11-27 15:32:18,807 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.01 vs. limit=10.0 2023-11-27 15:32:21,510 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.565e+01 8.462e+01 9.162e+01 9.878e+01 1.221e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-27 15:32:22,618 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 11400, loss[loss=0.0865, simple_loss=0.1179, pruned_loss=0.01813, audio_tagging_loss=0.009412, over 15521.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.08917, pruned_loss=0.01275, audio_tagging_loss=0.008813, over 3046778.48 frames. ], batch size: 54, lr: 1.72e-03, grad_scale: 8.0 2023-11-27 15:32:50,863 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3122153.3333333335, ans=0.125 2023-11-27 15:32:53,151 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.17 vs. 
limit=10.0 2023-11-27 15:33:00,468 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3122220.0, ans=0.2 2023-11-27 15:33:01,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3122220.0, ans=0.0 2023-11-27 15:33:02,046 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.18 vs. limit=15.0 2023-11-27 15:33:04,085 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.28 vs. limit=15.0 2023-11-27 15:33:04,802 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3122220.0, ans=0.035 2023-11-27 15:33:13,418 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 468350 2023-11-27 15:33:13,507 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3122286.6666666665, ans=0.1 2023-11-27 15:33:18,863 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 11450, loss[loss=0.06803, simple_loss=0.09016, pruned_loss=0.0144, audio_tagging_loss=0.008552, over 15274.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.08908, pruned_loss=0.01268, audio_tagging_loss=0.008752, over 3046382.67 frames. ], batch size: 59, lr: 1.72e-03, grad_scale: 8.0 2023-11-27 15:33:51,879 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3122486.6666666665, ans=0.125 2023-11-27 15:33:53,170 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3122553.3333333335, ans=0.125 2023-11-27 15:34:03,445 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.85 vs. limit=15.0 2023-11-27 15:34:11,213 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 468400 2023-11-27 15:34:11,742 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.23 vs. limit=22.5 2023-11-27 15:34:16,383 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.556e+01 8.713e+01 9.522e+01 1.023e+02 1.434e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-27 15:34:17,530 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 11500, loss[loss=0.05639, simple_loss=0.07526, pruned_loss=0.009352, audio_tagging_loss=0.009406, over 17038.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.08925, pruned_loss=0.01268, audio_tagging_loss=0.008682, over 3052152.26 frames. ], batch size: 63, lr: 1.72e-03, grad_scale: 8.0 2023-11-27 15:34:27,407 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3122686.6666666665, ans=0.0 2023-11-27 15:34:36,181 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.35 vs. 
limit=15.0 2023-11-27 15:34:36,991 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3122753.3333333335, ans=0.2 2023-11-27 15:35:07,546 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3122953.3333333335, ans=0.125 2023-11-27 15:35:09,562 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 468450 2023-11-27 15:35:15,047 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 11550, loss[loss=0.05439, simple_loss=0.06615, pruned_loss=0.008243, audio_tagging_loss=0.01307, over 15680.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.08894, pruned_loss=0.01263, audio_tagging_loss=0.008736, over 3053845.66 frames. ], batch size: 61, lr: 1.72e-03, grad_scale: 8.0 2023-11-27 15:35:22,843 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3123020.0, ans=0.125 2023-11-27 15:35:29,596 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3123086.6666666665, ans=0.125 2023-11-27 15:35:34,266 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3123086.6666666665, ans=0.125 2023-11-27 15:35:36,624 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3123153.3333333335, ans=0.0 2023-11-27 15:35:37,681 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3123153.3333333335, ans=0.125 2023-11-27 15:35:49,643 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.85 vs. limit=6.0 2023-11-27 15:35:54,361 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 15:35:58,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3123220.0, ans=0.125 2023-11-27 15:36:06,445 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 468500 2023-11-27 15:36:10,705 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.756e+01 8.754e+01 9.552e+01 1.002e+02 2.038e+02, threshold=1.910e+02, percent-clipped=1.0 2023-11-27 15:36:11,831 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 11600, loss[loss=0.07633, simple_loss=0.106, pruned_loss=0.01638, audio_tagging_loss=0.006935, over 16888.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.08942, pruned_loss=0.01266, audio_tagging_loss=0.008771, over 3050957.89 frames. 
2023-11-27 15:36:13,148 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3123353.3333333335, ans=0.1
2023-11-27 15:36:19,025 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3123353.3333333335, ans=0.1
2023-11-27 15:36:20,442 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.23 vs. limit=15.0
2023-11-27 15:36:23,987 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.32 vs. limit=6.0
2023-11-27 15:36:24,843 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3123420.0, ans=0.5
2023-11-27 15:36:25,910 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3123420.0, ans=0.0
2023-11-27 15:36:29,842 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3123420.0, ans=0.125
2023-11-27 15:36:33,637 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3123420.0, ans=0.04949747468305833
2023-11-27 15:36:48,302 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.48 vs. limit=15.0
2023-11-27 15:36:55,609 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3123553.3333333335, ans=0.0
2023-11-27 15:37:03,578 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 468550
2023-11-27 15:37:09,519 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 11650, loss[loss=0.08054, simple_loss=0.1106, pruned_loss=0.01382, audio_tagging_loss=0.01141, over 14459.00 frames. ], tot_loss[loss=0.06693, simple_loss=0.09054, pruned_loss=0.0129, audio_tagging_loss=0.008759, over 3046424.28 frames. ], batch size: 53, lr: 1.72e-03, grad_scale: 16.0
2023-11-27 15:37:13,759 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3123686.6666666665, ans=0.0
2023-11-27 15:37:16,967 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3123686.6666666665, ans=0.125
2023-11-27 15:37:32,710 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3123820.0, ans=0.125
2023-11-27 15:37:59,711 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.04 vs. limit=15.0
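The scaling.py:1022 "Whitening" records compare a per-module whitening metric against a limit: the metric measures how far the (grouped) feature covariance is from a multiple of the identity, and icefall only intervenes on the activations' gradients once the metric exceeds the limit. A hedged sketch of one such statistic is below; it mirrors the idea (value near 1.0 for white features, up to the group dimension for a rank-1 covariance) but is not necessarily scaling.py's exact arithmetic.

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> torch.Tensor:
    # x: (num_frames, num_channels). Sketch of a whiteness measure: ~1.0 when
    # per-group covariances are multiples of the identity, growing toward
    # channels_per_group as variance collapses into a single direction.
    num_frames, num_channels = x.shape
    d = num_channels // num_groups
    xg = x.reshape(num_frames, num_groups, d).transpose(0, 1)  # (groups, frames, d)
    cov = xg.transpose(1, 2) @ xg / num_frames                 # per-group covariance
    diag_mean = cov.diagonal(dim1=1, dim2=2).mean()
    sq_mean = (cov ** 2).mean()
    return d * sq_mean / (diag_mean ** 2 + 1e-20)
```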
2023-11-27 15:38:01,354 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 468600
2023-11-27 15:38:06,629 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.108e+01 8.764e+01 9.242e+01 1.012e+02 1.452e+02, threshold=1.848e+02, percent-clipped=0.0
2023-11-27 15:38:06,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3124020.0, ans=0.1
2023-11-27 15:38:07,819 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 11700, loss[loss=0.09559, simple_loss=0.1286, pruned_loss=0.02131, audio_tagging_loss=0.009961, over 13959.00 frames. ], tot_loss[loss=0.06692, simple_loss=0.0906, pruned_loss=0.01287, audio_tagging_loss=0.008749, over 3035634.26 frames. ], batch size: 52, lr: 1.72e-03, grad_scale: 16.0
2023-11-27 15:38:43,849 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.02 vs. limit=15.0
2023-11-27 15:38:49,572 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.33 vs. limit=15.0
2023-11-27 15:38:55,000 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3124286.6666666665, ans=0.125
2023-11-27 15:38:59,068 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 468650
2023-11-27 15:39:04,393 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 11750, loss[loss=0.0588, simple_loss=0.09203, pruned_loss=0.007569, audio_tagging_loss=0.005215, over 15592.00 frames. ], tot_loss[loss=0.0669, simple_loss=0.09065, pruned_loss=0.01277, audio_tagging_loss=0.008805, over 3043013.28 frames. ], batch size: 58, lr: 1.72e-03, grad_scale: 16.0
2023-11-27 15:39:29,180 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3124486.6666666665, ans=0.125
2023-11-27 15:39:30,496 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.22 vs. limit=12.0
2023-11-27 15:39:33,664 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.87 vs. limit=22.5
2023-11-27 15:39:34,423 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3124486.6666666665, ans=0.125
2023-11-27 15:39:43,882 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3124553.3333333335, ans=0.2
2023-11-27 15:39:56,130 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 468700
2023-11-27 15:40:00,308 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.652e+01 8.569e+01 9.104e+01 9.733e+01 1.192e+02, threshold=1.821e+02, percent-clipped=0.0
2023-11-27 15:40:01,943 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 11800, loss[loss=0.0438, simple_loss=0.05919, pruned_loss=0.00429, audio_tagging_loss=0.009914, over 15463.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.09021, pruned_loss=0.01266, audio_tagging_loss=0.008803, over 3046219.01 frames. ], batch size: 61, lr: 1.72e-03, grad_scale: 16.0
2023-11-27 15:40:19,537 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.40 vs. limit=15.0
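The optim.py:476 lines summarize the distribution of recent gradient norms (min/25%/50%/75%/max) together with the clipping threshold; in every record above the threshold equals Clipping_scale times the median, e.g. 1.848e+02 = 2.0 * 9.242e+01, and percent-clipped reports how often a batch exceeded it. A sketch of that bookkeeping (not icefall's exact optimizer code):

```python
import torch

def grad_norm_stats(recent_norms: list, clipping_scale: float = 2.0):
    # Quartiles of recent gradient norms, threshold = clipping_scale x median
    # (2.0 x 9.242e+01 = 1.848e+02 in the record above), and the fraction of
    # batches whose norm exceeded the threshold ("percent-clipped").
    q = torch.quantile(torch.tensor(recent_norms),
                       torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2].item()
    pct = 100.0 * sum(n > threshold for n in recent_norms) / len(recent_norms)
    return q.tolist(), threshold, pct
```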
2023-11-27 15:40:27,512 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3124820.0, ans=0.0
2023-11-27 15:40:27,601 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.85 vs. limit=15.0
2023-11-27 15:40:28,458 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3124820.0, ans=0.1
2023-11-27 15:40:28,497 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3124820.0, ans=0.0
2023-11-27 15:40:30,960 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.86 vs. limit=15.0
2023-11-27 15:40:44,350 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3124886.6666666665, ans=0.125
2023-11-27 15:40:44,486 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.60 vs. limit=10.0
2023-11-27 15:40:53,877 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 468750
2023-11-27 15:40:58,351 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3125020.0, ans=0.0
2023-11-27 15:40:59,250 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 11850, loss[loss=0.06191, simple_loss=0.07606, pruned_loss=0.01286, audio_tagging_loss=0.01102, over 14817.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.09024, pruned_loss=0.01263, audio_tagging_loss=0.008891, over 3044220.24 frames. ], batch size: 55, lr: 1.72e-03, grad_scale: 16.0
2023-11-27 15:41:13,055 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3125086.6666666665, ans=0.125
2023-11-27 15:41:50,334 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 468800
2023-11-27 15:41:55,549 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.796e+01 8.519e+01 9.146e+01 9.837e+01 1.247e+02, threshold=1.829e+02, percent-clipped=0.0
2023-11-27 15:41:56,671 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 11900, loss[loss=0.07835, simple_loss=0.1107, pruned_loss=0.01614, audio_tagging_loss=0.006845, over 15017.00 frames. ], tot_loss[loss=0.06691, simple_loss=0.09049, pruned_loss=0.01269, audio_tagging_loss=0.008981, over 3051099.03 frames. ], batch size: 54, lr: 1.72e-03, grad_scale: 16.0
2023-11-27 15:42:32,118 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3125553.3333333335, ans=0.0
2023-11-27 15:42:45,530 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3125620.0, ans=0.125
2023-11-27 15:42:47,649 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 468850
2023-11-27 15:42:53,434 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 11950, loss[loss=0.04534, simple_loss=0.05926, pruned_loss=0.007018, audio_tagging_loss=0.008689, over 14219.00 frames. ], tot_loss[loss=0.0671, simple_loss=0.09079, pruned_loss=0.01277, audio_tagging_loss=0.008937, over 3047497.30 frames. ], batch size: 54, lr: 1.72e-03, grad_scale: 16.0
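Each train_asr.py:1235 record factors the objective into simple_loss, pruned_loss and audio_tagging_loss. The logged numbers satisfy loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss (e.g. 0.5 * 0.09016 + 0.0144 + 0.008552 = 0.06803 for batch 11450), i.e. the pruned-RNN-T simple term is halved and the audio-tagging distillation term enters with scale 1.0:

```python
def combined_loss(simple_loss, pruned_loss, audio_tagging_loss,
                  simple_loss_scale=0.5, audio_tagging_loss_scale=1.0):
    # Reproduces the logged totals, e.g. batch 11450:
    # 0.5 * 0.09016 + 0.0144 + 1.0 * 0.008552 = 0.06803
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)

assert abs(combined_loss(0.09016, 0.0144, 0.008552) - 0.06803) < 5e-6
```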
2023-11-27 15:43:13,392 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3125753.3333333335, ans=0.125
2023-11-27 15:43:35,073 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.66 vs. limit=15.0
2023-11-27 15:43:42,431 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3125953.3333333335, ans=0.2
2023-11-27 15:43:44,396 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 468900
2023-11-27 15:43:48,495 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.451e+01 8.695e+01 9.240e+01 9.952e+01 1.274e+02, threshold=1.848e+02, percent-clipped=0.0
2023-11-27 15:43:49,172 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.16 vs. limit=15.0
2023-11-27 15:43:49,576 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 12000, loss[loss=0.05653, simple_loss=0.07656, pruned_loss=0.008619, audio_tagging_loss=0.009634, over 14387.00 frames. ], tot_loss[loss=0.06701, simple_loss=0.0905, pruned_loss=0.01269, audio_tagging_loss=0.009066, over 3046554.88 frames. ], batch size: 55, lr: 1.72e-03, grad_scale: 32.0
2023-11-27 15:43:49,578 INFO [train_asr.py:1258] (0/4) Computing validation loss
2023-11-27 15:44:21,687 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.3649, 2.9400, 3.2312, 2.9614, 3.6643, 3.7237, 3.1943, 3.2451], device='cuda:0')
2023-11-27 15:44:24,030 INFO [train_asr.py:1267] (0/4) Epoch 39, validation: loss=0.05766, simple_loss=0.05064, pruned_loss=0.005162, audio_tagging_loss=0.02718, over 4681554.00 frames.
2023-11-27 15:44:24,030 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB
2023-11-27 15:44:33,481 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3126086.6666666665, ans=0.2
2023-11-27 15:44:34,502 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3126086.6666666665, ans=0.0
2023-11-27 15:44:37,635 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3126086.6666666665, ans=0.0
2023-11-27 15:44:39,804 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3126086.6666666665, ans=0.2
2023-11-27 15:44:42,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3126086.6666666665, ans=0.2
2023-11-27 15:44:43,357 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.83 vs. limit=12.0
2023-11-27 15:44:51,428 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/epoch-39.pt
2023-11-27 15:45:06,135 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 0, loss[loss=0.08068, simple_loss=0.08858, pruned_loss=0.01372, audio_tagging_loss=0.02267, over 15656.00 frames. ], tot_loss[loss=0.08068, simple_loss=0.08858, pruned_loss=0.01372, audio_tagging_loss=0.02267, over 15656.00 frames. ], batch size: 58, lr: 1.70e-03, grad_scale: 32.0
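During the batch-12000 validation pass, zipformer.py:1877 prints the entropy of each attention head's weight distribution (eight heads here, values around 2.4 to 3.7 nats), a diagnostic for heads that collapse onto a single key or stay near-uniform. A sketch of the statistic; the exact reduction dimensions used in zipformer.py are an assumption:

```python
import torch

def attn_weights_entropy(attn_weights: torch.Tensor) -> torch.Tensor:
    # attn_weights: (num_heads, num_queries, num_keys) softmax outputs.
    # Shannon entropy over keys, averaged over queries -> one value per head,
    # comparable to the eight-element tensor printed above.
    ent = -(attn_weights * (attn_weights + 1e-20).log()).sum(dim=-1)
    return ent.mean(dim=-1)
```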
2023-11-27 15:45:06,137 INFO [train_asr.py:1258] (0/4) Computing validation loss
2023-11-27 15:45:41,200 INFO [train_asr.py:1267] (0/4) Epoch 40, validation: loss=0.05772, simple_loss=0.0507, pruned_loss=0.005215, audio_tagging_loss=0.02715, over 4681554.00 frames.
2023-11-27 15:45:41,201 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB
2023-11-27 15:45:49,897 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.66 vs. limit=15.0
2023-11-27 15:45:51,922 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3126253.3333333335, ans=0.0
2023-11-27 15:45:53,560 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=3126253.3333333335, ans=0.02
2023-11-27 15:45:54,137 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.42 vs. limit=10.0
2023-11-27 15:45:57,837 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3126253.3333333335, ans=0.0
2023-11-27 15:46:04,298 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 468950
2023-11-27 15:46:35,708 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3126453.3333333335, ans=0.2
2023-11-27 15:46:39,233 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 50, loss[loss=0.07879, simple_loss=0.1021, pruned_loss=0.01427, audio_tagging_loss=0.01349, over 15437.00 frames. ], tot_loss[loss=0.07377, simple_loss=0.08808, pruned_loss=0.01214, audio_tagging_loss=0.01759, over 687234.47 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 32.0
2023-11-27 15:46:53,872 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=3126586.6666666665, ans=0.1
2023-11-27 15:46:54,985 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3126586.6666666665, ans=0.1
2023-11-27 15:46:58,036 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.96 vs. limit=22.5
2023-11-27 15:47:01,952 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 469000
2023-11-27 15:47:06,543 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.784e+01 9.209e+01 9.877e+01 1.086e+02 2.497e+02, threshold=1.975e+02, percent-clipped=1.0
2023-11-27 15:47:29,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3126786.6666666665, ans=0.05
2023-11-27 15:47:32,745 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.20 vs. limit=15.0
2023-11-27 15:47:33,835 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.92 vs. limit=22.5
2023-11-27 15:47:36,538 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 100, loss[loss=0.06461, simple_loss=0.07001, pruned_loss=0.01248, audio_tagging_loss=0.01712, over 15600.00 frames. ], tot_loss[loss=0.07283, simple_loss=0.08761, pruned_loss=0.01231, audio_tagging_loss=0.01671, over 1206855.60 frames. ], batch size: 61, lr: 1.70e-03, grad_scale: 32.0
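The span above is the epoch boundary: a validation pass after epoch 39, batch 12000, epoch-39.pt written to the experiment directory, then epoch 40 starting at batch 0 with its own validation pass (and a tot_loss reset, which is why the first tot_loss of epoch 40 equals the batch-0 loss). A condensed sketch of that control flow, with illustrative function names rather than icefall's exact API:

```python
def end_of_epoch(model, valid_dl, epoch, exp_dir,
                 compute_validation_loss, save_checkpoint):
    # After the last train batch of the epoch, run a validation pass
    # ("Computing validation loss"), report it, then persist epoch-<n>.pt
    # before epoch <n+1> begins at batch 0.
    val_loss = compute_validation_loss(model, valid_dl)
    print(f"Epoch {epoch}, validation: loss={val_loss:.5f}")
    save_checkpoint(f"{exp_dir}/epoch-{epoch}.pt", model)
```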
2023-11-27 15:47:37,869 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3126853.3333333335, ans=0.0
2023-11-27 15:47:45,216 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3126853.3333333335, ans=0.1
2023-11-27 15:47:47,667 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.50 vs. limit=12.0
2023-11-27 15:47:49,536 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3126920.0, ans=0.125
2023-11-27 15:48:00,243 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 469050
2023-11-27 15:48:05,270 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3126986.6666666665, ans=0.2
2023-11-27 15:48:10,805 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3127053.3333333335, ans=0.125
2023-11-27 15:48:14,971 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3127053.3333333335, ans=0.125
2023-11-27 15:48:23,956 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=3127120.0, ans=10.0
2023-11-27 15:48:28,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3127120.0, ans=0.125
2023-11-27 15:48:28,996 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3127120.0, ans=0.125
2023-11-27 15:48:34,256 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 150, loss[loss=0.08839, simple_loss=0.1166, pruned_loss=0.0207, audio_tagging_loss=0.009395, over 15962.00 frames. ], tot_loss[loss=0.07222, simple_loss=0.08952, pruned_loss=0.0128, audio_tagging_loss=0.01466, over 1611404.05 frames. ], batch size: 60, lr: 1.70e-03, grad_scale: 32.0
2023-11-27 15:48:45,978 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.20 vs. limit=22.5
2023-11-27 15:48:56,018 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3127253.3333333335, ans=0.0
2023-11-27 15:48:58,108 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 469100
2023-11-27 15:49:03,616 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.882e+01 9.129e+01 9.870e+01 1.058e+02 1.571e+02, threshold=1.974e+02, percent-clipped=0.0
2023-11-27 15:49:18,813 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3127386.6666666665, ans=0.0
2023-11-27 15:49:32,953 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 200, loss[loss=0.06267, simple_loss=0.08609, pruned_loss=0.01092, audio_tagging_loss=0.008709, over 15973.00 frames. ], tot_loss[loss=0.07179, simple_loss=0.09132, pruned_loss=0.01312, audio_tagging_loss=0.01301, over 1933514.54 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 16.0
2023-11-27 15:49:33,207 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3127520.0, ans=0.125
2023-11-27 15:49:40,989 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3127520.0, ans=0.0
2023-11-27 15:49:45,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3127586.6666666665, ans=0.125
2023-11-27 15:49:55,024 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 469150
2023-11-27 15:50:06,674 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3127720.0, ans=0.0
2023-11-27 15:50:29,994 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 250, loss[loss=0.06697, simple_loss=0.0954, pruned_loss=0.0131, audio_tagging_loss=0.006175, over 15796.00 frames. ], tot_loss[loss=0.07003, simple_loss=0.09035, pruned_loss=0.01296, audio_tagging_loss=0.01189, over 2177396.63 frames. ], batch size: 59, lr: 1.70e-03, grad_scale: 16.0
2023-11-27 15:50:43,797 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3127920.0, ans=0.05
2023-11-27 15:50:45,813 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3127920.0, ans=0.0
2023-11-27 15:50:53,407 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 469200
2023-11-27 15:50:59,103 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.262e+01 8.939e+01 9.454e+01 1.026e+02 1.364e+02, threshold=1.891e+02, percent-clipped=0.0
2023-11-27 15:51:06,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3128053.3333333335, ans=0.1
2023-11-27 15:51:16,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3128120.0, ans=0.125
2023-11-27 15:51:17,209 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3128120.0, ans=0.125
2023-11-27 15:51:19,424 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3128120.0, ans=0.04949747468305833
2023-11-27 15:51:26,663 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 300, loss[loss=0.05354, simple_loss=0.05344, pruned_loss=0.01107, audio_tagging_loss=0.01575, over 13876.00 frames. ], tot_loss[loss=0.06959, simple_loss=0.09116, pruned_loss=0.013, audio_tagging_loss=0.01101, over 2362197.55 frames. ], batch size: 54, lr: 1.70e-03, grad_scale: 16.0
2023-11-27 15:51:32,324 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3128186.6666666665, ans=0.05
2023-11-27 15:51:37,448 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.64 vs. limit=6.0
2023-11-27 15:51:50,361 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 469250
2023-11-27 15:51:50,883 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.66 vs. limit=15.0
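The grad_scale field is the fp16 dynamic loss scale: it is halved when a scaled step overflows (32.0 down to 16.0 between batches 150 and 200 above) and grown back after a stretch of finite-gradient steps (it climbs again later in the epoch). A minimal sketch of one training step using torch's stock GradScaler, which implements exactly this halve-on-overflow/grow-when-stable policy; icefall's thresholds and growth interval may differ:

```python
import torch

scaler = torch.cuda.amp.GradScaler()  # maintains the grad_scale seen in the log

def training_step(model, optimizer, loss_fn, batch):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = loss_fn(model, batch)
    scaler.scale(loss).backward()   # backward on the scaled loss
    scaler.step(optimizer)          # skips the update if grads hit inf/NaN
    scaler.update()                 # halves the scale on overflow, grows it later
    return loss.detach()
```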
2023-11-27 15:52:01,508 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3128386.6666666665, ans=0.0
2023-11-27 15:52:24,503 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 350, loss[loss=0.05236, simple_loss=0.07003, pruned_loss=0.009926, audio_tagging_loss=0.007422, over 14182.00 frames. ], tot_loss[loss=0.06947, simple_loss=0.09224, pruned_loss=0.01309, audio_tagging_loss=0.01026, over 2518138.34 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 16.0
2023-11-27 15:52:29,638 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3128520.0, ans=0.125
2023-11-27 15:52:36,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3128586.6666666665, ans=0.0
2023-11-27 15:52:38,883 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.48 vs. limit=15.0
2023-11-27 15:52:46,882 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 469300
2023-11-27 15:52:52,151 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.928e+01 8.667e+01 9.273e+01 1.018e+02 1.811e+02, threshold=1.855e+02, percent-clipped=0.0
2023-11-27 15:53:14,272 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3128786.6666666665, ans=0.0
2023-11-27 15:53:15,849 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.45 vs. limit=10.0
2023-11-27 15:53:21,708 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 400, loss[loss=0.06485, simple_loss=0.09045, pruned_loss=0.01285, audio_tagging_loss=0.006773, over 16261.00 frames. ], tot_loss[loss=0.0687, simple_loss=0.09155, pruned_loss=0.01297, audio_tagging_loss=0.009955, over 2640081.22 frames. ], batch size: 58, lr: 1.70e-03, grad_scale: 32.0
2023-11-27 15:53:22,007 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3128853.3333333335, ans=0.2
2023-11-27 15:53:22,315 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.25 vs. limit=22.5
2023-11-27 15:53:23,023 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3128853.3333333335, ans=0.125
2023-11-27 15:53:25,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3128853.3333333335, ans=0.2
2023-11-27 15:53:41,677 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3128920.0, ans=0.0
2023-11-27 15:53:44,323 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 469350
2023-11-27 15:54:08,235 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.19 vs. limit=15.0
2023-11-27 15:54:13,718 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.99 vs. limit=22.5
2023-11-27 15:54:17,425 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 450, loss[loss=0.06788, simple_loss=0.09755, pruned_loss=0.008476, audio_tagging_loss=0.01063, over 15240.00 frames. ], tot_loss[loss=0.06777, simple_loss=0.0907, pruned_loss=0.01269, audio_tagging_loss=0.009737, over 2730712.59 frames. ], batch size: 60, lr: 1.70e-03, grad_scale: 16.0
2023-11-27 15:54:20,979 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3129186.6666666665, ans=0.025
2023-11-27 15:54:41,504 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 469400
2023-11-27 15:54:48,255 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.496e+01 8.585e+01 9.092e+01 1.003e+02 1.210e+02, threshold=1.818e+02, percent-clipped=0.0
2023-11-27 15:54:48,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3129320.0, ans=0.125
2023-11-27 15:55:16,569 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 500, loss[loss=0.067, simple_loss=0.08668, pruned_loss=0.01481, audio_tagging_loss=0.008855, over 15040.00 frames. ], tot_loss[loss=0.06735, simple_loss=0.09072, pruned_loss=0.01262, audio_tagging_loss=0.009371, over 2802880.03 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 16.0
2023-11-27 15:55:25,542 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.52 vs. limit=15.0
2023-11-27 15:55:31,774 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3129586.6666666665, ans=0.0
2023-11-27 15:55:39,475 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 469450
2023-11-27 15:55:41,818 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3129653.3333333335, ans=0.1
2023-11-27 15:55:48,226 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3129653.3333333335, ans=0.1
2023-11-27 15:56:04,056 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3129786.6666666665, ans=0.2
2023-11-27 15:56:14,250 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 550, loss[loss=0.07025, simple_loss=0.1031, pruned_loss=0.01244, audio_tagging_loss=0.006248, over 15083.00 frames. ], tot_loss[loss=0.06692, simple_loss=0.09029, pruned_loss=0.01258, audio_tagging_loss=0.009201, over 2860465.35 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 16.0
2023-11-27 15:56:21,310 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3129853.3333333335, ans=0.0
2023-11-27 15:56:21,731 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.22 vs. limit=22.5
2023-11-27 15:56:30,978 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3129920.0, ans=0.125
2023-11-27 15:56:36,944 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 469500
2023-11-27 15:56:44,646 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.960e+01 8.483e+01 9.154e+01 9.792e+01 1.177e+02, threshold=1.831e+02, percent-clipped=0.0
2023-11-27 15:56:55,279 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3130053.3333333335, ans=0.125
2023-11-27 15:57:06,611 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.64 vs. limit=15.0
2023-11-27 15:57:11,342 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 600, loss[loss=0.06013, simple_loss=0.07327, pruned_loss=0.01298, audio_tagging_loss=0.01052, over 14943.00 frames. ], tot_loss[loss=0.06677, simple_loss=0.09031, pruned_loss=0.01258, audio_tagging_loss=0.009031, over 2897471.16 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 16.0
2023-11-27 15:57:21,505 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3130186.6666666665, ans=0.2
2023-11-27 15:57:23,907 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.10 vs. limit=22.5
2023-11-27 15:57:35,688 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 469550
2023-11-27 15:57:47,807 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3130386.6666666665, ans=0.1
2023-11-27 15:57:47,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3130386.6666666665, ans=0.1
2023-11-27 15:57:55,531 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3130386.6666666665, ans=0.05
2023-11-27 15:57:58,133 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.99 vs. limit=15.0
2023-11-27 15:58:02,869 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.90 vs. limit=15.0
2023-11-27 15:58:09,181 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 650, loss[loss=0.08181, simple_loss=0.1139, pruned_loss=0.01771, audio_tagging_loss=0.007136, over 15700.00 frames. ], tot_loss[loss=0.0672, simple_loss=0.09093, pruned_loss=0.01273, audio_tagging_loss=0.009002, over 2930891.09 frames. ], batch size: 59, lr: 1.70e-03, grad_scale: 16.0
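tot_loss is not the current batch's loss but a frame-weighted running aggregate: its frame counter climbs from 687234.47 at batch 50 toward ~3M as epoch 40 proceeds, and the fractional counts suggest older batches are gently decayed rather than kept at full weight. A sketch that keeps a plain weighted mean and ignores any decay (illustrative, not icefall's exact bookkeeping):

```python
class RunningLoss:
    # Frame-weighted running average, as in "tot_loss[... over N frames]".
    def __init__(self):
        self.weighted_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss: float, batch_frames: float) -> float:
        self.weighted_sum += batch_loss * batch_frames
        self.frames += batch_frames
        return self.weighted_sum / self.frames
```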
2023-11-27 15:58:11,622 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3130520.0, ans=0.2
2023-11-27 15:58:16,574 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3130520.0, ans=0.1
2023-11-27 15:58:22,815 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3130586.6666666665, ans=0.1
2023-11-27 15:58:32,565 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 469600
2023-11-27 15:58:40,339 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.596e+01 8.671e+01 9.149e+01 9.953e+01 1.299e+02, threshold=1.830e+02, percent-clipped=0.0
2023-11-27 15:59:00,683 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3130786.6666666665, ans=0.0
2023-11-27 15:59:03,417 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3130786.6666666665, ans=0.125
2023-11-27 15:59:07,567 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 700, loss[loss=0.06856, simple_loss=0.09395, pruned_loss=0.01279, audio_tagging_loss=0.008796, over 15609.00 frames. ], tot_loss[loss=0.06735, simple_loss=0.09139, pruned_loss=0.01266, audio_tagging_loss=0.008989, over 2966541.13 frames. ], batch size: 59, lr: 1.70e-03, grad_scale: 8.0
2023-11-27 15:59:09,353 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.47 vs. limit=22.5
2023-11-27 15:59:13,720 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.19 vs. limit=15.0
2023-11-27 15:59:16,853 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3130853.3333333335, ans=0.125
2023-11-27 15:59:18,281 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.25 vs. limit=12.0
2023-11-27 15:59:30,002 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 469650
2023-11-27 16:00:01,045 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3131120.0, ans=0.125
2023-11-27 16:00:05,239 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 750, loss[loss=0.06996, simple_loss=0.1007, pruned_loss=0.01109, audio_tagging_loss=0.008516, over 14182.00 frames. ], tot_loss[loss=0.06745, simple_loss=0.09159, pruned_loss=0.01265, audio_tagging_loss=0.009002, over 2984121.58 frames. ], batch size: 53, lr: 1.70e-03, grad_scale: 8.0
2023-11-27 16:00:09,788 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=3131186.6666666665, ans=0.025
2023-11-27 16:00:28,340 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 469700
2023-11-27 16:00:28,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3131320.0, ans=0.04949747468305833
2023-11-27 16:00:36,959 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.121e+01 8.681e+01 9.396e+01 9.945e+01 1.193e+02, threshold=1.879e+02, percent-clipped=0.0
2023-11-27 16:00:44,912 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3131386.6666666665, ans=0.5
2023-11-27 16:00:46,301 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.43 vs. limit=15.0
2023-11-27 16:00:59,913 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.32 vs. limit=15.0
2023-11-27 16:01:03,251 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 800, loss[loss=0.05059, simple_loss=0.06182, pruned_loss=0.007047, audio_tagging_loss=0.01264, over 14759.00 frames. ], tot_loss[loss=0.06701, simple_loss=0.09066, pruned_loss=0.01258, audio_tagging_loss=0.009101, over 2991629.86 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 16.0
2023-11-27 16:01:06,754 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3131520.0, ans=0.05
2023-11-27 16:01:26,308 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 469750
2023-11-27 16:01:28,903 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3131653.3333333335, ans=0.125
2023-11-27 16:01:36,377 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3131720.0, ans=0.125
2023-11-27 16:01:39,803 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3131720.0, ans=0.125
2023-11-27 16:01:53,606 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3131786.6666666665, ans=0.125
2023-11-27 16:02:00,639 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 850, loss[loss=0.05739, simple_loss=0.08547, pruned_loss=0.006103, audio_tagging_loss=0.008548, over 15593.00 frames. ], tot_loss[loss=0.06704, simple_loss=0.09068, pruned_loss=0.01255, audio_tagging_loss=0.009146, over 2998728.09 frames. ], batch size: 60, lr: 1.70e-03, grad_scale: 16.0
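The lr field decays with both batches and epochs: it reads 1.72e-03 through the tail of epoch 39 and 1.70e-03 once epoch 40 starts. This is consistent with the Eden-style schedule used alongside ScaledAdam, sketched below with this run's configured base_lr=0.045, lr_batches=7500, lr_epochs=3.5; the exact epoch indexing (39 here for the log's "Epoch 40") is an assumption:

```python
def eden_lr(base_lr: float, step: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    # lr = base_lr * ((step/lr_batches)^2 + 1)^-0.25 * ((epoch/lr_epochs)^2 + 1)^-0.25
    batch_factor = ((step / lr_batches) ** 2 + 1) ** -0.25
    epoch_factor = ((epoch / lr_epochs) ** 2 + 1) ** -0.25
    return base_lr * batch_factor * epoch_factor

# Around batch idx 469600 the logged lr is 1.70e-03:
print(f"{eden_lr(0.045, 469600, 39):.2e}")  # ~1.70e-03
```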
2023-11-27 16:02:22,680 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 469800
2023-11-27 16:02:31,532 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.270e+01 8.724e+01 9.421e+01 1.007e+02 1.369e+02, threshold=1.884e+02, percent-clipped=0.0
2023-11-27 16:02:35,121 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3132053.3333333335, ans=0.125
2023-11-27 16:02:36,261 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3132053.3333333335, ans=0.1
2023-11-27 16:02:38,335 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3132053.3333333335, ans=0.125
2023-11-27 16:02:42,937 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3132053.3333333335, ans=0.0
2023-11-27 16:02:46,283 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3132120.0, ans=0.125
2023-11-27 16:02:50,491 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3132120.0, ans=0.2
2023-11-27 16:02:51,754 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.96 vs. limit=12.0
2023-11-27 16:02:57,918 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 900, loss[loss=0.09527, simple_loss=0.1346, pruned_loss=0.02117, audio_tagging_loss=0.006785, over 15392.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.08987, pruned_loss=0.01255, audio_tagging_loss=0.009233, over 3008675.44 frames. ], batch size: 59, lr: 1.70e-03, grad_scale: 16.0
2023-11-27 16:03:11,886 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3132253.3333333335, ans=0.2
2023-11-27 16:03:18,793 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3132253.3333333335, ans=0.125
2023-11-27 16:03:20,897 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 469850
2023-11-27 16:03:48,596 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-27 16:03:55,285 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 950, loss[loss=0.0663, simple_loss=0.09774, pruned_loss=0.009833, audio_tagging_loss=0.007595, over 15511.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.08958, pruned_loss=0.01247, audio_tagging_loss=0.009132, over 3016530.86 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 16.0
2023-11-27 16:04:19,217 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 469900
2023-11-27 16:04:26,961 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.445e+01 8.703e+01 9.559e+01 1.057e+02 1.419e+02, threshold=1.912e+02, percent-clipped=0.0
2023-11-27 16:04:36,071 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.whiten.whitening_limit, batch_count=3132720.0, ans=12.0
2023-11-27 16:04:39,602 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3132720.0, ans=0.2
2023-11-27 16:04:49,443 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.10 vs. limit=15.0
2023-11-27 16:04:53,157 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 1000, loss[loss=0.06874, simple_loss=0.1008, pruned_loss=0.01348, audio_tagging_loss=0.004858, over 15027.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.0888, pruned_loss=0.01237, audio_tagging_loss=0.008982, over 3014347.39 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 16.0
2023-11-27 16:05:14,234 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-27 16:05:16,395 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 469950
2023-11-27 16:05:18,986 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=3132986.6666666665, ans=15.0
2023-11-27 16:05:20,835 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 16:05:26,492 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3132986.6666666665, ans=0.2
2023-11-27 16:05:38,440 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3133120.0, ans=0.1
2023-11-27 16:05:51,391 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 1050, loss[loss=0.06311, simple_loss=0.08025, pruned_loss=0.01119, audio_tagging_loss=0.0118, over 14973.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.08874, pruned_loss=0.01238, audio_tagging_loss=0.008815, over 3015264.17 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 16.0
2023-11-27 16:05:51,548 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3133186.6666666665, ans=0.1
2023-11-27 16:06:06,181 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3133253.3333333335, ans=0.125
2023-11-27 16:06:07,338 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3133253.3333333335, ans=0.1
2023-11-27 16:06:14,245 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 470000
2023-11-27 16:06:14,716 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.03 vs. limit=15.0
2023-11-27 16:06:20,357 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3133320.0, ans=0.125
2023-11-27 16:06:22,107 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.431e+01 8.934e+01 9.738e+01 1.038e+02 1.396e+02, threshold=1.948e+02, percent-clipped=0.0
2023-11-27 16:06:33,591 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3133386.6666666665, ans=0.125
2023-11-27 16:06:48,750 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 1100, loss[loss=0.05371, simple_loss=0.07503, pruned_loss=0.007467, audio_tagging_loss=0.008734, over 15246.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08861, pruned_loss=0.01228, audio_tagging_loss=0.008766, over 3021647.32 frames. ], batch size: 59, lr: 1.70e-03, grad_scale: 16.0
2023-11-27 16:06:52,734 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3133520.0, ans=0.125
2023-11-27 16:06:55,085 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 16:06:56,391 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3133520.0, ans=0.0
2023-11-27 16:07:00,885 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3133586.6666666665, ans=0.125
2023-11-27 16:07:06,187 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3133586.6666666665, ans=0.125
2023-11-27 16:07:12,242 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 470050
2023-11-27 16:07:17,554 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3133653.3333333335, ans=0.125
2023-11-27 16:07:46,816 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 1150, loss[loss=0.07609, simple_loss=0.105, pruned_loss=0.014, audio_tagging_loss=0.009568, over 15653.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.08953, pruned_loss=0.0123, audio_tagging_loss=0.008751, over 3022742.78 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 16.0
2023-11-27 16:07:49,270 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3133853.3333333335, ans=0.125
2023-11-27 16:07:52,324 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.75 vs. limit=10.0
2023-11-27 16:07:54,191 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3133853.3333333335, ans=0.0
2023-11-27 16:07:59,663 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3133920.0, ans=0.1
2023-11-27 16:08:10,140 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 470100
2023-11-27 16:08:12,883 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.76 vs. limit=15.0
2023-11-27 16:08:16,791 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3133986.6666666665, ans=10.0
2023-11-27 16:08:18,076 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.166e+01 8.606e+01 9.243e+01 9.874e+01 1.339e+02, threshold=1.849e+02, percent-clipped=0.0
2023-11-27 16:08:19,336 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3133986.6666666665, ans=0.0
2023-11-27 16:08:25,234 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3134053.3333333335, ans=0.125
2023-11-27 16:08:26,568 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3134053.3333333335, ans=0.025
2023-11-27 16:08:30,046 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3134053.3333333335, ans=0.04949747468305833
2023-11-27 16:08:40,462 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3134120.0, ans=0.125
2023-11-27 16:08:44,536 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 1200, loss[loss=0.06735, simple_loss=0.0913, pruned_loss=0.01327, audio_tagging_loss=0.008426, over 15017.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.08938, pruned_loss=0.01243, audio_tagging_loss=0.008757, over 3020462.88 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 32.0
2023-11-27 16:09:02,703 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-27 16:09:08,375 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 470150
2023-11-27 16:09:21,195 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3134386.6666666665, ans=0.125
2023-11-27 16:09:27,782 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3134386.6666666665, ans=0.125
2023-11-27 16:09:28,853 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3134386.6666666665, ans=0.125
2023-11-27 16:09:28,873 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3134386.6666666665, ans=0.125
2023-11-27 16:09:33,983 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3134453.3333333335, ans=0.0
2023-11-27 16:09:42,469 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 1250, loss[loss=0.07548, simple_loss=0.1013, pruned_loss=0.01818, audio_tagging_loss=0.006668, over 14746.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08938, pruned_loss=0.01236, audio_tagging_loss=0.00868, over 3017985.01 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 32.0
2023-11-27 16:09:49,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3134520.0, ans=0.0
2023-11-27 16:10:01,174 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3134586.6666666665, ans=0.125
2023-11-27 16:10:02,425 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3134586.6666666665, ans=0.125
2023-11-27 16:10:05,044 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3134653.3333333335, ans=0.0
2023-11-27 16:10:06,042 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 470200
2023-11-27 16:10:06,248 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3134653.3333333335, ans=0.125
2023-11-27 16:10:14,027 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.180e+01 8.573e+01 9.266e+01 9.865e+01 1.338e+02, threshold=1.853e+02, percent-clipped=0.0
2023-11-27 16:10:27,058 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.53 vs. limit=12.0
2023-11-27 16:10:40,913 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 1300, loss[loss=0.06546, simple_loss=0.0935, pruned_loss=0.01068, audio_tagging_loss=0.008028, over 16670.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08883, pruned_loss=0.01231, audio_tagging_loss=0.008649, over 3029087.02 frames. ], batch size: 63, lr: 1.70e-03, grad_scale: 32.0
2023-11-27 16:11:03,373 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 470250
2023-11-27 16:11:10,733 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3134986.6666666665, ans=0.2
2023-11-27 16:11:10,916 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3134986.6666666665, ans=0.2
2023-11-27 16:11:27,243 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3135120.0, ans=0.0
2023-11-27 16:11:29,329 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3135120.0, ans=0.125
2023-11-27 16:11:38,503 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 1350, loss[loss=0.07075, simple_loss=0.106, pruned_loss=0.01052, audio_tagging_loss=0.007225, over 15095.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.09001, pruned_loss=0.01244, audio_tagging_loss=0.008546, over 3033316.34 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 32.0
2023-11-27 16:11:42,367 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.60 vs. limit=22.5
2023-11-27 16:12:01,843 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 470300
2023-11-27 16:12:04,889 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.72 vs. limit=22.5
2023-11-27 16:12:09,982 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.120e+01 8.641e+01 9.240e+01 9.975e+01 1.416e+02, threshold=1.848e+02, percent-clipped=0.0
2023-11-27 16:12:23,887 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 16:12:30,130 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.22 vs. limit=15.0
2023-11-27 16:12:32,879 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3135453.3333333335, ans=0.1
2023-11-27 16:12:36,566 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 1400, loss[loss=0.0628, simple_loss=0.0902, pruned_loss=0.008721, audio_tagging_loss=0.008983, over 16007.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.0906, pruned_loss=0.01257, audio_tagging_loss=0.008566, over 3038657.63 frames. ], batch size: 58, lr: 1.70e-03, grad_scale: 32.0
2023-11-27 16:12:57,718 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=3135586.6666666665, ans=10.0
2023-11-27 16:12:58,962 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3135653.3333333335, ans=0.2
2023-11-27 16:12:59,833 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 470350
2023-11-27 16:13:17,010 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3135720.0, ans=0.125
2023-11-27 16:13:28,081 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3135786.6666666665, ans=0.125
2023-11-27 16:13:34,926 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 1450, loss[loss=0.07241, simple_loss=0.09599, pruned_loss=0.01464, audio_tagging_loss=0.009782, over 15774.00 frames. ], tot_loss[loss=0.0666, simple_loss=0.09051, pruned_loss=0.01265, audio_tagging_loss=0.008695, over 3045294.11 frames. ], batch size: 58, lr: 1.70e-03, grad_scale: 32.0
2023-11-27 16:13:51,274 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3135920.0, ans=0.125
2023-11-27 16:13:52,438 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3135920.0, ans=0.125
2023-11-27 16:13:57,720 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 470400
2023-11-27 16:14:05,668 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.278e+01 8.750e+01 9.513e+01 1.013e+02 1.289e+02, threshold=1.903e+02, percent-clipped=0.0
2023-11-27 16:14:09,908 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff3.min_abs, batch_count=3136053.3333333335, ans=0.2
2023-11-27 16:14:16,235 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3136053.3333333335, ans=0.125
2023-11-27 16:14:32,819 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 1500, loss[loss=0.06672, simple_loss=0.07431, pruned_loss=0.01799, audio_tagging_loss=0.01157, over 14324.00 frames. ], tot_loss[loss=0.0669, simple_loss=0.09066, pruned_loss=0.01276, audio_tagging_loss=0.008811, over 3042635.07 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 32.0
2023-11-27 16:14:41,979 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3136186.6666666665, ans=0.125
2023-11-27 16:14:48,499 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3136253.3333333335, ans=0.125
2023-11-27 16:14:56,081 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 470450
2023-11-27 16:15:06,788 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3136386.6666666665, ans=0.125
2023-11-27 16:15:08,856 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3136386.6666666665, ans=0.0
2023-11-27 16:15:27,117 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3136453.3333333335, ans=0.125
2023-11-27 16:15:30,267 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 1550, loss[loss=0.07389, simple_loss=0.09227, pruned_loss=0.01781, audio_tagging_loss=0.00995, over 15628.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.08985, pruned_loss=0.01265, audio_tagging_loss=0.008998, over 3043254.70 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 32.0
2023-11-27 16:15:33,256 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.94 vs. limit=12.0
2023-11-27 16:15:40,005 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3136520.0, ans=0.125
2023-11-27 16:15:40,092 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3136520.0, ans=0.1
2023-11-27 16:15:53,787 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 470500
2023-11-27 16:16:02,279 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3136653.3333333335, ans=0.125
2023-11-27 16:16:03,048 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.629e+01 8.701e+01 9.341e+01 1.017e+02 1.377e+02, threshold=1.868e+02, percent-clipped=0.0
2023-11-27 16:16:03,408 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=3136653.3333333335, ans=10.0
2023-11-27 16:16:05,539 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3136720.0, ans=0.2
2023-11-27 16:16:08,640 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3136720.0, ans=0.125
2023-11-27 16:16:10,874 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3136720.0, ans=0.2
2023-11-27 16:16:13,410 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.74 vs. limit=15.0
2023-11-27 16:16:16,459 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3136786.6666666665, ans=0.05
2023-11-27 16:16:16,797 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.73 vs. limit=15.0
2023-11-27 16:16:21,919 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.08 vs. limit=15.0
2023-11-27 16:16:24,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3136786.6666666665, ans=0.2
2023-11-27 16:16:28,063 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 1600, loss[loss=0.05113, simple_loss=0.07054, pruned_loss=0.007006, audio_tagging_loss=0.008854, over 15068.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.08902, pruned_loss=0.01248, audio_tagging_loss=0.009126, over 3039626.78 frames. ], batch size: 58, lr: 1.70e-03, grad_scale: 32.0
2023-11-27 16:16:32,889 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3136853.3333333335, ans=0.125
2023-11-27 16:16:49,456 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.56 vs. limit=15.0
2023-11-27 16:16:50,944 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 470550
2023-11-27 16:16:52,331 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3136986.6666666665, ans=0.125
2023-11-27 16:17:02,827 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.74 vs. limit=15.0
2023-11-27 16:17:06,101 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3137053.3333333335, ans=0.125
2023-11-27 16:17:19,279 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3137120.0, ans=0.0
2023-11-27 16:17:26,221 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 1650, loss[loss=0.05688, simple_loss=0.07565, pruned_loss=0.01058, audio_tagging_loss=0.008473, over 15178.00 frames. ], tot_loss[loss=0.0666, simple_loss=0.0897, pruned_loss=0.01257, audio_tagging_loss=0.009174, over 3039081.79 frames.
], batch size: 58, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:17:31,883 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3137186.6666666665, ans=0.2 2023-11-27 16:17:34,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3137186.6666666665, ans=0.2 2023-11-27 16:17:48,395 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 470600 2023-11-27 16:17:59,001 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.559e+01 8.945e+01 9.413e+01 1.026e+02 1.249e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-27 16:18:09,681 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3137386.6666666665, ans=0.125 2023-11-27 16:18:23,912 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 1700, loss[loss=0.05249, simple_loss=0.07069, pruned_loss=0.0101, audio_tagging_loss=0.007049, over 15835.00 frames. ], tot_loss[loss=0.06684, simple_loss=0.08995, pruned_loss=0.01264, audio_tagging_loss=0.009219, over 3041649.95 frames. ], batch size: 61, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:18:31,612 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3137520.0, ans=0.125 2023-11-27 16:18:37,966 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3137586.6666666665, ans=0.2 2023-11-27 16:18:40,991 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3137586.6666666665, ans=0.125 2023-11-27 16:18:44,040 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3137586.6666666665, ans=0.125 2023-11-27 16:18:46,216 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3137653.3333333335, ans=0.125 2023-11-27 16:18:47,181 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 470650 2023-11-27 16:18:49,511 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3137653.3333333335, ans=0.1 2023-11-27 16:18:50,647 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3137653.3333333335, ans=0.07 2023-11-27 16:18:55,541 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3137653.3333333335, ans=0.125 2023-11-27 16:18:55,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3137653.3333333335, ans=0.125 2023-11-27 16:18:57,852 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3137720.0, ans=0.125 2023-11-27 16:19:00,037 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3137720.0, ans=0.125 2023-11-27 16:19:21,611 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 1750, loss[loss=0.06548, simple_loss=0.09217, pruned_loss=0.009605, audio_tagging_loss=0.009791, over 15774.00 frames. 
], tot_loss[loss=0.06662, simple_loss=0.08989, pruned_loss=0.01264, audio_tagging_loss=0.009039, over 3042611.44 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:19:32,185 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3137920.0, ans=0.0 2023-11-27 16:19:32,593 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.52 vs. limit=22.5 2023-11-27 16:19:41,789 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3137920.0, ans=0.1 2023-11-27 16:19:44,793 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 470700 2023-11-27 16:19:54,555 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.471e+01 8.663e+01 9.086e+01 9.649e+01 1.198e+02, threshold=1.817e+02, percent-clipped=0.0 2023-11-27 16:19:58,092 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3138053.3333333335, ans=0.0 2023-11-27 16:20:19,436 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 1800, loss[loss=0.07259, simple_loss=0.09979, pruned_loss=0.01501, audio_tagging_loss=0.007693, over 14938.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.08969, pruned_loss=0.01273, audio_tagging_loss=0.008968, over 3040545.11 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:20:41,848 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 470750 2023-11-27 16:20:42,352 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.32 vs. limit=22.5 2023-11-27 16:20:47,152 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.76 vs. limit=10.0 2023-11-27 16:20:50,630 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3138320.0, ans=0.0 2023-11-27 16:20:51,119 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.88 vs. limit=15.0 2023-11-27 16:20:53,465 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3138386.6666666665, ans=0.0 2023-11-27 16:20:55,706 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3138386.6666666665, ans=0.125 2023-11-27 16:21:03,428 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3138386.6666666665, ans=0.0 2023-11-27 16:21:06,106 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3138453.3333333335, ans=0.0 2023-11-27 16:21:16,707 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 1850, loss[loss=0.05408, simple_loss=0.06669, pruned_loss=0.0115, audio_tagging_loss=0.009239, over 15527.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.08982, pruned_loss=0.01265, audio_tagging_loss=0.008813, over 3042444.42 frames. 
], batch size: 58, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:21:19,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3138520.0, ans=0.125 2023-11-27 16:21:20,261 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3138520.0, ans=0.07 2023-11-27 16:21:40,221 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 470800 2023-11-27 16:21:50,843 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.385e+01 8.620e+01 9.187e+01 9.919e+01 1.245e+02, threshold=1.837e+02, percent-clipped=0.0 2023-11-27 16:21:51,120 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3138720.0, ans=0.125 2023-11-27 16:21:53,269 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3138720.0, ans=0.0 2023-11-27 16:21:54,592 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3138720.0, ans=0.125 2023-11-27 16:22:01,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3138720.0, ans=0.125 2023-11-27 16:22:06,711 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3138786.6666666665, ans=0.125 2023-11-27 16:22:14,811 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 1900, loss[loss=0.05114, simple_loss=0.06844, pruned_loss=0.01123, audio_tagging_loss=0.005688, over 14035.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.08943, pruned_loss=0.01255, audio_tagging_loss=0.008759, over 3039751.98 frames. ], batch size: 54, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:22:23,904 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3138853.3333333335, ans=0.1 2023-11-27 16:22:38,591 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 470850 2023-11-27 16:22:45,593 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.03 vs. limit=10.0 2023-11-27 16:22:55,223 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 16:22:58,995 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 16:23:12,684 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 1950, loss[loss=0.07593, simple_loss=0.08859, pruned_loss=0.01893, audio_tagging_loss=0.01271, over 14879.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.08933, pruned_loss=0.01267, audio_tagging_loss=0.008808, over 3041834.00 frames. ], batch size: 59, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:23:33,590 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3139253.3333333335, ans=0.125 2023-11-27 16:23:35,698 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 470900 2023-11-27 16:23:37,304 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.01 vs. 
limit=12.0 2023-11-27 16:23:46,453 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.170e+01 8.576e+01 9.160e+01 9.774e+01 1.352e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-27 16:23:46,785 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3139386.6666666665, ans=0.125 2023-11-27 16:23:51,020 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3139386.6666666665, ans=0.2 2023-11-27 16:24:01,352 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.06 vs. limit=15.0 2023-11-27 16:24:10,837 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 2000, loss[loss=0.06133, simple_loss=0.07619, pruned_loss=0.01211, audio_tagging_loss=0.01113, over 15676.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.08953, pruned_loss=0.0127, audio_tagging_loss=0.008753, over 3043487.68 frames. ], batch size: 59, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:24:15,973 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.48 vs. limit=15.0 2023-11-27 16:24:31,562 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3139586.6666666665, ans=0.1 2023-11-27 16:24:33,690 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 470950 2023-11-27 16:24:34,956 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3139653.3333333335, ans=0.07 2023-11-27 16:24:48,413 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3139720.0, ans=0.0 2023-11-27 16:24:52,013 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.98 vs. limit=15.0 2023-11-27 16:24:52,737 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 16:25:02,426 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3139786.6666666665, ans=0.07 2023-11-27 16:25:04,609 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3139786.6666666665, ans=0.125 2023-11-27 16:25:07,720 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 2050, loss[loss=0.08839, simple_loss=0.1202, pruned_loss=0.01631, audio_tagging_loss=0.012, over 14905.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.08993, pruned_loss=0.01281, audio_tagging_loss=0.008746, over 3039070.58 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:25:14,361 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3139853.3333333335, ans=0.1 2023-11-27 16:25:14,629 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.10 vs. 
limit=22.5 2023-11-27 16:25:17,634 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3139853.3333333335, ans=0.2 2023-11-27 16:25:31,911 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 471000 2023-11-27 16:25:39,233 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.12 vs. limit=10.0 2023-11-27 16:25:41,285 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3139986.6666666665, ans=0.2 2023-11-27 16:25:41,988 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.142e+01 8.831e+01 9.330e+01 1.029e+02 1.229e+02, threshold=1.866e+02, percent-clipped=0.0 2023-11-27 16:25:48,778 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3140053.3333333335, ans=0.0 2023-11-27 16:25:52,400 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.47 vs. limit=15.0 2023-11-27 16:25:54,851 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3140120.0, ans=0.0 2023-11-27 16:26:05,845 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 2100, loss[loss=0.05714, simple_loss=0.07777, pruned_loss=0.009971, audio_tagging_loss=0.008283, over 15084.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.08934, pruned_loss=0.01275, audio_tagging_loss=0.008736, over 3038988.17 frames. ], batch size: 58, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:26:12,789 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3140186.6666666665, ans=0.125 2023-11-27 16:26:16,178 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3140186.6666666665, ans=0.125 2023-11-27 16:26:17,277 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3140253.3333333335, ans=0.2 2023-11-27 16:26:21,682 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3140253.3333333335, ans=0.125 2023-11-27 16:26:29,095 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 471050 2023-11-27 16:26:30,450 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3140320.0, ans=0.2 2023-11-27 16:26:39,932 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.94 vs. limit=22.5 2023-11-27 16:26:53,085 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3140453.3333333335, ans=0.0 2023-11-27 16:26:57,245 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3140453.3333333335, ans=0.125 2023-11-27 16:27:03,648 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 2150, loss[loss=0.06192, simple_loss=0.07771, pruned_loss=0.01228, audio_tagging_loss=0.01078, over 14527.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.08933, pruned_loss=0.01275, audio_tagging_loss=0.008718, over 3041365.39 frames. 
], batch size: 56, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:27:13,804 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 16:27:20,564 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3140586.6666666665, ans=0.2 2023-11-27 16:27:27,032 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 471100 2023-11-27 16:27:35,269 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.17 vs. limit=15.0 2023-11-27 16:27:36,628 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.380e+01 8.778e+01 9.339e+01 1.008e+02 1.312e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-27 16:27:42,464 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 16:27:42,794 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3140720.0, ans=0.0 2023-11-27 16:27:43,801 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3140720.0, ans=0.125 2023-11-27 16:27:49,288 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3140786.6666666665, ans=0.1 2023-11-27 16:27:52,681 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3140786.6666666665, ans=0.1 2023-11-27 16:28:01,139 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 2200, loss[loss=0.0847, simple_loss=0.112, pruned_loss=0.02035, audio_tagging_loss=0.008356, over 15370.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.08994, pruned_loss=0.0128, audio_tagging_loss=0.008637, over 3047419.64 frames. 
], batch size: 56, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:28:17,778 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3140920.0, ans=0.125 2023-11-27 16:28:19,689 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3140920.0, ans=0.125 2023-11-27 16:28:24,826 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 471150 2023-11-27 16:28:42,978 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3141053.3333333335, ans=0.2 2023-11-27 16:28:45,147 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3141053.3333333335, ans=0.0 2023-11-27 16:28:45,193 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3141053.3333333335, ans=0.125 2023-11-27 16:28:53,680 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3141120.0, ans=0.0 2023-11-27 16:28:55,930 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3141120.0, ans=0.125 2023-11-27 16:28:58,791 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 2250, loss[loss=0.07391, simple_loss=0.09895, pruned_loss=0.01436, audio_tagging_loss=0.01008, over 15227.00 frames. ], tot_loss[loss=0.06662, simple_loss=0.09042, pruned_loss=0.01274, audio_tagging_loss=0.008665, over 3040901.39 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:29:11,544 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.86 vs. limit=10.0 2023-11-27 16:29:15,378 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3141253.3333333335, ans=0.1 2023-11-27 16:29:20,851 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3141320.0, ans=0.125 2023-11-27 16:29:21,756 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 471200 2023-11-27 16:29:33,586 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.247e+01 8.710e+01 9.342e+01 1.003e+02 1.212e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-27 16:29:37,689 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3141386.6666666665, ans=0.1 2023-11-27 16:29:51,485 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3141453.3333333335, ans=0.125 2023-11-27 16:29:57,620 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 2300, loss[loss=0.0709, simple_loss=0.1053, pruned_loss=0.01155, audio_tagging_loss=0.006682, over 14960.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.09014, pruned_loss=0.01271, audio_tagging_loss=0.008716, over 3041582.97 frames. 
], batch size: 53, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:30:07,484 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3141586.6666666665, ans=0.1 2023-11-27 16:30:08,757 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3141586.6666666665, ans=0.125 2023-11-27 16:30:13,129 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 16:30:19,827 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 471250 2023-11-27 16:30:33,629 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3141720.0, ans=0.0 2023-11-27 16:30:42,448 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3141786.6666666665, ans=0.2 2023-11-27 16:30:51,103 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 16:30:54,428 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 2350, loss[loss=0.07534, simple_loss=0.1045, pruned_loss=0.01568, audio_tagging_loss=0.007403, over 14658.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.08986, pruned_loss=0.01256, audio_tagging_loss=0.008783, over 3041255.91 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:31:02,108 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.38 vs. 
limit=10.0 2023-11-27 16:31:14,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3141920.0, ans=0.125 2023-11-27 16:31:14,359 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3141920.0, ans=0.05 2023-11-27 16:31:18,093 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 471300 2023-11-27 16:31:23,705 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3141986.6666666665, ans=0.125 2023-11-27 16:31:24,898 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3141986.6666666665, ans=0.0 2023-11-27 16:31:29,672 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.405e+01 8.596e+01 9.412e+01 1.004e+02 1.275e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-27 16:31:32,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3142053.3333333335, ans=0.0 2023-11-27 16:31:36,581 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3142053.3333333335, ans=0.125 2023-11-27 16:31:43,217 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3142120.0, ans=0.2 2023-11-27 16:31:47,887 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3142120.0, ans=0.125 2023-11-27 16:31:52,507 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 2400, loss[loss=0.07908, simple_loss=0.1026, pruned_loss=0.01769, audio_tagging_loss=0.01011, over 15542.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.08954, pruned_loss=0.01256, audio_tagging_loss=0.008903, over 3037774.93 frames. ], batch size: 59, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:32:04,357 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3142253.3333333335, ans=0.125 2023-11-27 16:32:10,978 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3142253.3333333335, ans=0.0 2023-11-27 16:32:15,884 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 471350 2023-11-27 16:32:47,754 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3142453.3333333335, ans=0.125 2023-11-27 16:32:50,644 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 2450, loss[loss=0.08544, simple_loss=0.1232, pruned_loss=0.01619, audio_tagging_loss=0.007643, over 15212.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.0901, pruned_loss=0.01265, audio_tagging_loss=0.009021, over 3045439.86 frames. 
], batch size: 58, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:32:53,140 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3142520.0, ans=0.125 2023-11-27 16:32:55,203 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3142520.0, ans=0.125 2023-11-27 16:33:13,828 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 471400 2023-11-27 16:33:15,515 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3142653.3333333335, ans=0.1 2023-11-27 16:33:25,413 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.537e+01 8.627e+01 9.278e+01 9.943e+01 1.246e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-27 16:33:33,855 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3142720.0, ans=0.125 2023-11-27 16:33:38,427 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3142786.6666666665, ans=0.2 2023-11-27 16:33:40,421 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3142786.6666666665, ans=0.125 2023-11-27 16:33:43,754 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3142786.6666666665, ans=0.0 2023-11-27 16:33:48,568 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 2500, loss[loss=0.03712, simple_loss=0.04107, pruned_loss=0.007103, audio_tagging_loss=0.009481, over 13316.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.08926, pruned_loss=0.01262, audio_tagging_loss=0.009038, over 3037424.84 frames. ], batch size: 53, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:33:52,034 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3142853.3333333335, ans=0.0 2023-11-27 16:34:11,500 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 471450 2023-11-27 16:34:25,946 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.41 vs. limit=15.0 2023-11-27 16:34:27,649 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3143053.3333333335, ans=0.125 2023-11-27 16:34:37,039 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3143120.0, ans=0.125 2023-11-27 16:34:43,385 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.89 vs. limit=15.0 2023-11-27 16:34:46,507 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 2550, loss[loss=0.07082, simple_loss=0.1012, pruned_loss=0.01334, audio_tagging_loss=0.006873, over 14463.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.08899, pruned_loss=0.01259, audio_tagging_loss=0.008956, over 3030544.02 frames. 
], batch size: 54, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:34:49,976 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3143186.6666666665, ans=0.125 2023-11-27 16:34:50,332 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=3143186.6666666665, ans=6.0 2023-11-27 16:34:51,149 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3143186.6666666665, ans=0.125 2023-11-27 16:35:05,524 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.74 vs. limit=10.0 2023-11-27 16:35:09,344 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 471500 2023-11-27 16:35:16,121 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.84 vs. limit=12.0 2023-11-27 16:35:21,957 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.304e+01 8.534e+01 9.124e+01 9.895e+01 1.510e+02, threshold=1.825e+02, percent-clipped=0.0 2023-11-27 16:35:44,623 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 2600, loss[loss=0.07647, simple_loss=0.104, pruned_loss=0.01695, audio_tagging_loss=0.007537, over 16026.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.08949, pruned_loss=0.01253, audio_tagging_loss=0.008848, over 3034107.88 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:35:49,300 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3143520.0, ans=0.125 2023-11-27 16:36:07,285 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 471550 2023-11-27 16:36:07,748 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.95 vs. limit=15.0 2023-11-27 16:36:08,046 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.31 vs. limit=15.0 2023-11-27 16:36:36,349 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3143786.6666666665, ans=0.1 2023-11-27 16:36:41,776 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 2650, loss[loss=0.07342, simple_loss=0.09731, pruned_loss=0.0137, audio_tagging_loss=0.01107, over 15272.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.09018, pruned_loss=0.01262, audio_tagging_loss=0.008704, over 3034590.84 frames. 
], batch size: 59, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:36:45,899 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3143853.3333333335, ans=0.125 2023-11-27 16:37:05,561 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 471600 2023-11-27 16:37:05,820 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3143986.6666666665, ans=0.125 2023-11-27 16:37:16,531 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3144053.3333333335, ans=0.2 2023-11-27 16:37:18,552 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.734e+01 8.789e+01 9.225e+01 1.011e+02 1.898e+02, threshold=1.845e+02, percent-clipped=1.0 2023-11-27 16:37:28,781 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3144120.0, ans=0.125 2023-11-27 16:37:39,730 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3144186.6666666665, ans=0.1 2023-11-27 16:37:41,108 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 2700, loss[loss=0.05714, simple_loss=0.0712, pruned_loss=0.01183, audio_tagging_loss=0.009712, over 16483.00 frames. ], tot_loss[loss=0.06681, simple_loss=0.09082, pruned_loss=0.01279, audio_tagging_loss=0.008606, over 3037301.21 frames. ], batch size: 64, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:37:43,627 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3144186.6666666665, ans=0.0 2023-11-27 16:37:44,582 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3144186.6666666665, ans=0.07 2023-11-27 16:37:54,932 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3144253.3333333335, ans=0.125 2023-11-27 16:38:00,409 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3144253.3333333335, ans=0.125 2023-11-27 16:38:03,520 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 471650 2023-11-27 16:38:05,803 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 16:38:11,776 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.87 vs. limit=15.0 2023-11-27 16:38:22,114 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3144386.6666666665, ans=0.1 2023-11-27 16:38:38,650 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 2750, loss[loss=0.08215, simple_loss=0.1131, pruned_loss=0.01723, audio_tagging_loss=0.008367, over 14981.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.09041, pruned_loss=0.01272, audio_tagging_loss=0.008624, over 3041172.46 frames. 
], batch size: 53, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:39:00,732 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 471700 2023-11-27 16:39:14,189 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.918e+01 8.654e+01 9.307e+01 9.925e+01 1.318e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-27 16:39:25,985 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3144786.6666666665, ans=0.0 2023-11-27 16:39:31,256 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 16:39:35,725 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 2800, loss[loss=0.08822, simple_loss=0.1221, pruned_loss=0.02045, audio_tagging_loss=0.006698, over 16037.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.09016, pruned_loss=0.01267, audio_tagging_loss=0.008654, over 3039748.85 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:39:44,596 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3144853.3333333335, ans=0.125 2023-11-27 16:39:59,061 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 471750 2023-11-27 16:40:02,384 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3144986.6666666665, ans=0.0 2023-11-27 16:40:33,066 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 2850, loss[loss=0.06946, simple_loss=0.08755, pruned_loss=0.01478, audio_tagging_loss=0.0109, over 15507.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.08925, pruned_loss=0.01254, audio_tagging_loss=0.008718, over 3039663.61 frames. ], batch size: 59, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:40:36,674 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3145186.6666666665, ans=0.07 2023-11-27 16:40:41,033 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3145186.6666666665, ans=0.125 2023-11-27 16:40:56,808 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 471800 2023-11-27 16:41:09,325 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3145386.6666666665, ans=0.2 2023-11-27 16:41:10,171 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.450e+01 8.689e+01 9.342e+01 1.021e+02 1.296e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-27 16:41:31,285 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 2900, loss[loss=0.0711, simple_loss=0.1027, pruned_loss=0.01184, audio_tagging_loss=0.007911, over 15948.00 frames. ], tot_loss[loss=0.06667, simple_loss=0.09068, pruned_loss=0.01267, audio_tagging_loss=0.008667, over 3036472.05 frames. 
], batch size: 57, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:41:31,450 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3145520.0, ans=0.035 2023-11-27 16:41:33,560 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3145520.0, ans=0.2 2023-11-27 16:41:46,493 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3145586.6666666665, ans=0.0 2023-11-27 16:41:46,572 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3145586.6666666665, ans=0.1 2023-11-27 16:41:52,117 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3145586.6666666665, ans=0.125 2023-11-27 16:41:54,044 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 471850 2023-11-27 16:41:57,424 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3145653.3333333335, ans=0.125 2023-11-27 16:42:22,914 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.50 vs. limit=6.0 2023-11-27 16:42:26,980 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.86 vs. limit=15.0 2023-11-27 16:42:28,070 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.09 vs. limit=22.5 2023-11-27 16:42:28,713 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 2950, loss[loss=0.07622, simple_loss=0.1109, pruned_loss=0.01224, audio_tagging_loss=0.008539, over 14277.00 frames. ], tot_loss[loss=0.06678, simple_loss=0.09056, pruned_loss=0.01276, audio_tagging_loss=0.00874, over 3038399.35 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:42:52,194 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 471900 2023-11-27 16:43:05,777 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.312e+01 8.644e+01 9.313e+01 9.896e+01 1.371e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-27 16:43:22,684 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.50 vs. limit=15.0 2023-11-27 16:43:25,559 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 3000, loss[loss=0.06452, simple_loss=0.09359, pruned_loss=0.009112, audio_tagging_loss=0.008613, over 15732.00 frames. ], tot_loss[loss=0.06743, simple_loss=0.09146, pruned_loss=0.01303, audio_tagging_loss=0.008672, over 3044841.46 frames. 
], batch size: 57, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:43:25,561 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-27 16:43:53,021 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([6.0360, 5.8789, 5.6679, 5.6108], device='cuda:0') 2023-11-27 16:43:53,056 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.8095, 5.8716, 5.9106, 5.9101], device='cuda:0') 2023-11-27 16:44:00,570 INFO [train_asr.py:1267] (0/4) Epoch 40, validation: loss=0.0576, simple_loss=0.0507, pruned_loss=0.005183, audio_tagging_loss=0.02707, over 4681554.00 frames. 2023-11-27 16:44:00,571 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-27 16:44:07,687 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.96 vs. limit=15.0 2023-11-27 16:44:17,303 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3146253.3333333335, ans=0.125 2023-11-27 16:44:22,715 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 471950 2023-11-27 16:44:57,628 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 3050, loss[loss=0.06301, simple_loss=0.08189, pruned_loss=0.01365, audio_tagging_loss=0.008412, over 15759.00 frames. ], tot_loss[loss=0.0673, simple_loss=0.09122, pruned_loss=0.0129, audio_tagging_loss=0.008789, over 3050482.31 frames. ], batch size: 61, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:45:09,151 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3146586.6666666665, ans=0.2 2023-11-27 16:45:20,561 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 472000 2023-11-27 16:45:20,720 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3146653.3333333335, ans=0.125 2023-11-27 16:45:21,969 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-472000.pt 2023-11-27 16:45:24,257 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3146653.3333333335, ans=0.04949747468305833 2023-11-27 16:45:24,603 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.12 vs. limit=22.5 2023-11-27 16:45:37,417 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.576e+01 8.960e+01 9.776e+01 1.063e+02 1.311e+02, threshold=1.955e+02, percent-clipped=0.0 2023-11-27 16:45:37,501 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 16:45:44,352 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3146720.0, ans=0.1 2023-11-27 16:45:57,826 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 3100, loss[loss=0.08489, simple_loss=0.1207, pruned_loss=0.01929, audio_tagging_loss=0.00527, over 15231.00 frames. ], tot_loss[loss=0.06804, simple_loss=0.09224, pruned_loss=0.0131, audio_tagging_loss=0.008814, over 3052026.19 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:46:01,746 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3146853.3333333335, ans=0.125 2023-11-27 16:46:01,766 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3146853.3333333335, ans=0.2 2023-11-27 16:46:03,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3146853.3333333335, ans=0.0 2023-11-27 16:46:15,428 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.15 vs. limit=15.0 2023-11-27 16:46:15,745 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0 2023-11-27 16:46:21,467 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 472050 2023-11-27 16:46:23,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3146986.6666666665, ans=0.2 2023-11-27 16:46:28,673 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.73 vs. limit=15.0 2023-11-27 16:46:36,104 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.21 vs. limit=15.0 2023-11-27 16:46:52,216 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3147120.0, ans=0.125 2023-11-27 16:46:55,368 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 3150, loss[loss=0.061, simple_loss=0.08023, pruned_loss=0.01135, audio_tagging_loss=0.009534, over 15642.00 frames. ], tot_loss[loss=0.06781, simple_loss=0.092, pruned_loss=0.01293, audio_tagging_loss=0.008872, over 3049824.39 frames. 
], batch size: 59, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:46:57,262 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 16:46:58,336 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3147186.6666666665, ans=0.125 2023-11-27 16:47:01,310 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3147186.6666666665, ans=0.0 2023-11-27 16:47:18,674 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 472100 2023-11-27 16:47:29,887 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=3147386.6666666665, ans=6.0 2023-11-27 16:47:32,434 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.379e+01 8.709e+01 9.238e+01 9.861e+01 1.387e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-27 16:47:33,816 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3147386.6666666665, ans=0.0 2023-11-27 16:47:34,156 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.50 vs. limit=10.0 2023-11-27 16:47:51,886 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-27 16:47:52,198 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.05 vs. limit=22.5 2023-11-27 16:47:52,707 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 3200, loss[loss=0.06989, simple_loss=0.08831, pruned_loss=0.01572, audio_tagging_loss=0.01001, over 14326.00 frames. ], tot_loss[loss=0.06743, simple_loss=0.09108, pruned_loss=0.01283, audio_tagging_loss=0.00906, over 3052953.27 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:48:15,823 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 472150 2023-11-27 16:48:20,314 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3147653.3333333335, ans=0.1 2023-11-27 16:48:50,357 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 3250, loss[loss=0.06629, simple_loss=0.09264, pruned_loss=0.01259, audio_tagging_loss=0.007385, over 15649.00 frames. ], tot_loss[loss=0.06722, simple_loss=0.09055, pruned_loss=0.01282, audio_tagging_loss=0.009122, over 3051991.08 frames. ], batch size: 59, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:48:52,574 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3147853.3333333335, ans=0.125 2023-11-27 16:48:54,234 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.92 vs. 
limit=6.0 2023-11-27 16:48:55,381 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3147853.3333333335, ans=0.0 2023-11-27 16:49:11,577 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3147920.0, ans=0.1 2023-11-27 16:49:14,159 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 472200 2023-11-27 16:49:28,697 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.334e+01 8.717e+01 9.369e+01 9.960e+01 1.192e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-27 16:49:43,202 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3148120.0, ans=0.1 2023-11-27 16:49:48,411 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 3300, loss[loss=0.07767, simple_loss=0.1041, pruned_loss=0.01354, audio_tagging_loss=0.01208, over 16809.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.08941, pruned_loss=0.01274, audio_tagging_loss=0.009157, over 3053912.67 frames. ], batch size: 63, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:49:48,801 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3148186.6666666665, ans=0.2 2023-11-27 16:49:51,952 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3148186.6666666665, ans=0.0 2023-11-27 16:50:11,736 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 472250 2023-11-27 16:50:12,985 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3148320.0, ans=0.1 2023-11-27 16:50:17,209 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3148320.0, ans=0.125 2023-11-27 16:50:23,682 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3148386.6666666665, ans=0.1 2023-11-27 16:50:23,763 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3148386.6666666665, ans=0.07 2023-11-27 16:50:32,336 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3148386.6666666665, ans=0.1 2023-11-27 16:50:46,544 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 3350, loss[loss=0.05589, simple_loss=0.07516, pruned_loss=0.01016, audio_tagging_loss=0.00815, over 15418.00 frames. ], tot_loss[loss=0.06667, simple_loss=0.08962, pruned_loss=0.01274, audio_tagging_loss=0.009117, over 3054412.12 frames. 
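[Editor's note] The scaling.py "ScheduledFloat" lines report float hyperparameters (dropout_p, prob, skip_rate, scale_min, ...) whose current value ("ans") is looked up from the global batch_count. A plausible minimal implementation is piecewise-linear interpolation between schedule breakpoints, sketched below; the example breakpoints are invented for illustration and are not the model's real schedules.

```python
# Sketch of a batch_count-indexed float schedule; the (batch_count, value)
# breakpoints below are illustrative only.
def scheduled_float(batch_count: float,
                    points: list[tuple[float, float]]) -> float:
    pts = sorted(points)
    if batch_count <= pts[0][0]:
        return pts[0][1]
    if batch_count >= pts[-1][0]:
        return pts[-1][1]
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 <= batch_count <= x1:
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# A dropout decaying from 0.3 to 0.1 over the first 20k batches (assumed):
print(scheduled_float(3146853.33, [(0.0, 0.3), (20000.0, 0.1)]))  # -> 0.1
```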
], batch size: 61, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:50:51,180 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3148520.0, ans=0.0 2023-11-27 16:51:09,649 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 472300 2023-11-27 16:51:10,999 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3148653.3333333335, ans=0.125 2023-11-27 16:51:15,407 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3148653.3333333335, ans=10.0 2023-11-27 16:51:17,428 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3148653.3333333335, ans=0.1 2023-11-27 16:51:24,405 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.423e+01 8.635e+01 9.292e+01 9.708e+01 1.105e+02, threshold=1.858e+02, percent-clipped=0.0 2023-11-27 16:51:24,690 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3148720.0, ans=0.2 2023-11-27 16:51:43,859 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 3400, loss[loss=0.06634, simple_loss=0.09684, pruned_loss=0.01231, audio_tagging_loss=0.005603, over 16508.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.08965, pruned_loss=0.01281, audio_tagging_loss=0.00895, over 3049587.76 frames. ], batch size: 59, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:51:46,805 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3148853.3333333335, ans=0.125 2023-11-27 16:51:52,542 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3148853.3333333335, ans=0.0 2023-11-27 16:51:56,842 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3148920.0, ans=0.125 2023-11-27 16:51:56,915 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 16:52:07,220 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 472350 2023-11-27 16:52:27,929 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3149053.3333333335, ans=0.125 2023-11-27 16:52:33,891 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-27 16:52:37,650 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3149120.0, ans=0.125 2023-11-27 16:52:41,859 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 3450, loss[loss=0.05109, simple_loss=0.06464, pruned_loss=0.008173, audio_tagging_loss=0.0106, over 15262.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.08989, pruned_loss=0.01267, audio_tagging_loss=0.008863, over 3047792.17 frames. ], batch size: 60, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:53:05,297 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 472400 2023-11-27 16:53:09,523 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.41 vs. 
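[Editor's note] The "Whitening" lines periodically print a measured statistic of some activation ("metric") against a configured ceiling ("limit"); in this stretch the metric mostly stays below its limit. One metric with the right behavior, equal to 1.0 for perfectly decorrelated unit-variance features and growing with anisotropy, is the eigenvalue ratio sketched below. This is an editorial assumption about what is being measured, not the source of scaling.py.

```python
import torch

# Assumed whitening metric: mean(lambda^2) / mean(lambda)^2 over the eigenvalues
# of the feature covariance; equals 1.0 iff the covariance is a multiple of I.
def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels)
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]
    lam = torch.linalg.eigvalsh(cov)
    return (lam.pow(2).mean() / lam.mean().pow(2)).item()

white = torch.randn(10000, 192)
print(whitening_metric(white))  # close to 1.0 for white noise
```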
limit=15.0 2023-11-27 16:53:15,928 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3149386.6666666665, ans=0.2 2023-11-27 16:53:20,463 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.632e+01 8.772e+01 9.450e+01 1.013e+02 1.492e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-27 16:53:23,048 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3149386.6666666665, ans=0.2 2023-11-27 16:53:30,715 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3149453.3333333335, ans=0.125 2023-11-27 16:53:39,855 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 3500, loss[loss=0.0549, simple_loss=0.07251, pruned_loss=0.009865, audio_tagging_loss=0.008784, over 14806.00 frames. ], tot_loss[loss=0.06649, simple_loss=0.09032, pruned_loss=0.01266, audio_tagging_loss=0.008668, over 3050463.41 frames. ], batch size: 59, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:53:53,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3149586.6666666665, ans=0.07 2023-11-27 16:53:57,106 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3149586.6666666665, ans=0.0 2023-11-27 16:54:03,469 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 472450 2023-11-27 16:54:13,266 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 16:54:37,440 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 3550, loss[loss=0.05319, simple_loss=0.07451, pruned_loss=0.006652, audio_tagging_loss=0.009282, over 13678.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.09015, pruned_loss=0.01244, audio_tagging_loss=0.00862, over 3049710.06 frames. ], batch size: 53, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 16:54:58,408 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3149920.0, ans=0.035 2023-11-27 16:55:00,569 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 472500 2023-11-27 16:55:09,198 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.29 vs. limit=15.0 2023-11-27 16:55:15,801 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=12.37 vs. limit=15.0 2023-11-27 16:55:15,972 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.398e+01 8.604e+01 9.006e+01 9.735e+01 1.232e+02, threshold=1.801e+02, percent-clipped=0.0 2023-11-27 16:55:35,510 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 3600, loss[loss=0.05533, simple_loss=0.07021, pruned_loss=0.01173, audio_tagging_loss=0.008499, over 15388.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.08936, pruned_loss=0.01222, audio_tagging_loss=0.008658, over 3049992.65 frames. 
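[Editor's note] The WARNING above shows the runtime filter at work on AudioSet placeholder transcripts: a 1-second cut has 100 fbank frames, the 4x-subsampling frontend leaves 23, and a transducer cannot align 24 BPE tokens to 23 frames, so the cut is dropped. The frontend arithmetic below reproduces 100 -> 23 and is inferred from the numbers in the warning rather than quoted from the encoder code.

```python
# Inferred subsampling arithmetic: ((T - 7) // 2 + 1) // 2 maps 100 -> 23,
# matching "before subsampling: 100 / after subsampling: 23" in the warning.
def frames_after_subsampling(num_frames: int) -> int:
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # transducer training needs at least as many frames as output tokens
    return frames_after_subsampling(num_frames) >= num_tokens

assert frames_after_subsampling(100) == 23
assert not keep_cut(100, 24)  # the excluded placeholder cuts: 24 tokens > 23 frames
```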
], batch size: 59, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 16:55:44,508 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3150186.6666666665, ans=0.1 2023-11-27 16:55:44,541 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 16:55:54,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3150253.3333333335, ans=0.125 2023-11-27 16:55:54,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3150253.3333333335, ans=0.2 2023-11-27 16:55:58,162 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 472550 2023-11-27 16:55:58,863 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.48 vs. limit=6.0 2023-11-27 16:56:17,847 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.12 vs. limit=15.0 2023-11-27 16:56:19,664 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3150386.6666666665, ans=0.125 2023-11-27 16:56:33,329 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 3650, loss[loss=0.0559, simple_loss=0.07518, pruned_loss=0.009963, audio_tagging_loss=0.008348, over 14578.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.08987, pruned_loss=0.01235, audio_tagging_loss=0.008579, over 3049671.97 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 16:56:49,310 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.67 vs. limit=15.0 2023-11-27 16:56:56,740 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 472600 2023-11-27 16:57:11,659 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.156e+01 8.799e+01 9.366e+01 9.854e+01 1.150e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-27 16:57:13,402 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.09 vs. limit=15.0 2023-11-27 16:57:17,207 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.96 vs. limit=22.5 2023-11-27 16:57:23,341 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3150786.6666666665, ans=0.0 2023-11-27 16:57:25,942 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.86 vs. limit=15.0 2023-11-27 16:57:28,593 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3150786.6666666665, ans=0.04949747468305833 2023-11-27 16:57:30,578 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 3700, loss[loss=0.0691, simple_loss=0.09935, pruned_loss=0.01111, audio_tagging_loss=0.00831, over 15866.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08951, pruned_loss=0.01222, audio_tagging_loss=0.008544, over 3048806.98 frames. 
], batch size: 57, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 16:57:44,366 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3150920.0, ans=0.2 2023-11-27 16:57:53,930 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 472650 2023-11-27 16:57:58,924 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3150986.6666666665, ans=0.125 2023-11-27 16:58:03,230 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3150986.6666666665, ans=0.2 2023-11-27 16:58:09,797 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3151053.3333333335, ans=0.1 2023-11-27 16:58:10,964 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3151053.3333333335, ans=0.0 2023-11-27 16:58:20,267 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3151120.0, ans=0.125 2023-11-27 16:58:28,657 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 3750, loss[loss=0.07151, simple_loss=0.09995, pruned_loss=0.01378, audio_tagging_loss=0.00776, over 15739.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.09022, pruned_loss=0.01249, audio_tagging_loss=0.008589, over 3050629.41 frames. ], batch size: 60, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 16:58:51,255 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 472700 2023-11-27 16:58:58,348 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.67 vs. limit=15.0 2023-11-27 16:59:07,646 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.433e+01 8.997e+01 9.607e+01 1.030e+02 1.522e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-27 16:59:12,472 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 16:59:26,483 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 3800, loss[loss=0.07002, simple_loss=0.09738, pruned_loss=0.01347, audio_tagging_loss=0.007864, over 16848.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.09059, pruned_loss=0.01262, audio_tagging_loss=0.00859, over 3053759.75 frames. 
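[Editor's note] The grad_scale field oscillates between 16.0 and 32.0 across these batches (32.0 at batch 3700, back to 16.0 by batch 3750), which is the signature of dynamic loss scaling for fp16 training: the scale grows after a run of finite gradients and is halved when an overflow is hit. Below is a minimal sketch using the standard torch.cuda.amp API; the actual loop lives in train_asr.py and is not reproduced here, and init_scale/growth_interval are illustrative, not this run's settings.

```python
import torch

# Sketch of the dynamic loss scaling suggested by the grad_scale field.
scaler = torch.cuda.amp.GradScaler(init_scale=16.0, growth_interval=500)

def training_step(model, optimizer, batch, compute_loss):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = compute_loss(model, batch)
    scaler.scale(loss).backward()
    scaler.step(optimizer)  # skips the update if any gradient is inf/nan
    scaler.update()         # halves the scale on overflow, doubles it after
                            # growth_interval consecutive clean steps
```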
], batch size: 61, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 16:59:30,086 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3151520.0, ans=0.0 2023-11-27 16:59:34,605 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3151520.0, ans=0.0 2023-11-27 16:59:49,613 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 472750 2023-11-27 17:00:14,790 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3151786.6666666665, ans=0.125 2023-11-27 17:00:23,108 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 3850, loss[loss=0.058, simple_loss=0.07518, pruned_loss=0.009608, audio_tagging_loss=0.0108, over 14759.00 frames. ], tot_loss[loss=0.06662, simple_loss=0.09055, pruned_loss=0.01267, audio_tagging_loss=0.00867, over 3054132.19 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:00:28,900 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3151853.3333333335, ans=0.0 2023-11-27 17:00:36,327 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3151920.0, ans=0.125 2023-11-27 17:00:46,432 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 472800 2023-11-27 17:00:49,408 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.44 vs. limit=15.0 2023-11-27 17:00:54,119 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3151986.6666666665, ans=0.125 2023-11-27 17:01:02,648 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.833e+01 8.917e+01 9.426e+01 9.996e+01 1.241e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-27 17:01:06,183 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3152053.3333333335, ans=0.0 2023-11-27 17:01:07,330 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 17:01:20,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3152186.6666666665, ans=0.125 2023-11-27 17:01:21,848 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 3900, loss[loss=0.06842, simple_loss=0.09223, pruned_loss=0.01354, audio_tagging_loss=0.008761, over 15375.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.09017, pruned_loss=0.01259, audio_tagging_loss=0.008802, over 3042220.52 frames. 
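[Editor's note] tot_loss is reported "over N frames" where N is fractional (e.g. 3042220.52 above), which suggests a decayed, frame-weighted running average rather than a plain cumulative mean. A sketch under that assumption follows; the decay factor is invented for illustration.

```python
# Sketch of a decayed, frame-weighted running loss consistent with the
# fractional "over N frames" totals in tot_loss; the decay factor is assumed.
class RunningLoss:
    def __init__(self, decay: float = 0.999):
        self.decay = decay
        self.loss_sum = 0.0
        self.frame_sum = 0.0

    def update(self, loss: float, num_frames: int) -> float:
        self.loss_sum = self.decay * self.loss_sum + loss * num_frames
        self.frame_sum = self.decay * self.frame_sum + num_frames
        return self.loss_sum / self.frame_sum  # the reported tot_loss
```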
], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:01:22,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3152186.6666666665, ans=0.125 2023-11-27 17:01:23,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3152186.6666666665, ans=0.125 2023-11-27 17:01:44,298 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 472850 2023-11-27 17:01:58,739 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3152386.6666666665, ans=0.0 2023-11-27 17:02:12,380 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=3152453.3333333335, ans=0.95 2023-11-27 17:02:12,512 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3152453.3333333335, ans=0.5 2023-11-27 17:02:18,567 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.42 vs. limit=15.0 2023-11-27 17:02:18,846 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 3950, loss[loss=0.07882, simple_loss=0.1127, pruned_loss=0.0134, audio_tagging_loss=0.009062, over 15218.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.09003, pruned_loss=0.01261, audio_tagging_loss=0.008851, over 3037570.58 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:02:26,214 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3152520.0, ans=0.125 2023-11-27 17:02:29,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3152586.6666666665, ans=0.1 2023-11-27 17:02:41,808 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 472900 2023-11-27 17:02:49,407 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3152653.3333333335, ans=0.125 2023-11-27 17:02:50,913 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.34 vs. limit=15.0 2023-11-27 17:02:58,014 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.297e+01 8.787e+01 9.336e+01 1.021e+02 1.304e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-27 17:03:03,016 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.92 vs. limit=12.0 2023-11-27 17:03:16,290 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 4000, loss[loss=0.06564, simple_loss=0.09271, pruned_loss=0.01246, audio_tagging_loss=0.006824, over 14658.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.08955, pruned_loss=0.0124, audio_tagging_loss=0.008957, over 3039781.11 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:03:30,379 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3152920.0, ans=0.125 2023-11-27 17:03:39,909 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 472950 2023-11-27 17:03:49,923 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.80 vs. 
limit=12.0 2023-11-27 17:03:56,068 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3153053.3333333335, ans=0.1 2023-11-27 17:03:59,428 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3153053.3333333335, ans=0.1 2023-11-27 17:04:01,135 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0 2023-11-27 17:04:10,499 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3153120.0, ans=0.2 2023-11-27 17:04:11,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3153120.0, ans=0.125 2023-11-27 17:04:13,844 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 4050, loss[loss=0.06959, simple_loss=0.09712, pruned_loss=0.01158, audio_tagging_loss=0.009447, over 14975.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.09018, pruned_loss=0.01239, audio_tagging_loss=0.008953, over 3047000.13 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:04:21,554 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 17:04:30,546 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3153253.3333333335, ans=0.125 2023-11-27 17:04:33,369 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3153253.3333333335, ans=0.0 2023-11-27 17:04:37,471 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 473000 2023-11-27 17:04:40,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3153320.0, ans=0.125 2023-11-27 17:04:47,701 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3153386.6666666665, ans=0.125 2023-11-27 17:04:53,031 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.948e+01 9.021e+01 9.575e+01 1.043e+02 1.402e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-27 17:04:53,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3153386.6666666665, ans=0.1 2023-11-27 17:04:54,316 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3153386.6666666665, ans=0.0 2023-11-27 17:05:12,225 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 4100, loss[loss=0.05729, simple_loss=0.07177, pruned_loss=0.01166, audio_tagging_loss=0.009748, over 15882.00 frames. ], tot_loss[loss=0.06715, simple_loss=0.09093, pruned_loss=0.01269, audio_tagging_loss=0.008993, over 3046290.02 frames. 
], batch size: 59, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:05:20,772 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3153520.0, ans=0.125 2023-11-27 17:05:23,440 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.79 vs. limit=15.0 2023-11-27 17:05:32,805 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3153586.6666666665, ans=0.125 2023-11-27 17:05:34,801 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 473050 2023-11-27 17:06:07,944 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3153786.6666666665, ans=0.0 2023-11-27 17:06:09,986 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 4150, loss[loss=0.07834, simple_loss=0.1096, pruned_loss=0.01701, audio_tagging_loss=0.006528, over 15272.00 frames. ], tot_loss[loss=0.06684, simple_loss=0.09063, pruned_loss=0.0126, audio_tagging_loss=0.008919, over 3046950.09 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:06:11,514 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.68 vs. limit=15.0 2023-11-27 17:06:32,990 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 473100 2023-11-27 17:06:42,584 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3153986.6666666665, ans=0.025 2023-11-27 17:06:45,901 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.56 vs. limit=15.0 2023-11-27 17:06:47,607 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3154053.3333333335, ans=0.125 2023-11-27 17:06:48,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3154053.3333333335, ans=0.125 2023-11-27 17:06:50,626 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.197e+01 8.636e+01 9.410e+01 1.013e+02 1.260e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-27 17:06:54,562 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.36 vs. limit=15.0 2023-11-27 17:06:55,205 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 17:06:57,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3154120.0, ans=0.0 2023-11-27 17:06:59,851 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3154120.0, ans=0.125 2023-11-27 17:07:04,382 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3154120.0, ans=0.0 2023-11-27 17:07:07,801 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 4200, loss[loss=0.09013, simple_loss=0.1254, pruned_loss=0.02067, audio_tagging_loss=0.006755, over 14989.00 frames. ], tot_loss[loss=0.06707, simple_loss=0.09103, pruned_loss=0.01276, audio_tagging_loss=0.008793, over 3048639.87 frames. ], batch size: 54, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:07:10,346 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.30 vs. limit=12.0 2023-11-27 17:07:12,779 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3154186.6666666665, ans=0.125 2023-11-27 17:07:16,782 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.21 vs. limit=10.0 2023-11-27 17:07:22,364 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3154253.3333333335, ans=0.0 2023-11-27 17:07:31,570 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 473150 2023-11-27 17:07:34,064 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 17:07:40,427 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3154320.0, ans=0.1 2023-11-27 17:08:05,638 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 4250, loss[loss=0.06696, simple_loss=0.09332, pruned_loss=0.01106, audio_tagging_loss=0.009241, over 15562.00 frames. ], tot_loss[loss=0.06725, simple_loss=0.0915, pruned_loss=0.01281, audio_tagging_loss=0.008697, over 3043394.78 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:08:06,874 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3154520.0, ans=0.125 2023-11-27 17:08:18,101 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3154586.6666666665, ans=0.125 2023-11-27 17:08:20,084 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3154586.6666666665, ans=0.0 2023-11-27 17:08:28,882 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 473200 2023-11-27 17:08:46,607 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.430e+01 8.728e+01 9.330e+01 9.912e+01 1.216e+02, threshold=1.866e+02, percent-clipped=0.0 2023-11-27 17:09:04,288 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 4300, loss[loss=0.06362, simple_loss=0.08639, pruned_loss=0.01114, audio_tagging_loss=0.009283, over 15835.00 frames. ], tot_loss[loss=0.06689, simple_loss=0.0909, pruned_loss=0.01273, audio_tagging_loss=0.00871, over 3046018.59 frames. 
], batch size: 58, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:09:27,294 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 473250 2023-11-27 17:09:31,989 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3154986.6666666665, ans=0.1 2023-11-27 17:09:40,019 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3155053.3333333335, ans=0.1 2023-11-27 17:09:53,609 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.00 vs. limit=22.5 2023-11-27 17:09:54,242 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3155120.0, ans=0.125 2023-11-27 17:09:58,878 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.36 vs. limit=15.0 2023-11-27 17:10:00,574 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 4350, loss[loss=0.05495, simple_loss=0.06574, pruned_loss=0.009829, audio_tagging_loss=0.01225, over 14940.00 frames. ], tot_loss[loss=0.06769, simple_loss=0.09205, pruned_loss=0.01301, audio_tagging_loss=0.008659, over 3046289.60 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:10:19,009 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3155253.3333333335, ans=0.0 2023-11-27 17:10:19,085 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3155253.3333333335, ans=0.125 2023-11-27 17:10:23,594 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3155320.0, ans=10.0 2023-11-27 17:10:24,469 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 473300 2023-11-27 17:10:28,045 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3155320.0, ans=0.125 2023-11-27 17:10:41,001 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.394e+01 8.765e+01 9.493e+01 1.043e+02 1.484e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-27 17:10:55,862 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.33 vs. limit=15.0 2023-11-27 17:10:58,647 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 4400, loss[loss=0.05289, simple_loss=0.06638, pruned_loss=0.01106, audio_tagging_loss=0.008642, over 14923.00 frames. ], tot_loss[loss=0.06726, simple_loss=0.09148, pruned_loss=0.01286, audio_tagging_loss=0.008657, over 3041362.13 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:11:20,944 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3155653.3333333335, ans=0.1 2023-11-27 17:11:21,904 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 473350 2023-11-27 17:11:49,494 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3155786.6666666665, ans=0.125 2023-11-27 17:11:56,765 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.73 vs. 
limit=12.0 2023-11-27 17:11:57,081 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 4450, loss[loss=0.05163, simple_loss=0.06652, pruned_loss=0.006756, audio_tagging_loss=0.01161, over 14407.00 frames. ], tot_loss[loss=0.06768, simple_loss=0.09201, pruned_loss=0.01301, audio_tagging_loss=0.008669, over 3043180.25 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:12:09,362 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3155920.0, ans=0.125 2023-11-27 17:12:11,587 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3155920.0, ans=0.125 2023-11-27 17:12:19,388 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 473400 2023-11-27 17:12:28,112 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3155986.6666666665, ans=0.1 2023-11-27 17:12:30,569 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.35 vs. limit=15.0 2023-11-27 17:12:37,804 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.71 vs. limit=10.0 2023-11-27 17:12:38,301 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.039e+01 8.835e+01 9.403e+01 1.018e+02 2.786e+02, threshold=1.881e+02, percent-clipped=1.0 2023-11-27 17:12:54,381 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 4500, loss[loss=0.05769, simple_loss=0.06853, pruned_loss=0.01226, audio_tagging_loss=0.01116, over 14942.00 frames. ], tot_loss[loss=0.06774, simple_loss=0.09214, pruned_loss=0.01309, audio_tagging_loss=0.008577, over 3050558.81 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:13:11,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3156253.3333333335, ans=0.125 2023-11-27 17:13:15,309 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3156253.3333333335, ans=0.2 2023-11-27 17:13:17,185 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 473450 2023-11-27 17:13:21,347 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3156320.0, ans=0.07 2023-11-27 17:13:30,674 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3156386.6666666665, ans=0.125 2023-11-27 17:13:52,325 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 4550, loss[loss=0.06941, simple_loss=0.08818, pruned_loss=0.01332, audio_tagging_loss=0.012, over 14607.00 frames. ], tot_loss[loss=0.06745, simple_loss=0.09185, pruned_loss=0.013, audio_tagging_loss=0.008521, over 3043710.37 frames. ], batch size: 54, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:14:06,878 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.78 vs. limit=15.0 2023-11-27 17:14:07,460 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3156586.6666666665, ans=0.1 2023-11-27 17:14:11,700 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.91 vs. 
limit=15.0 2023-11-27 17:14:15,592 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 473500 2023-11-27 17:14:33,781 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.579e+01 8.569e+01 9.256e+01 9.932e+01 4.356e+02, threshold=1.851e+02, percent-clipped=1.0 2023-11-27 17:14:34,001 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3156720.0, ans=0.125 2023-11-27 17:14:39,263 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 17:14:49,598 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 4600, loss[loss=0.07263, simple_loss=0.09826, pruned_loss=0.01675, audio_tagging_loss=0.006755, over 15695.00 frames. ], tot_loss[loss=0.06709, simple_loss=0.09134, pruned_loss=0.01286, audio_tagging_loss=0.008559, over 3043228.12 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:14:59,306 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3156853.3333333335, ans=0.0 2023-11-27 17:15:00,400 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3156920.0, ans=0.0 2023-11-27 17:15:04,848 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3156920.0, ans=0.0 2023-11-27 17:15:12,816 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 473550 2023-11-27 17:15:20,147 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3156986.6666666665, ans=0.125 2023-11-27 17:15:31,676 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 17:15:38,521 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.41 vs. limit=22.5 2023-11-27 17:15:44,358 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3157120.0, ans=0.0 2023-11-27 17:15:47,489 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 4650, loss[loss=0.06307, simple_loss=0.08075, pruned_loss=0.01351, audio_tagging_loss=0.009189, over 14615.00 frames. ], tot_loss[loss=0.06673, simple_loss=0.09061, pruned_loss=0.01269, audio_tagging_loss=0.008727, over 3047343.01 frames. 
], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:15:52,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3157186.6666666665, ans=0.0 2023-11-27 17:15:55,802 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3157186.6666666665, ans=0.125 2023-11-27 17:15:56,929 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-27 17:16:10,315 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 473600 2023-11-27 17:16:20,414 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3157320.0, ans=0.1 2023-11-27 17:16:27,666 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3157386.6666666665, ans=0.2 2023-11-27 17:16:28,667 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=3157386.6666666665, ans=0.5 2023-11-27 17:16:29,573 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.658e+01 8.758e+01 9.328e+01 1.030e+02 1.229e+02, threshold=1.866e+02, percent-clipped=0.0 2023-11-27 17:16:30,309 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.64 vs. limit=15.0 2023-11-27 17:16:35,239 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3157453.3333333335, ans=0.125 2023-11-27 17:16:42,351 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3157453.3333333335, ans=0.125 2023-11-27 17:16:43,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=3157453.3333333335, ans=0.1 2023-11-27 17:16:45,827 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 4700, loss[loss=0.0815, simple_loss=0.118, pruned_loss=0.01648, audio_tagging_loss=0.00603, over 15226.00 frames. ], tot_loss[loss=0.06681, simple_loss=0.0903, pruned_loss=0.01281, audio_tagging_loss=0.008844, over 3042364.27 frames. 
], batch size: 55, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:16:46,172 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3157520.0, ans=0.0 2023-11-27 17:16:50,460 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3157520.0, ans=0.0 2023-11-27 17:17:05,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3157586.6666666665, ans=0.0 2023-11-27 17:17:08,408 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 473650 2023-11-27 17:17:09,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3157653.3333333335, ans=0.125 2023-11-27 17:17:16,650 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3157653.3333333335, ans=0.125 2023-11-27 17:17:28,678 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3157720.0, ans=0.1 2023-11-27 17:17:33,564 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.06 vs. limit=15.0 2023-11-27 17:17:43,358 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 4750, loss[loss=0.07087, simple_loss=0.09716, pruned_loss=0.0122, audio_tagging_loss=0.01009, over 15765.00 frames. ], tot_loss[loss=0.06706, simple_loss=0.09057, pruned_loss=0.01286, audio_tagging_loss=0.008914, over 3040898.02 frames. ], batch size: 60, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:17:44,182 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.20 vs. limit=15.0 2023-11-27 17:17:49,217 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3157853.3333333335, ans=0.125 2023-11-27 17:18:03,156 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.63 vs. limit=6.0 2023-11-27 17:18:06,428 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 473700 2023-11-27 17:18:24,342 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.634e+01 8.859e+01 9.575e+01 1.045e+02 1.210e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-27 17:18:25,836 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3158053.3333333335, ans=0.1 2023-11-27 17:18:27,441 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3158053.3333333335, ans=0.0 2023-11-27 17:18:40,206 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 4800, loss[loss=0.06712, simple_loss=0.09422, pruned_loss=0.01042, audio_tagging_loss=0.009591, over 15054.00 frames. ], tot_loss[loss=0.06718, simple_loss=0.09077, pruned_loss=0.01284, audio_tagging_loss=0.008955, over 3041844.82 frames. 
], batch size: 56, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:18:45,661 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3158186.6666666665, ans=0.125 2023-11-27 17:19:02,200 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.68 vs. limit=15.0 2023-11-27 17:19:03,933 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 473750 2023-11-27 17:19:06,350 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3158320.0, ans=0.125 2023-11-27 17:19:24,291 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3158386.6666666665, ans=0.2 2023-11-27 17:19:36,303 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3158453.3333333335, ans=0.125 2023-11-27 17:19:38,166 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 4850, loss[loss=0.07256, simple_loss=0.1009, pruned_loss=0.01234, audio_tagging_loss=0.009745, over 14325.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.08977, pruned_loss=0.01254, audio_tagging_loss=0.00915, over 3042890.04 frames. ], batch size: 53, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:19:38,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3158520.0, ans=0.125 2023-11-27 17:19:47,721 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3158520.0, ans=0.125 2023-11-27 17:19:48,015 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.67 vs. 
limit=15.0 2023-11-27 17:19:57,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3158586.6666666665, ans=0.0 2023-11-27 17:20:01,618 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 473800 2023-11-27 17:20:01,756 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3158653.3333333335, ans=0.1 2023-11-27 17:20:07,682 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3158653.3333333335, ans=0.1 2023-11-27 17:20:17,583 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3158720.0, ans=0.2 2023-11-27 17:20:20,709 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.605e+01 8.680e+01 9.364e+01 9.927e+01 1.620e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-27 17:20:22,062 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3158720.0, ans=0.125 2023-11-27 17:20:28,020 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3158786.6666666665, ans=0.1 2023-11-27 17:20:30,476 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3158786.6666666665, ans=0.125 2023-11-27 17:20:36,676 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 4900, loss[loss=0.08034, simple_loss=0.1069, pruned_loss=0.01917, audio_tagging_loss=0.007699, over 15429.00 frames. ], tot_loss[loss=0.06716, simple_loss=0.09095, pruned_loss=0.01272, audio_tagging_loss=0.00896, over 3048790.65 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:21:00,031 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 473850 2023-11-27 17:21:02,046 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.09 vs. limit=6.0 2023-11-27 17:21:03,241 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.09 vs. limit=6.0 2023-11-27 17:21:03,244 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.04 vs. limit=15.0 2023-11-27 17:21:08,290 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3158986.6666666665, ans=0.0 2023-11-27 17:21:34,307 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 4950, loss[loss=0.05094, simple_loss=0.0623, pruned_loss=0.006845, audio_tagging_loss=0.01294, over 16762.00 frames. ], tot_loss[loss=0.06667, simple_loss=0.0904, pruned_loss=0.01258, audio_tagging_loss=0.008895, over 3041764.72 frames. 
], batch size: 63, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:21:40,046 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3159186.6666666665, ans=0.125 2023-11-27 17:21:53,324 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3159253.3333333335, ans=0.07 2023-11-27 17:21:57,390 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 473900 2023-11-27 17:21:57,557 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3159320.0, ans=0.125 2023-11-27 17:22:07,876 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3159386.6666666665, ans=0.125 2023-11-27 17:22:16,608 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.075e+01 8.677e+01 9.528e+01 1.024e+02 1.553e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-27 17:22:16,704 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3159386.6666666665, ans=0.1 2023-11-27 17:22:31,932 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 5000, loss[loss=0.06036, simple_loss=0.07956, pruned_loss=0.01412, audio_tagging_loss=0.006453, over 15042.00 frames. ], tot_loss[loss=0.06674, simple_loss=0.09083, pruned_loss=0.01261, audio_tagging_loss=0.008712, over 3045475.05 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:22:36,427 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3159520.0, ans=0.0 2023-11-27 17:22:41,771 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.31 vs. limit=12.0 2023-11-27 17:22:55,068 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 473950 2023-11-27 17:22:59,641 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 17:23:01,182 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.78 vs. limit=6.0 2023-11-27 17:23:10,766 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.67 vs. limit=22.5 2023-11-27 17:23:12,167 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3159720.0, ans=0.125 2023-11-27 17:23:24,258 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3159786.6666666665, ans=0.0 2023-11-27 17:23:29,487 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 5050, loss[loss=0.05637, simple_loss=0.08137, pruned_loss=0.008087, audio_tagging_loss=0.007596, over 15148.00 frames. ], tot_loss[loss=0.06685, simple_loss=0.09104, pruned_loss=0.01265, audio_tagging_loss=0.008673, over 3047476.48 frames. 
], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:23:36,069 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3159853.3333333335, ans=0.1 2023-11-27 17:23:52,211 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 474000 2023-11-27 17:23:54,968 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3159986.6666666665, ans=0.05 2023-11-27 17:24:12,648 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.792e+01 8.599e+01 9.260e+01 9.891e+01 1.238e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-27 17:24:13,960 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3160053.3333333335, ans=0.125 2023-11-27 17:24:18,048 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=3160120.0, ans=10.0 2023-11-27 17:24:20,162 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3160120.0, ans=0.125 2023-11-27 17:24:27,712 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 5100, loss[loss=0.06448, simple_loss=0.08617, pruned_loss=0.009807, audio_tagging_loss=0.01159, over 14819.00 frames. ], tot_loss[loss=0.06697, simple_loss=0.09142, pruned_loss=0.0126, audio_tagging_loss=0.00866, over 3051333.91 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:24:27,945 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3160186.6666666665, ans=0.125 2023-11-27 17:24:34,366 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=3160186.6666666665, ans=0.95 2023-11-27 17:24:51,285 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 474050 2023-11-27 17:25:17,329 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3160453.3333333335, ans=0.1 2023-11-27 17:25:21,491 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3160453.3333333335, ans=0.125 2023-11-27 17:25:24,997 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 5150, loss[loss=0.06654, simple_loss=0.09586, pruned_loss=0.01113, audio_tagging_loss=0.00747, over 15734.00 frames. ], tot_loss[loss=0.06717, simple_loss=0.09161, pruned_loss=0.01278, audio_tagging_loss=0.008576, over 3054869.09 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:25:48,625 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 474100 2023-11-27 17:25:53,194 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3160653.3333333335, ans=0.2 2023-11-27 17:26:07,193 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.340e+01 8.651e+01 9.333e+01 9.963e+01 1.109e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-27 17:26:22,472 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 5200, loss[loss=0.05547, simple_loss=0.06407, pruned_loss=0.01182, audio_tagging_loss=0.01161, over 14815.00 frames. ], tot_loss[loss=0.06711, simple_loss=0.09153, pruned_loss=0.01273, audio_tagging_loss=0.008616, over 3054265.06 frames. 
], batch size: 58, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:26:37,727 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3160920.0, ans=0.2 2023-11-27 17:26:45,129 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 474150 2023-11-27 17:26:47,503 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3160986.6666666665, ans=0.125 2023-11-27 17:26:52,690 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=11.45 vs. limit=12.0 2023-11-27 17:26:54,561 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3160986.6666666665, ans=0.1 2023-11-27 17:27:10,974 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3161120.0, ans=0.1 2023-11-27 17:27:20,055 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 5250, loss[loss=0.06971, simple_loss=0.09555, pruned_loss=0.01132, audio_tagging_loss=0.01061, over 15894.00 frames. ], tot_loss[loss=0.06709, simple_loss=0.0916, pruned_loss=0.01272, audio_tagging_loss=0.008574, over 3050139.14 frames. ], batch size: 61, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:27:26,972 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3161186.6666666665, ans=0.0 2023-11-27 17:27:33,816 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.74 vs. limit=15.0 2023-11-27 17:27:37,155 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3161253.3333333335, ans=0.2 2023-11-27 17:27:42,546 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 474200 2023-11-27 17:27:48,470 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3161320.0, ans=0.0 2023-11-27 17:28:02,250 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3161386.6666666665, ans=0.2 2023-11-27 17:28:03,022 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.456e+01 8.718e+01 9.401e+01 1.041e+02 1.435e+02, threshold=1.880e+02, percent-clipped=0.0 2023-11-27 17:28:09,692 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 17:28:17,163 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 5300, loss[loss=0.05441, simple_loss=0.07615, pruned_loss=0.01052, audio_tagging_loss=0.005814, over 14199.00 frames. ], tot_loss[loss=0.06722, simple_loss=0.09175, pruned_loss=0.01282, audio_tagging_loss=0.008533, over 3048570.97 frames. 
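The scaling.py:213 entries print ScheduledFloat values: dropout probabilities, skip rates, and balancer probabilities that are deterministic functions of batch_count. A minimal sketch of such a schedule as a piecewise-linear function clamped at its endpoints; the breakpoints below are illustrative only, not the ones configured for the parameters in this log.

```python
# Minimal sketch of a ScheduledFloat-like value: piecewise-linear in
# batch_count, clamped at the endpoints. Breakpoints here are made up.
class ScheduledFloat:
    def __init__(self, *points: tuple[float, float]):
        self.points = sorted(points)  # (batch_count, value) pairs

    def value(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)

dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
print(dropout_p.value(3159520.0))  # 0.1 -- far past the last breakpoint
```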
], batch size: 55, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:28:25,462 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3161520.0, ans=0.05 2023-11-27 17:28:26,556 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3161520.0, ans=0.2 2023-11-27 17:28:32,944 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3161586.6666666665, ans=0.125 2023-11-27 17:28:40,943 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 474250 2023-11-27 17:29:02,497 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.42 vs. limit=22.5 2023-11-27 17:29:06,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3161786.6666666665, ans=0.0 2023-11-27 17:29:14,712 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 5350, loss[loss=0.07471, simple_loss=0.1084, pruned_loss=0.01315, audio_tagging_loss=0.007371, over 15720.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.09125, pruned_loss=0.01263, audio_tagging_loss=0.008605, over 3053310.43 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:29:17,303 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.92 vs. limit=6.0 2023-11-27 17:29:29,441 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3161920.0, ans=0.1 2023-11-27 17:29:38,013 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 474300 2023-11-27 17:29:43,689 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3161986.6666666665, ans=0.0 2023-11-27 17:29:46,935 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3161986.6666666665, ans=0.2 2023-11-27 17:29:47,106 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3161986.6666666665, ans=0.5 2023-11-27 17:29:48,448 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3162053.3333333335, ans=0.0 2023-11-27 17:29:57,551 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.768e+01 8.549e+01 9.139e+01 9.970e+01 1.797e+02, threshold=1.828e+02, percent-clipped=0.0 2023-11-27 17:29:59,059 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3162053.3333333335, ans=0.07 2023-11-27 17:30:08,958 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3162120.0, ans=0.1 2023-11-27 17:30:13,035 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 5400, loss[loss=0.0799, simple_loss=0.1124, pruned_loss=0.01554, audio_tagging_loss=0.008167, over 16153.00 frames. ], tot_loss[loss=0.06705, simple_loss=0.09134, pruned_loss=0.01272, audio_tagging_loss=0.00866, over 3046897.94 frames. 
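The "Whitening: ... metric=X vs. limit=Y" entries compare a measured statistic of a layer's activations against a configured limit, presumably to penalize channel covariances that drift too far from isotropic. One plausible reading, sketched with assumed definitions: the metric is the ratio of the mean squared covariance eigenvalue to the squared mean eigenvalue, which is 1.0 for perfectly whitened features and grows as the spectrum becomes lopsided.

```python
import torch

# Assumed definition: anisotropy of the per-group channel covariance.
# 1.0 means whitened; larger means a more lopsided eigenvalue spectrum.
def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    # x: (num_frames, num_channels)
    n, c = x.shape
    x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)
    x = x - x.mean(dim=1, keepdim=True)
    cov = torch.matmul(x.transpose(1, 2), x) / n   # per-group covariance
    eigs = torch.linalg.eigvalsh(cov)              # real, ascending
    return ((eigs ** 2).mean() / eigs.mean() ** 2).item()

x = torch.randn(1000, 384)
print(whitening_metric(x))  # slightly above 1.0 (sampling noise only)
```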
], batch size: 57, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:30:27,375 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3162253.3333333335, ans=0.125 2023-11-27 17:30:35,253 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 474350 2023-11-27 17:31:09,435 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 5450, loss[loss=0.05044, simple_loss=0.05924, pruned_loss=0.008715, audio_tagging_loss=0.0121, over 15639.00 frames. ], tot_loss[loss=0.06735, simple_loss=0.09179, pruned_loss=0.01279, audio_tagging_loss=0.008656, over 3048043.63 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:31:18,930 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 17:31:33,084 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 474400 2023-11-27 17:31:36,179 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3162653.3333333335, ans=0.125 2023-11-27 17:31:38,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3162653.3333333335, ans=0.125 2023-11-27 17:31:43,716 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3162720.0, ans=0.125 2023-11-27 17:31:53,362 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.021e+01 8.734e+01 9.322e+01 1.014e+02 1.420e+02, threshold=1.864e+02, percent-clipped=0.0 2023-11-27 17:32:07,534 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 5500, loss[loss=0.04925, simple_loss=0.06521, pruned_loss=0.008621, audio_tagging_loss=0.008027, over 14740.00 frames. ], tot_loss[loss=0.06709, simple_loss=0.09133, pruned_loss=0.01277, audio_tagging_loss=0.008657, over 3049512.63 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:32:09,310 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.99 vs. limit=10.0 2023-11-27 17:32:19,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3162920.0, ans=0.1 2023-11-27 17:32:19,964 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3162920.0, ans=0.0 2023-11-27 17:32:27,908 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.75 vs. limit=12.0 2023-11-27 17:32:30,739 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 474450 2023-11-27 17:32:33,118 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 17:32:35,644 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.42 vs. 
limit=15.0 2023-11-27 17:32:39,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3162986.6666666665, ans=0.125 2023-11-27 17:33:04,453 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3163186.6666666665, ans=0.125 2023-11-27 17:33:05,368 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 5550, loss[loss=0.07421, simple_loss=0.1077, pruned_loss=0.01109, audio_tagging_loss=0.009287, over 14467.00 frames. ], tot_loss[loss=0.06678, simple_loss=0.09056, pruned_loss=0.01264, audio_tagging_loss=0.008864, over 3047087.40 frames. ], batch size: 54, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:33:12,215 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3163186.6666666665, ans=0.09899494936611666 2023-11-27 17:33:13,256 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3163186.6666666665, ans=0.1 2023-11-27 17:33:27,875 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 474500 2023-11-27 17:33:34,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3163320.0, ans=0.1 2023-11-27 17:33:49,147 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.390e+01 8.657e+01 9.312e+01 9.840e+01 1.170e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-27 17:34:02,489 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 5600, loss[loss=0.08055, simple_loss=0.1174, pruned_loss=0.01554, audio_tagging_loss=0.00629, over 16167.00 frames. ], tot_loss[loss=0.06693, simple_loss=0.09079, pruned_loss=0.01266, audio_tagging_loss=0.008876, over 3045559.48 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:34:02,769 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3163520.0, ans=0.0 2023-11-27 17:34:08,426 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.93 vs. limit=15.0 2023-11-27 17:34:25,504 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 474550 2023-11-27 17:34:32,493 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3163653.3333333335, ans=0.125 2023-11-27 17:34:40,251 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3163720.0, ans=0.125 2023-11-27 17:34:47,602 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 17:34:59,996 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 5650, loss[loss=0.06823, simple_loss=0.09203, pruned_loss=0.01408, audio_tagging_loss=0.008132, over 15973.00 frames. ], tot_loss[loss=0.06722, simple_loss=0.09122, pruned_loss=0.01267, audio_tagging_loss=0.00894, over 3055537.94 frames. 
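The WARNING above drops an AudioSet cut whose placeholder transcript has more BPE tokens (24) than the utterance has frames after subsampling (23). A transducer alignment must emit at least one frame per token, so such cuts cannot be scored and are filtered out. A sketch of that rule; the frontend length formula below is an assumption chosen because it reproduces the 100-to-23 frame mapping in the warning.

```python
# Sketch of the exclusion rule implied by the warning: a transducer
# needs T >= U (at least one frame per token), so drop cuts whose
# post-subsampling frame count falls below the token count.
def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # Assumed convolutional-frontend length formula; it maps the logged
    # 100 input frames to the logged 23 subsampled frames.
    t = ((num_frames - 7) // 2 + 1) // 2
    return t >= num_tokens

print(keep_cut(num_frames=100, num_tokens=24))  # False -> excluded
```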
], batch size: 59, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:35:01,499 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3163853.3333333335, ans=0.125 2023-11-27 17:35:05,098 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3163853.3333333335, ans=0.1 2023-11-27 17:35:08,662 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.31 vs. limit=15.0 2023-11-27 17:35:16,680 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3163920.0, ans=0.125 2023-11-27 17:35:17,275 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.15 vs. limit=15.0 2023-11-27 17:35:23,119 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.07 vs. limit=15.0 2023-11-27 17:35:23,700 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 474600 2023-11-27 17:35:24,086 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.58 vs. limit=15.0 2023-11-27 17:35:31,663 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3163986.6666666665, ans=0.125 2023-11-27 17:35:36,563 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.69 vs. limit=15.0 2023-11-27 17:35:45,043 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.769e+01 8.679e+01 9.217e+01 1.003e+02 1.541e+02, threshold=1.843e+02, percent-clipped=0.0 2023-11-27 17:35:58,325 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 5700, loss[loss=0.08149, simple_loss=0.117, pruned_loss=0.01948, audio_tagging_loss=0.003518, over 14462.00 frames. ], tot_loss[loss=0.06726, simple_loss=0.09117, pruned_loss=0.01269, audio_tagging_loss=0.008975, over 3047431.77 frames. ], batch size: 53, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:36:06,059 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.38 vs. 
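The grad_scale field in the headline entries alternates between 16.0 and 32.0. That is the dynamic loss scale of fp16 mixed-precision training: the scaler grows the scale after a run of overflow-free steps and halves it whenever a scaled gradient overflows. This is the standard PyTorch AMP pattern; a minimal single-step sketch (the model and loss here are placeholders):

```python
import torch

model = torch.nn.Linear(80, 500).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1.69e-3)
scaler = torch.cuda.amp.GradScaler(init_scale=16.0)

x = torch.randn(8, 80, device="cuda")
with torch.cuda.amp.autocast(dtype=torch.float16):
    loss = model(x).square().mean()   # placeholder loss

scaler.scale(loss).backward()   # backward on the scaled loss
scaler.step(optimizer)          # unscales grads, skips step on inf/nan
scaler.update()                 # grows/shrinks the scale over time
print(scaler.get_scale())       # the "grad_scale" reported in the log
```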
limit=12.0 2023-11-27 17:36:09,908 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3164253.3333333335, ans=0.125 2023-11-27 17:36:10,944 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3164253.3333333335, ans=0.0 2023-11-27 17:36:11,086 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3164253.3333333335, ans=0.125 2023-11-27 17:36:20,624 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 474650 2023-11-27 17:36:25,645 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3164320.0, ans=0.2 2023-11-27 17:36:27,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3164320.0, ans=0.125 2023-11-27 17:36:35,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3164386.6666666665, ans=10.0 2023-11-27 17:36:55,055 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 5750, loss[loss=0.06898, simple_loss=0.08855, pruned_loss=0.01591, audio_tagging_loss=0.008796, over 14711.00 frames. ], tot_loss[loss=0.06702, simple_loss=0.09055, pruned_loss=0.01275, audio_tagging_loss=0.008993, over 3037443.25 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:37:12,244 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3164586.6666666665, ans=0.125 2023-11-27 17:37:18,042 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 474700 2023-11-27 17:37:19,229 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3164653.3333333335, ans=0.125 2023-11-27 17:37:31,236 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.89 vs. limit=22.5 2023-11-27 17:37:38,266 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.20 vs. limit=22.5 2023-11-27 17:37:39,970 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.234e+01 8.634e+01 9.303e+01 1.008e+02 1.326e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-27 17:37:43,462 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3164786.6666666665, ans=0.0 2023-11-27 17:37:50,566 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3164786.6666666665, ans=0.125 2023-11-27 17:37:51,805 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3164853.3333333335, ans=0.125 2023-11-27 17:37:52,573 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 5800, loss[loss=0.07986, simple_loss=0.1093, pruned_loss=0.01899, audio_tagging_loss=0.00623, over 15658.00 frames. ], tot_loss[loss=0.06692, simple_loss=0.09056, pruned_loss=0.01281, audio_tagging_loss=0.008828, over 3037711.17 frames. 
], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:38:09,275 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3164920.0, ans=0.125 2023-11-27 17:38:15,604 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 474750 2023-11-27 17:38:31,710 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3165053.3333333335, ans=0.1 2023-11-27 17:38:49,885 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 5850, loss[loss=0.07143, simple_loss=0.101, pruned_loss=0.01244, audio_tagging_loss=0.008483, over 16478.00 frames. ], tot_loss[loss=0.06673, simple_loss=0.09056, pruned_loss=0.01265, audio_tagging_loss=0.008799, over 3037147.89 frames. ], batch size: 62, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:38:51,188 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3165186.6666666665, ans=0.1 2023-11-27 17:38:57,154 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3165186.6666666665, ans=0.1 2023-11-27 17:38:57,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3165186.6666666665, ans=0.2 2023-11-27 17:39:13,017 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 474800 2023-11-27 17:39:16,822 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3165320.0, ans=0.2 2023-11-27 17:39:31,149 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3165386.6666666665, ans=0.125 2023-11-27 17:39:35,139 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.752e+01 8.698e+01 9.361e+01 9.946e+01 1.172e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-27 17:39:40,429 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3165453.3333333335, ans=0.2 2023-11-27 17:39:48,452 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 5900, loss[loss=0.06504, simple_loss=0.08887, pruned_loss=0.01289, audio_tagging_loss=0.007714, over 15259.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.0906, pruned_loss=0.01266, audio_tagging_loss=0.008726, over 3041826.78 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:40:08,112 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3165586.6666666665, ans=0.0 2023-11-27 17:40:11,585 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 474850 2023-11-27 17:40:13,976 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=3165653.3333333335, ans=0.02 2023-11-27 17:40:15,396 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.82 vs. limit=22.5 2023-11-27 17:40:46,220 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 5950, loss[loss=0.07058, simple_loss=0.1008, pruned_loss=0.0123, audio_tagging_loss=0.00788, over 14919.00 frames. ], tot_loss[loss=0.06707, simple_loss=0.09144, pruned_loss=0.01274, audio_tagging_loss=0.008613, over 3056752.54 frames. 
], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:40:52,174 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.43 vs. limit=22.5 2023-11-27 17:40:55,985 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.65 vs. limit=15.0 2023-11-27 17:40:57,776 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3165920.0, ans=0.125 2023-11-27 17:41:03,121 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3165920.0, ans=0.125 2023-11-27 17:41:09,202 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 474900 2023-11-27 17:41:30,962 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.130e+01 8.669e+01 9.306e+01 1.020e+02 1.374e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-27 17:41:31,215 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3166120.0, ans=0.0 2023-11-27 17:41:41,549 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3166120.0, ans=0.125 2023-11-27 17:41:43,436 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 6000, loss[loss=0.03886, simple_loss=0.04971, pruned_loss=0.006315, audio_tagging_loss=0.007691, over 14076.00 frames. ], tot_loss[loss=0.06715, simple_loss=0.09162, pruned_loss=0.01275, audio_tagging_loss=0.008586, over 3045958.74 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:41:43,439 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-27 17:42:06,417 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.4600, 3.7773, 4.3484, 3.5146], device='cuda:0') 2023-11-27 17:42:11,558 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.1006, 2.4461, 4.9565, 2.9926], device='cuda:0') 2023-11-27 17:42:18,064 INFO [train_asr.py:1267] (0/4) Epoch 40, validation: loss=0.05751, simple_loss=0.05064, pruned_loss=0.005151, audio_tagging_loss=0.02703, over 4681554.00 frames. 2023-11-27 17:42:18,064 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-27 17:42:31,837 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3166253.3333333335, ans=0.125 2023-11-27 17:42:40,807 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 474950 2023-11-27 17:42:52,408 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3166386.6666666665, ans=0.125 2023-11-27 17:43:02,566 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
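During the validation pass above, the model prints the entropy of selected attention-weight distributions (zipformer.py:1877), one value per head. This is a cheap diagnostic: entropy near zero flags a head that has collapsed onto a single position, while entropy near log(T) flags one that has stayed uniform. A sketch of that computation; the tensor shape is an assumption.

```python
import torch

# Entropy of attention weights, one scalar per head: low values mean
# peaked (possibly collapsed) attention, log(T) means uniform.
def attn_entropy(attn: torch.Tensor) -> torch.Tensor:
    # attn: (num_heads, tgt_len, src_len), rows are softmax outputs
    p = attn.clamp(min=1e-20)
    h = -(p * p.log()).sum(dim=-1)   # (num_heads, tgt_len)
    return h.mean(dim=-1)            # average over query positions

weights = torch.softmax(torch.randn(4, 50, 50), dim=-1)
print(attn_entropy(weights))  # tensor of 4 per-head entropies
```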
Number of tokens: 24 2023-11-27 17:43:02,849 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3166453.3333333335, ans=0.0 2023-11-27 17:43:07,143 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3166453.3333333335, ans=0.125 2023-11-27 17:43:14,951 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 6050, loss[loss=0.07705, simple_loss=0.09964, pruned_loss=0.01727, audio_tagging_loss=0.009957, over 14598.00 frames. ], tot_loss[loss=0.06738, simple_loss=0.09188, pruned_loss=0.01282, audio_tagging_loss=0.008617, over 3047391.68 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:43:32,471 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.44 vs. limit=15.0 2023-11-27 17:43:38,182 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 475000 2023-11-27 17:43:51,154 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3166720.0, ans=0.2 2023-11-27 17:43:51,291 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3166720.0, ans=0.0 2023-11-27 17:43:53,404 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3166720.0, ans=0.0 2023-11-27 17:44:01,699 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.168e+01 8.500e+01 9.097e+01 9.950e+01 1.272e+02, threshold=1.819e+02, percent-clipped=0.0 2023-11-27 17:44:12,685 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 6100, loss[loss=0.07387, simple_loss=0.1022, pruned_loss=0.0158, audio_tagging_loss=0.006966, over 15105.00 frames. ], tot_loss[loss=0.06703, simple_loss=0.0914, pruned_loss=0.01274, audio_tagging_loss=0.008594, over 3051556.41 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:44:16,137 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3166853.3333333335, ans=0.125 2023-11-27 17:44:27,668 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3166920.0, ans=0.1 2023-11-27 17:44:35,852 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 475050 2023-11-27 17:44:37,786 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3166986.6666666665, ans=0.1 2023-11-27 17:44:39,012 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.92 vs. limit=15.0 2023-11-27 17:45:02,056 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3167120.0, ans=0.125 2023-11-27 17:45:10,566 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 6150, loss[loss=0.05927, simple_loss=0.07972, pruned_loss=0.01113, audio_tagging_loss=0.008279, over 14597.00 frames. ], tot_loss[loss=0.06692, simple_loss=0.09096, pruned_loss=0.01278, audio_tagging_loss=0.008653, over 3051533.50 frames. 
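Each headline entry reports the current batch's loss alongside a tot_loss averaged "over N frames", where N hovers around three million rather than growing without bound: tot_loss behaves like a frame-weighted running average over a bounded window of recent batches, not a whole-epoch mean. A hedged sketch of such a tracker; the window size and eviction policy are assumptions.

```python
from collections import deque

# Assumed bookkeeping behind "tot_loss[... over N frames.]": a
# frame-weighted average over a bounded window of recent batches.
class LossTracker:
    def __init__(self, max_batches: int = 200):
        self.window: deque[tuple[float, float]] = deque(maxlen=max_batches)

    def update(self, loss: float, num_frames: float) -> None:
        self.window.append((loss * num_frames, num_frames))

    def average(self) -> tuple[float, float]:
        weighted = sum(w for w, _ in self.window)
        frames = sum(f for _, f in self.window)
        return weighted / frames, frames  # (tot_loss, frames it covers)

tracker = LossTracker()
tracker.update(loss=0.06196, num_frames=15145.0)
print(tracker.average())
```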
], batch size: 54, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:45:13,062 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3167186.6666666665, ans=0.125 2023-11-27 17:45:19,407 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3167186.6666666665, ans=0.0 2023-11-27 17:45:20,436 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3167186.6666666665, ans=0.125 2023-11-27 17:45:23,124 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3167253.3333333335, ans=0.0 2023-11-27 17:45:24,135 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3167253.3333333335, ans=0.1 2023-11-27 17:45:28,435 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3167253.3333333335, ans=0.035 2023-11-27 17:45:30,132 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3167253.3333333335, ans=0.125 2023-11-27 17:45:34,331 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 475100 2023-11-27 17:45:56,718 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.195e+01 8.777e+01 9.490e+01 1.001e+02 1.284e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-27 17:45:59,598 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.69 vs. limit=15.0 2023-11-27 17:46:08,777 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 6200, loss[loss=0.06188, simple_loss=0.07585, pruned_loss=0.01195, audio_tagging_loss=0.01201, over 14838.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.09081, pruned_loss=0.01274, audio_tagging_loss=0.008717, over 3045327.51 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:46:12,239 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3167520.0, ans=0.2 2023-11-27 17:46:18,133 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3167520.0, ans=0.125 2023-11-27 17:46:25,136 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.70 vs. limit=10.0 2023-11-27 17:46:31,929 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 475150 2023-11-27 17:46:43,160 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3167720.0, ans=0.125 2023-11-27 17:46:50,536 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.80 vs. limit=15.0 2023-11-27 17:47:05,737 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 6250, loss[loss=0.06989, simple_loss=0.09183, pruned_loss=0.01588, audio_tagging_loss=0.008096, over 14270.00 frames. ], tot_loss[loss=0.06684, simple_loss=0.09041, pruned_loss=0.01281, audio_tagging_loss=0.008825, over 3045289.69 frames. 
], batch size: 54, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:47:09,309 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3167853.3333333335, ans=0.125 2023-11-27 17:47:28,417 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 475200 2023-11-27 17:47:52,288 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.124e+01 8.766e+01 9.317e+01 1.001e+02 1.294e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-27 17:48:03,986 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 6300, loss[loss=0.03682, simple_loss=0.03583, pruned_loss=0.005577, audio_tagging_loss=0.01333, over 15103.00 frames. ], tot_loss[loss=0.06792, simple_loss=0.09193, pruned_loss=0.01304, audio_tagging_loss=0.008911, over 3054288.13 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:48:04,184 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3168186.6666666665, ans=0.125 2023-11-27 17:48:09,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3168186.6666666665, ans=0.025 2023-11-27 17:48:15,289 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3168253.3333333335, ans=0.2 2023-11-27 17:48:21,853 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3168253.3333333335, ans=0.2 2023-11-27 17:48:21,887 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3168253.3333333335, ans=0.09899494936611666 2023-11-27 17:48:24,126 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3168253.3333333335, ans=0.0 2023-11-27 17:48:27,779 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 475250 2023-11-27 17:48:30,451 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.24 vs. limit=15.0 2023-11-27 17:48:34,067 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.87 vs. limit=22.5 2023-11-27 17:48:35,027 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.92 vs. limit=12.0 2023-11-27 17:48:36,602 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3168320.0, ans=0.2 2023-11-27 17:48:48,275 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3168386.6666666665, ans=0.2 2023-11-27 17:49:00,883 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3168520.0, ans=0.125 2023-11-27 17:49:01,772 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 6350, loss[loss=0.0608, simple_loss=0.08017, pruned_loss=0.0117, audio_tagging_loss=0.009011, over 14978.00 frames. ], tot_loss[loss=0.06758, simple_loss=0.09155, pruned_loss=0.01286, audio_tagging_loss=0.008938, over 3051574.73 frames. 
], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:49:10,782 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3168520.0, ans=0.0 2023-11-27 17:49:25,280 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 475300 2023-11-27 17:49:39,641 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3168720.0, ans=0.125 2023-11-27 17:49:39,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3168720.0, ans=0.125 2023-11-27 17:49:47,729 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.690e+01 8.715e+01 9.486e+01 1.017e+02 1.352e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-27 17:50:00,011 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 6400, loss[loss=0.07295, simple_loss=0.102, pruned_loss=0.01319, audio_tagging_loss=0.008744, over 15453.00 frames. ], tot_loss[loss=0.06734, simple_loss=0.09089, pruned_loss=0.0128, audio_tagging_loss=0.009095, over 3054214.34 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:50:22,408 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 475350 2023-11-27 17:50:25,889 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3168986.6666666665, ans=0.0 2023-11-27 17:50:31,031 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.24 vs. limit=15.0 2023-11-27 17:50:31,746 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3168986.6666666665, ans=0.1 2023-11-27 17:50:42,387 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3169053.3333333335, ans=0.1 2023-11-27 17:50:51,732 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3169120.0, ans=0.125 2023-11-27 17:50:57,178 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 6450, loss[loss=0.05516, simple_loss=0.0727, pruned_loss=0.009666, audio_tagging_loss=0.009146, over 14630.00 frames. ], tot_loss[loss=0.067, simple_loss=0.09056, pruned_loss=0.01261, audio_tagging_loss=0.009109, over 3043853.72 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:51:09,636 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3169253.3333333335, ans=0.125 2023-11-27 17:51:12,242 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3169253.3333333335, ans=0.125 2023-11-27 17:51:20,172 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 475400 2023-11-27 17:51:44,300 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.80 vs. 
limit=10.0 2023-11-27 17:51:44,574 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.677e+01 8.833e+01 9.242e+01 1.006e+02 1.317e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-27 17:51:53,722 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3169520.0, ans=0.125 2023-11-27 17:51:54,638 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 6500, loss[loss=0.05478, simple_loss=0.08553, pruned_loss=0.006252, audio_tagging_loss=0.005768, over 15869.00 frames. ], tot_loss[loss=0.06732, simple_loss=0.0911, pruned_loss=0.01272, audio_tagging_loss=0.009045, over 3041384.10 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:51:57,471 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3169520.0, ans=0.0 2023-11-27 17:52:18,431 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 475450 2023-11-27 17:52:22,647 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3169653.3333333335, ans=0.0 2023-11-27 17:52:38,917 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3169720.0, ans=0.0 2023-11-27 17:52:53,662 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 6550, loss[loss=0.06705, simple_loss=0.09716, pruned_loss=0.0108, audio_tagging_loss=0.007668, over 14715.00 frames. ], tot_loss[loss=0.06705, simple_loss=0.0906, pruned_loss=0.01285, audio_tagging_loss=0.008903, over 3047600.55 frames. ], batch size: 54, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:53:16,489 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 475500 2023-11-27 17:53:33,921 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3170053.3333333335, ans=0.125 2023-11-27 17:53:40,826 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.492e+01 8.596e+01 9.247e+01 9.962e+01 1.603e+02, threshold=1.849e+02, percent-clipped=0.0 2023-11-27 17:53:51,305 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 6600, loss[loss=0.05752, simple_loss=0.08218, pruned_loss=0.008046, audio_tagging_loss=0.008383, over 14584.00 frames. ], tot_loss[loss=0.06714, simple_loss=0.09091, pruned_loss=0.01282, audio_tagging_loss=0.008868, over 3050071.59 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:54:01,724 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.24 vs. limit=15.0 2023-11-27 17:54:04,723 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3170253.3333333335, ans=0.1 2023-11-27 17:54:13,814 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 475550 2023-11-27 17:54:17,996 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.04 vs. limit=15.0 2023-11-27 17:54:48,465 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 6650, loss[loss=0.08189, simple_loss=0.1201, pruned_loss=0.01693, audio_tagging_loss=0.004881, over 16793.00 frames. ], tot_loss[loss=0.06704, simple_loss=0.09095, pruned_loss=0.01284, audio_tagging_loss=0.008728, over 3050708.04 frames. 
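Many of the ScheduledFloat entries configure "balancer" modules, with fields like min_positive, max_abs, and an application probability prob. The apparent contract: with probability prob per step, nudge a layer's activation statistics back inside the configured bounds, e.g. keep the fraction of positive activations above min_positive and the typical magnitude below max_abs. The sketch below only checks those constraints; the corrective-gradient mechanics are omitted and the thresholds are placeholders.

```python
import random
import torch

# Illustrative constraint check in the spirit of the balancer entries;
# the real module applies corrective gradients rather than printing.
def check_balance(x: torch.Tensor, min_positive: float = 0.05,
                  max_abs: float = 10.0, prob: float = 0.125) -> None:
    if random.random() > prob:   # only inspected on a fraction of steps
        return
    frac_positive = (x > 0).float().mean().item()
    mean_abs = x.abs().mean().item()
    if frac_positive < min_positive:
        print(f"too few positive activations: {frac_positive:.3f}")
    if mean_abs > max_abs:
        print(f"activations too large: {mean_abs:.3f} > {max_abs}")

check_balance(torch.full((100,), -1.0), prob=1.0)  # triggers the first check
```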
], batch size: 59, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:55:02,734 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3170586.6666666665, ans=0.2 2023-11-27 17:55:12,019 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 475600 2023-11-27 17:55:18,177 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3170653.3333333335, ans=0.2 2023-11-27 17:55:19,375 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3170653.3333333335, ans=0.125 2023-11-27 17:55:24,080 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=11.02 vs. limit=12.0 2023-11-27 17:55:34,806 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3170786.6666666665, ans=0.125 2023-11-27 17:55:36,106 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.968e+01 8.804e+01 9.430e+01 1.026e+02 1.343e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-27 17:55:41,852 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3170786.6666666665, ans=0.04949747468305833 2023-11-27 17:55:46,145 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.02 vs. limit=15.0 2023-11-27 17:55:46,569 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 6700, loss[loss=0.05723, simple_loss=0.08201, pruned_loss=0.006614, audio_tagging_loss=0.009614, over 15195.00 frames. ], tot_loss[loss=0.06685, simple_loss=0.09078, pruned_loss=0.01279, audio_tagging_loss=0.008673, over 3049279.77 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:55:51,130 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3170853.3333333335, ans=0.0 2023-11-27 17:56:09,809 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 475650 2023-11-27 17:56:16,578 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3170986.6666666665, ans=0.125 2023-11-27 17:56:23,581 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3171053.3333333335, ans=0.125 2023-11-27 17:56:44,918 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 6750, loss[loss=0.06963, simple_loss=0.09152, pruned_loss=0.01489, audio_tagging_loss=0.008981, over 15665.00 frames. ], tot_loss[loss=0.06703, simple_loss=0.09115, pruned_loss=0.01281, audio_tagging_loss=0.008648, over 3042435.90 frames. ], batch size: 60, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:56:51,074 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.49 vs. limit=15.0 2023-11-27 17:56:54,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3171186.6666666665, ans=0.025 2023-11-27 17:57:07,579 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 475700 2023-11-27 17:57:08,085 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.11 vs. 
limit=22.5 2023-11-27 17:57:32,212 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.698e+01 8.726e+01 9.253e+01 9.869e+01 1.204e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-27 17:57:33,490 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3171453.3333333335, ans=0.0 2023-11-27 17:57:42,254 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 6800, loss[loss=0.05378, simple_loss=0.06778, pruned_loss=0.008456, audio_tagging_loss=0.01143, over 14846.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.08987, pruned_loss=0.0126, audio_tagging_loss=0.008736, over 3040214.17 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:57:44,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3171520.0, ans=0.125 2023-11-27 17:57:45,666 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3171520.0, ans=0.125 2023-11-27 17:58:01,128 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.23 vs. limit=15.0 2023-11-27 17:58:05,076 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 475750 2023-11-27 17:58:23,033 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3171720.0, ans=0.125 2023-11-27 17:58:29,820 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.10 vs. limit=10.0 2023-11-27 17:58:40,107 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 6850, loss[loss=0.05609, simple_loss=0.07302, pruned_loss=0.008325, audio_tagging_loss=0.01126, over 15976.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.08993, pruned_loss=0.0127, audio_tagging_loss=0.008706, over 3045736.39 frames. ], batch size: 60, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:58:43,821 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 17:58:45,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=3171853.3333333335, ans=15.0 2023-11-27 17:58:49,204 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3171853.3333333335, ans=0.125 2023-11-27 17:58:51,589 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.21 vs. limit=12.0 2023-11-27 17:58:54,294 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.84 vs. limit=15.0 2023-11-27 17:59:03,424 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 475800 2023-11-27 17:59:16,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3172053.3333333335, ans=0.0 2023-11-27 17:59:21,776 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3172053.3333333335, ans=0.125 2023-11-27 17:59:25,022 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.67 vs. 
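The scaling.py:1118 "WithLoss" entries attach a named auxiliary penalty to a tensor (here, attention weights) and report its accumulated value; loss-sum=0.000e+00 throughout this stretch suggests the penalty is currently inactive because the constrained statistic is within bounds. A minimal sketch of the assumed pattern: an identity wrapper in the forward pass that accumulates a side loss for logging.

```python
import torch

# Assumed pattern behind "WithLoss: name=..., loss-sum=...": an identity
# module that accumulates an auxiliary penalty on the tensor it passes
# through, reported periodically. The limit here is a placeholder.
class WithLoss(torch.nn.Module):
    def __init__(self, name: str, limit: float = 1.0):
        super().__init__()
        self.name, self.limit = name, limit
        self.loss_sum = 0.0

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        excess = (x.abs().mean() - self.limit).clamp(min=0.0)
        self.loss_sum += excess.item()  # what gets logged as loss-sum
        return x  # identity: does not change the forward computation

wrap = WithLoss("self_attn_weights", limit=1.0)
_ = wrap(torch.softmax(torch.randn(4, 50, 50), dim=-1))
print(f"loss-sum={wrap.loss_sum:.3e}")  # 0.000e+00 while within limit
```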
limit=22.5 2023-11-27 17:59:26,923 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3172120.0, ans=0.0 2023-11-27 17:59:28,684 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.463e+01 8.901e+01 9.541e+01 1.005e+02 1.351e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-27 17:59:38,181 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 6900, loss[loss=0.08277, simple_loss=0.1062, pruned_loss=0.01959, audio_tagging_loss=0.01008, over 14067.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.09039, pruned_loss=0.01259, audio_tagging_loss=0.008682, over 3044650.13 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:59:38,463 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3172186.6666666665, ans=0.125 2023-11-27 17:59:44,632 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3172186.6666666665, ans=0.125 2023-11-27 17:59:47,161 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.98 vs. limit=15.0 2023-11-27 18:00:00,676 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.74 vs. limit=15.0 2023-11-27 18:00:01,157 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 475850 2023-11-27 18:00:03,488 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3172320.0, ans=0.2 2023-11-27 18:00:03,534 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3172320.0, ans=0.0 2023-11-27 18:00:25,637 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 18:00:36,158 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 6950, loss[loss=0.0618, simple_loss=0.08832, pruned_loss=0.008645, audio_tagging_loss=0.008994, over 15257.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.09068, pruned_loss=0.0126, audio_tagging_loss=0.008698, over 3039247.54 frames. 
], batch size: 58, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:00:53,976 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3172586.6666666665, ans=0.07 2023-11-27 18:00:59,195 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 475900 2023-11-27 18:01:01,482 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3172653.3333333335, ans=0.125 2023-11-27 18:01:14,931 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3172720.0, ans=10.0 2023-11-27 18:01:24,337 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.513e+01 8.751e+01 9.118e+01 9.607e+01 1.229e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-27 18:01:30,654 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3172786.6666666665, ans=0.1 2023-11-27 18:01:33,720 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 7000, loss[loss=0.05579, simple_loss=0.07004, pruned_loss=0.01276, audio_tagging_loss=0.008011, over 14457.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.09066, pruned_loss=0.01255, audio_tagging_loss=0.008727, over 3038948.50 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:01:34,071 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3172853.3333333335, ans=0.125 2023-11-27 18:01:35,095 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3172853.3333333335, ans=0.1 2023-11-27 18:01:37,644 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 18:01:45,978 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.97 vs. limit=8.0 2023-11-27 18:01:46,528 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3172920.0, ans=0.125 2023-11-27 18:01:56,660 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 475950 2023-11-27 18:02:06,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3172986.6666666665, ans=0.125 2023-11-27 18:02:30,896 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 7050, loss[loss=0.07412, simple_loss=0.09763, pruned_loss=0.01308, audio_tagging_loss=0.01222, over 15016.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.08964, pruned_loss=0.01247, audio_tagging_loss=0.008877, over 3037361.10 frames. 
], batch size: 58, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:02:42,916 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3173253.3333333335, ans=0.125 2023-11-27 18:02:46,763 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3173253.3333333335, ans=0.125 2023-11-27 18:02:46,882 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3173253.3333333335, ans=0.125 2023-11-27 18:02:54,198 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 476000 2023-11-27 18:02:55,579 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-476000.pt 2023-11-27 18:03:01,243 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3173320.0, ans=0.125 2023-11-27 18:03:04,463 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3173320.0, ans=0.125 2023-11-27 18:03:21,881 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.448e+01 8.705e+01 9.232e+01 9.917e+01 1.412e+02, threshold=1.846e+02, percent-clipped=0.0 2023-11-27 18:03:25,463 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3173453.3333333335, ans=0.125 2023-11-27 18:03:26,437 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=3173453.3333333335, ans=0.05 2023-11-27 18:03:31,261 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 7100, loss[loss=0.05682, simple_loss=0.07357, pruned_loss=0.009583, audio_tagging_loss=0.01045, over 15495.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.08931, pruned_loss=0.01235, audio_tagging_loss=0.008865, over 3037747.62 frames. ], batch size: 60, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:03:43,959 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3173586.6666666665, ans=0.125 2023-11-27 18:03:52,126 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3173586.6666666665, ans=0.2 2023-11-27 18:03:54,151 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 476050 2023-11-27 18:04:28,675 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 7150, loss[loss=0.07601, simple_loss=0.1013, pruned_loss=0.01618, audio_tagging_loss=0.009173, over 15918.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08879, pruned_loss=0.01217, audio_tagging_loss=0.008967, over 3037393.06 frames. 
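The checkpoint saved above (checkpoint-476000.pt) lands exactly on a multiple of 4000 batches, consistent with a save-every-N policy keyed on the global batch index rather than on epoch boundaries. A hedged sketch of such a policy; the state-dict layout and path format are assumptions, not the logged implementation.

```python
import torch

# Assumed save-every-N policy behind "checkpoint-476000.pt": trigger on
# the global batch index and bundle model/optimizer/scaler state.
def maybe_save(batch_idx_train: int, model, optimizer, scaler,
               exp_dir: str, save_every_n: int = 4000) -> None:
    if batch_idx_train % save_every_n != 0:
        return
    torch.save(
        {
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "grad_scaler": scaler.state_dict(),
            "batch_idx_train": batch_idx_train,
        },
        f"{exp_dir}/checkpoint-{batch_idx_train}.pt",
    )
```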
], batch size: 61, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:04:30,007 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 18:04:33,666 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3173853.3333333335, ans=0.125 2023-11-27 18:04:43,631 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3173920.0, ans=0.125 2023-11-27 18:04:50,913 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3173986.6666666665, ans=0.125 2023-11-27 18:04:51,738 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 476100 2023-11-27 18:05:12,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3174053.3333333335, ans=0.0 2023-11-27 18:05:13,758 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3174120.0, ans=0.125 2023-11-27 18:05:17,250 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.335e+01 8.847e+01 9.283e+01 1.002e+02 1.688e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-27 18:05:21,840 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3174120.0, ans=0.0 2023-11-27 18:05:25,997 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 7200, loss[loss=0.05522, simple_loss=0.07457, pruned_loss=0.007632, audio_tagging_loss=0.0103, over 16043.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08885, pruned_loss=0.0121, audio_tagging_loss=0.009056, over 3041211.78 frames. ], batch size: 62, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 18:05:31,085 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3174186.6666666665, ans=0.0 2023-11-27 18:05:33,294 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3174186.6666666665, ans=0.1 2023-11-27 18:05:40,134 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.74 vs. limit=22.5 2023-11-27 18:05:48,289 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3174320.0, ans=0.125 2023-11-27 18:05:49,094 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 476150 2023-11-27 18:06:00,922 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3174386.6666666665, ans=0.125 2023-11-27 18:06:11,868 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3174453.3333333335, ans=0.2 2023-11-27 18:06:17,061 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.05 vs. limit=15.0 2023-11-27 18:06:23,129 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 7250, loss[loss=0.05319, simple_loss=0.0648, pruned_loss=0.00987, audio_tagging_loss=0.01092, over 16062.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.08955, pruned_loss=0.01222, audio_tagging_loss=0.009204, over 3037473.12 frames. 
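
The optim.py:476 entries report the min/25%/median/75%/max of recent gradient norms plus a clipping threshold; in each line the threshold is almost exactly Clipping_scale times the logged median (e.g. 1.857e+02 is about 2.0 x 9.283e+01 just above), so that rule is assumed in this sketch. The buffer size and exact bookkeeping are guesses, not optim.py's code.

    from collections import deque
    import torch

    recent_norms = deque(maxlen=1024)  # window size is an assumption

    def clip_and_log(params, clipping_scale=2.0):
        # params: list of tensors with .grad populated
        grads = [p.grad.detach().flatten() for p in params if p.grad is not None]
        norm = torch.cat(grads).norm().item()
        recent_norms.append(norm)
        hist = torch.tensor(list(recent_norms))
        quartiles = torch.quantile(hist, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * quartiles[2].item()  # scale * median
        if norm > threshold:  # counted in the "percent-clipped" statistic
            for p in params:
                if p.grad is not None:
                    p.grad.mul_(threshold / norm)
        return quartiles, threshold
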
], batch size: 62, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 18:06:23,431 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3174520.0, ans=0.05 2023-11-27 18:06:46,773 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 476200 2023-11-27 18:06:48,297 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3174653.3333333335, ans=0.0 2023-11-27 18:06:58,841 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3174720.0, ans=0.125 2023-11-27 18:07:03,311 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3174720.0, ans=0.125 2023-11-27 18:07:06,585 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3174720.0, ans=0.2 2023-11-27 18:07:08,785 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 18:07:11,710 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.637e+01 8.649e+01 9.273e+01 9.853e+01 1.291e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-27 18:07:13,215 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3174786.6666666665, ans=0.0 2023-11-27 18:07:18,104 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3174786.6666666665, ans=0.1 2023-11-27 18:07:21,020 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.79 vs. limit=15.0 2023-11-27 18:07:21,682 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 7300, loss[loss=0.06598, simple_loss=0.08461, pruned_loss=0.01482, audio_tagging_loss=0.008849, over 15179.00 frames. ], tot_loss[loss=0.06681, simple_loss=0.0903, pruned_loss=0.01264, audio_tagging_loss=0.009018, over 3045684.38 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 18:07:28,955 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3174853.3333333335, ans=0.0 2023-11-27 18:07:32,348 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 18:07:36,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3174920.0, ans=0.0 2023-11-27 18:07:37,245 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.80 vs. limit=15.0 2023-11-27 18:07:44,994 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 476250 2023-11-27 18:08:00,808 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3175053.3333333335, ans=0.125 2023-11-27 18:08:04,630 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3175053.3333333335, ans=0.125 2023-11-27 18:08:19,043 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 7350, loss[loss=0.06196, simple_loss=0.0885, pruned_loss=0.009953, audio_tagging_loss=0.007763, over 15145.00 frames. 
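
The per-batch loss[...] fields above are internally consistent with loss = simple_loss_scale * simple_loss + pruned_loss + audio_tagging_loss_scale * audio_tagging_loss, using simple_loss_scale=0.5 and audio_tagging_loss_scale=1.0 from the setup (after warm-up; icefall weights the simple vs. pruned terms differently during the first warm_step batches, which is long past here). Checking the Epoch 40, batch 7300 entry just above:

    simple_loss, pruned_loss, audio_tagging_loss = 0.08461, 0.01482, 0.008849
    loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss
    print(loss)  # 0.065974, matching the logged loss=0.06598 up to the
                 # rounding of the printed components
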
], tot_loss[loss=0.06638, simple_loss=0.08985, pruned_loss=0.01252, audio_tagging_loss=0.008939, over 3047425.19 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:08:22,565 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3175186.6666666665, ans=0.125 2023-11-27 18:08:23,656 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3175186.6666666665, ans=0.125 2023-11-27 18:08:25,440 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3175186.6666666665, ans=0.125 2023-11-27 18:08:35,191 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3175253.3333333335, ans=0.0 2023-11-27 18:08:41,528 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 476300 2023-11-27 18:09:08,063 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.365e+01 8.708e+01 9.249e+01 1.003e+02 1.493e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-27 18:09:15,716 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 7400, loss[loss=0.06361, simple_loss=0.08057, pruned_loss=0.01503, audio_tagging_loss=0.008296, over 14443.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.08851, pruned_loss=0.01242, audio_tagging_loss=0.008893, over 3038171.50 frames. ], batch size: 53, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:09:15,898 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3175520.0, ans=0.125 2023-11-27 18:09:29,914 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.22 vs. limit=6.0 2023-11-27 18:09:34,630 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff3.min_abs, batch_count=3175586.6666666665, ans=0.2 2023-11-27 18:09:34,666 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3175586.6666666665, ans=0.0 2023-11-27 18:09:39,312 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 476350 2023-11-27 18:09:43,905 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3175653.3333333335, ans=0.1 2023-11-27 18:09:53,242 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3175720.0, ans=0.1 2023-11-27 18:10:11,499 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3175853.3333333335, ans=0.1 2023-11-27 18:10:12,902 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 7450, loss[loss=0.06125, simple_loss=0.08416, pruned_loss=0.009078, audio_tagging_loss=0.01009, over 14416.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08844, pruned_loss=0.01233, audio_tagging_loss=0.008775, over 3036379.87 frames. 
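
Each train_asr.py:1235 line pairs the current batch's loss (over roughly 15 k frames) with tot_loss over roughly 3.04 M frames, which reads as a frame-weighted running aggregate of recent batches. A sketch of that bookkeeping, with the window/reset policy left as an assumption (the setup's reset_interval=200 suggests it is periodically restarted):

    class RunningLoss:
        """Frame-weighted running average, like the tot_loss[...] fields."""

        def __init__(self):
            self.weighted_sum = 0.0  # sum over batches of loss_i * frames_i
            self.frames = 0.0        # sum over batches of frames_i

        def update(self, loss, frames):
            self.weighted_sum += loss * frames
            self.frames += frames

        @property
        def value(self):
            return self.weighted_sum / max(self.frames, 1.0)

    tot = RunningLoss()
    tot.update(0.06125, 14416)  # the batch 7450 entry above
    print(tot.value)            # approaches the logged tot_loss as more
                                # batches are accumulated
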
], batch size: 55, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:10:35,472 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3175986.6666666665, ans=0.1 2023-11-27 18:10:36,501 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 476400 2023-11-27 18:10:40,700 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.99 vs. limit=15.0 2023-11-27 18:10:45,284 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.82 vs. limit=22.5 2023-11-27 18:10:46,914 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3176053.3333333335, ans=0.125 2023-11-27 18:11:02,895 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.342e+01 8.662e+01 9.277e+01 9.892e+01 1.175e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-27 18:11:04,123 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3176120.0, ans=0.125 2023-11-27 18:11:11,063 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 7500, loss[loss=0.06497, simple_loss=0.09048, pruned_loss=0.01112, audio_tagging_loss=0.008614, over 15237.00 frames. ], tot_loss[loss=0.06596, simple_loss=0.08956, pruned_loss=0.01245, audio_tagging_loss=0.008734, over 3046524.24 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:11:12,520 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3176186.6666666665, ans=0.125 2023-11-27 18:11:33,552 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 476450 2023-11-27 18:11:58,060 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3176453.3333333335, ans=0.125 2023-11-27 18:11:58,162 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3176453.3333333335, ans=0.0 2023-11-27 18:12:00,363 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3176453.3333333335, ans=0.2 2023-11-27 18:12:08,315 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 7550, loss[loss=0.05629, simple_loss=0.08148, pruned_loss=0.007495, audio_tagging_loss=0.008053, over 16076.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.09029, pruned_loss=0.01261, audio_tagging_loss=0.00862, over 3050155.87 frames. 
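
The grad_scale field doubles from 16.0 to 32.0 at batch 7600 below and drops to 8.0 and 4.0 later in the epoch, which is the usual dynamics of PyTorch AMP loss scaling under use_fp16=True: grow the scale after a run of overflow-free steps, back off on overflow. A generic usage sketch (not the recipe's exact training loop):

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=16.0)

    def fp16_step(model, optimizer, batch, compute_loss):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = compute_loss(model, batch)
        scaler.scale(loss).backward()
        scaler.step(optimizer)     # silently skips the step on inf/nan grads
        scaler.update()            # grows or halves the scale accordingly
        return scaler.get_scale()  # the value logged as grad_scale
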
], batch size: 60, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:12:20,555 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3176586.6666666665, ans=0.0 2023-11-27 18:12:31,237 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 476500 2023-11-27 18:12:32,583 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3176653.3333333335, ans=0.2 2023-11-27 18:12:46,878 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3176720.0, ans=0.125 2023-11-27 18:12:46,901 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3176720.0, ans=0.125 2023-11-27 18:12:56,734 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3176786.6666666665, ans=0.1 2023-11-27 18:12:57,547 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.867e+01 8.727e+01 9.587e+01 1.045e+02 1.317e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-27 18:13:01,187 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3176786.6666666665, ans=0.0 2023-11-27 18:13:05,312 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 7600, loss[loss=0.06096, simple_loss=0.07852, pruned_loss=0.01223, audio_tagging_loss=0.009472, over 14470.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.08893, pruned_loss=0.01236, audio_tagging_loss=0.00873, over 3040489.63 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 18:13:07,103 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 18:13:12,601 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3176853.3333333335, ans=0.0 2023-11-27 18:13:27,957 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3176986.6666666665, ans=0.125 2023-11-27 18:13:28,825 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 476550 2023-11-27 18:13:32,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3176986.6666666665, ans=0.125 2023-11-27 18:14:03,567 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 7650, loss[loss=0.07284, simple_loss=0.09864, pruned_loss=0.01319, audio_tagging_loss=0.01034, over 15378.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08856, pruned_loss=0.01223, audio_tagging_loss=0.008632, over 3039597.53 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 18:14:26,015 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 476600 2023-11-27 18:14:27,722 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3177320.0, ans=0.05 2023-11-27 18:14:29,257 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.03 vs. 
limit=15.0 2023-11-27 18:14:37,956 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3177386.6666666665, ans=0.0 2023-11-27 18:14:51,035 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3177453.3333333335, ans=0.0 2023-11-27 18:14:52,991 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.206e+01 8.590e+01 9.167e+01 9.909e+01 1.245e+02, threshold=1.833e+02, percent-clipped=0.0 2023-11-27 18:14:58,212 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3177453.3333333335, ans=0.0 2023-11-27 18:15:01,121 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 7700, loss[loss=0.0533, simple_loss=0.0721, pruned_loss=0.008453, audio_tagging_loss=0.008797, over 14038.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08866, pruned_loss=0.01223, audio_tagging_loss=0.008686, over 3034842.83 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 18:15:02,461 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3177520.0, ans=0.125 2023-11-27 18:15:02,637 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.12 vs. limit=22.5 2023-11-27 18:15:06,812 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3177520.0, ans=0.125 2023-11-27 18:15:23,717 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 476650 2023-11-27 18:15:39,129 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.52 vs. limit=15.0 2023-11-27 18:15:41,695 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3177720.0, ans=0.2 2023-11-27 18:15:57,831 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 7750, loss[loss=0.05891, simple_loss=0.08107, pruned_loss=0.01055, audio_tagging_loss=0.007822, over 14736.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08841, pruned_loss=0.01227, audio_tagging_loss=0.008852, over 3033783.05 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 18:15:58,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3177853.3333333335, ans=0.125 2023-11-27 18:16:21,025 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 476700 2023-11-27 18:16:38,220 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3178053.3333333335, ans=0.0 2023-11-27 18:16:48,182 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.922e+01 8.657e+01 9.352e+01 9.918e+01 1.309e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-27 18:16:51,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3178120.0, ans=0.125 2023-11-27 18:16:54,134 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.77 vs. 
limit=15.0 2023-11-27 18:16:54,650 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 7800, loss[loss=0.06494, simple_loss=0.0884, pruned_loss=0.01106, audio_tagging_loss=0.009684, over 14920.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08906, pruned_loss=0.01235, audio_tagging_loss=0.008748, over 3040186.57 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:17:04,845 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3178186.6666666665, ans=0.0 2023-11-27 18:17:17,383 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3178320.0, ans=0.125 2023-11-27 18:17:18,178 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 476750 2023-11-27 18:17:18,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3178320.0, ans=0.0 2023-11-27 18:17:24,841 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3178320.0, ans=0.125 2023-11-27 18:17:29,608 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3178386.6666666665, ans=0.125 2023-11-27 18:17:37,377 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3178386.6666666665, ans=0.125 2023-11-27 18:17:43,152 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3178453.3333333335, ans=0.2 2023-11-27 18:17:51,518 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3178520.0, ans=0.125 2023-11-27 18:17:53,007 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 7850, loss[loss=0.0617, simple_loss=0.08589, pruned_loss=0.01148, audio_tagging_loss=0.007269, over 15232.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.08925, pruned_loss=0.01233, audio_tagging_loss=0.008852, over 3036768.97 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:18:15,367 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 476800 2023-11-27 18:18:19,243 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3178653.3333333335, ans=0.1 2023-11-27 18:18:27,658 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.84 vs. limit=6.0 2023-11-27 18:18:44,723 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.473e+01 8.789e+01 9.389e+01 9.930e+01 1.229e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-27 18:18:50,074 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 7900, loss[loss=0.06221, simple_loss=0.08062, pruned_loss=0.01281, audio_tagging_loss=0.009093, over 15963.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.08925, pruned_loss=0.0123, audio_tagging_loss=0.008892, over 3042465.36 frames. 
], batch size: 61, lr: 1.69e-03, grad_scale: 8.0 2023-11-27 18:18:50,326 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3178853.3333333335, ans=0.2 2023-11-27 18:18:57,318 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3178853.3333333335, ans=0.125 2023-11-27 18:19:13,031 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 476850 2023-11-27 18:19:27,552 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3179053.3333333335, ans=0.0 2023-11-27 18:19:43,523 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3179120.0, ans=0.0 2023-11-27 18:19:47,735 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 7950, loss[loss=0.05373, simple_loss=0.06908, pruned_loss=0.009534, audio_tagging_loss=0.009652, over 15341.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.08996, pruned_loss=0.01251, audio_tagging_loss=0.008877, over 3046137.33 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 8.0 2023-11-27 18:19:47,897 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3179186.6666666665, ans=0.1 2023-11-27 18:19:57,256 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 18:20:03,275 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3179253.3333333335, ans=0.0 2023-11-27 18:20:05,983 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 18:20:11,393 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 476900 2023-11-27 18:20:27,224 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3179386.6666666665, ans=0.125 2023-11-27 18:20:28,423 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3179386.6666666665, ans=0.2 2023-11-27 18:20:39,448 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.286e+01 8.819e+01 9.346e+01 1.021e+02 1.484e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-27 18:20:41,882 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3179453.3333333335, ans=0.0 2023-11-27 18:20:44,224 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3179520.0, ans=0.125 2023-11-27 18:20:44,961 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 8000, loss[loss=0.08783, simple_loss=0.129, pruned_loss=0.01456, audio_tagging_loss=0.008789, over 16096.00 frames. ], tot_loss[loss=0.0666, simple_loss=0.09017, pruned_loss=0.0125, audio_tagging_loss=0.009011, over 3043006.18 frames. 
], batch size: 59, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:20:50,698 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3179520.0, ans=0.0 2023-11-27 18:20:56,360 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3179586.6666666665, ans=0.0 2023-11-27 18:20:57,275 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3179586.6666666665, ans=0.125 2023-11-27 18:21:04,794 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3179586.6666666665, ans=0.125 2023-11-27 18:21:08,514 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 476950 2023-11-27 18:21:24,481 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3179720.0, ans=0.1 2023-11-27 18:21:24,900 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.94 vs. limit=15.0 2023-11-27 18:21:32,107 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.82 vs. limit=22.5 2023-11-27 18:21:35,056 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3179786.6666666665, ans=0.0 2023-11-27 18:21:36,633 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.50 vs. limit=15.0 2023-11-27 18:21:42,505 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 8050, loss[loss=0.08087, simple_loss=0.1188, pruned_loss=0.01377, audio_tagging_loss=0.007705, over 16423.00 frames. ], tot_loss[loss=0.06662, simple_loss=0.09066, pruned_loss=0.01241, audio_tagging_loss=0.008887, over 3043221.15 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:21:45,299 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.55 vs. limit=22.5 2023-11-27 18:21:48,259 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3179853.3333333335, ans=0.125 2023-11-27 18:21:48,540 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.76 vs. 
limit=15.0 2023-11-27 18:21:52,904 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3179920.0, ans=0.125 2023-11-27 18:22:03,464 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3179920.0, ans=0.125 2023-11-27 18:22:05,363 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 477000 2023-11-27 18:22:10,104 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 18:22:13,411 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3179986.6666666665, ans=0.125 2023-11-27 18:22:18,964 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3180053.3333333335, ans=0.0 2023-11-27 18:22:27,726 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3180120.0, ans=0.0 2023-11-27 18:22:35,020 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.485e+01 8.539e+01 9.239e+01 9.821e+01 1.214e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-27 18:22:39,957 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 8100, loss[loss=0.0786, simple_loss=0.1078, pruned_loss=0.01757, audio_tagging_loss=0.007121, over 15440.00 frames. ], tot_loss[loss=0.06691, simple_loss=0.09099, pruned_loss=0.01255, audio_tagging_loss=0.008862, over 3046312.34 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 8.0 2023-11-27 18:22:41,261 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3180186.6666666665, ans=0.0 2023-11-27 18:22:41,623 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.97 vs. limit=15.0 2023-11-27 18:22:51,999 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.46 vs. limit=15.0 2023-11-27 18:22:58,840 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3180253.3333333335, ans=0.0 2023-11-27 18:23:03,622 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 477050 2023-11-27 18:23:17,666 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3180386.6666666665, ans=0.1 2023-11-27 18:23:36,935 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 8150, loss[loss=0.06564, simple_loss=0.08524, pruned_loss=0.0118, audio_tagging_loss=0.01122, over 15662.00 frames. ], tot_loss[loss=0.06679, simple_loss=0.09092, pruned_loss=0.01261, audio_tagging_loss=0.00872, over 3044601.92 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 8.0 2023-11-27 18:24:00,069 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 477100 2023-11-27 18:24:15,370 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3180720.0, ans=0.0 2023-11-27 18:24:15,521 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3180720.0, ans=0.0 2023-11-27 18:24:21,213 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.96 vs. 
limit=22.5 2023-11-27 18:24:29,746 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.221e+01 8.482e+01 9.156e+01 9.778e+01 1.274e+02, threshold=1.831e+02, percent-clipped=0.0 2023-11-27 18:24:34,780 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 8200, loss[loss=0.06498, simple_loss=0.09165, pruned_loss=0.01192, audio_tagging_loss=0.007237, over 14255.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.09027, pruned_loss=0.01246, audio_tagging_loss=0.008622, over 3046098.38 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 8.0 2023-11-27 18:24:37,131 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3180853.3333333335, ans=0.125 2023-11-27 18:24:39,152 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 18:24:49,564 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3180920.0, ans=0.1 2023-11-27 18:24:55,263 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.95 vs. limit=6.0 2023-11-27 18:24:55,313 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.08 vs. limit=22.5 2023-11-27 18:24:56,937 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 477150 2023-11-27 18:25:31,877 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 8250, loss[loss=0.06757, simple_loss=0.09474, pruned_loss=0.01149, audio_tagging_loss=0.008703, over 16840.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.09001, pruned_loss=0.01237, audio_tagging_loss=0.008655, over 3056660.19 frames. ], batch size: 61, lr: 1.69e-03, grad_scale: 8.0 2023-11-27 18:25:44,587 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3181253.3333333335, ans=0.1 2023-11-27 18:25:48,051 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3181253.3333333335, ans=10.0 2023-11-27 18:25:54,928 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 477200 2023-11-27 18:26:24,476 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.486e+01 8.573e+01 9.130e+01 1.008e+02 1.998e+02, threshold=1.826e+02, percent-clipped=1.0 2023-11-27 18:26:29,304 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 8300, loss[loss=0.08168, simple_loss=0.1069, pruned_loss=0.01978, audio_tagging_loss=0.008428, over 15122.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.09038, pruned_loss=0.01252, audio_tagging_loss=0.008634, over 3056382.40 frames. 
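
Both warnings report 100 input frames becoming 23 after subsampling. One convolutional-subsampling formula consistent with that arithmetic (and with subsampling_factor=4 from the setup) is sketched below; the exact expression varies between icefall front-ends, so treat it as an assumption that merely reproduces the logged numbers:

    def frames_after_subsampling(t: int) -> int:
        # Two stride-2 stages with small kernel trims: ~4x reduction overall.
        return ((t - 7) // 2 + 1) // 2

    print(frames_after_subsampling(100))  # 23, matching the warnings
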
], batch size: 57, lr: 1.69e-03, grad_scale: 8.0 2023-11-27 18:26:34,841 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3181520.0, ans=0.125 2023-11-27 18:26:52,239 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 477250 2023-11-27 18:26:59,499 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3181653.3333333335, ans=0.2 2023-11-27 18:27:00,377 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3181653.3333333335, ans=0.125 2023-11-27 18:27:14,053 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3181786.6666666665, ans=0.2 2023-11-27 18:27:16,689 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.55 vs. limit=12.0 2023-11-27 18:27:26,437 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 8350, loss[loss=0.05243, simple_loss=0.06963, pruned_loss=0.008847, audio_tagging_loss=0.00877, over 15017.00 frames. ], tot_loss[loss=0.066, simple_loss=0.09003, pruned_loss=0.01239, audio_tagging_loss=0.008591, over 3057889.58 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 8.0 2023-11-27 18:27:49,261 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 477300 2023-11-27 18:27:49,560 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3181986.6666666665, ans=0.125 2023-11-27 18:28:08,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3182053.3333333335, ans=0.125 2023-11-27 18:28:19,115 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.531e+01 8.748e+01 9.541e+01 1.013e+02 1.320e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-27 18:28:19,362 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3182120.0, ans=0.1 2023-11-27 18:28:20,902 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.94 vs. limit=15.0 2023-11-27 18:28:23,401 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 8400, loss[loss=0.05156, simple_loss=0.06996, pruned_loss=0.01134, audio_tagging_loss=0.005244, over 15031.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.08996, pruned_loss=0.01244, audio_tagging_loss=0.008531, over 3057360.26 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:28:44,246 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff2.min_abs, batch_count=3182253.3333333335, ans=0.1 2023-11-27 18:28:46,244 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 477350 2023-11-27 18:29:07,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3182386.6666666665, ans=0.125 2023-11-27 18:29:20,945 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 8450, loss[loss=0.05819, simple_loss=0.07474, pruned_loss=0.009263, audio_tagging_loss=0.01155, over 14809.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.09011, pruned_loss=0.01254, audio_tagging_loss=0.008515, over 3051277.90 frames. 
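
The fluctuating "batch size: 53..62" values are a consequence of duration-based batching: the SimpleCutSampler named in the setup packs cuts until max_duration=1000 seconds is reached, so the cut count per mini-batch varies with the cut lengths. A generic lhotse usage sketch (the manifest path is illustrative):

    from lhotse import CutSet
    from lhotse.dataset import SimpleCutSampler

    cuts = CutSet.from_file("data/fbank/cuts_train.jsonl.gz")  # illustrative path
    sampler = SimpleCutSampler(cuts, max_duration=1000.0, shuffle=True, drop_last=True)
    for batch_cuts in sampler:
        print(len(batch_cuts))  # varies batch to batch, e.g. 53-62 cuts
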
], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:29:31,708 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3182586.6666666665, ans=0.125 2023-11-27 18:29:33,977 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3182586.6666666665, ans=0.0 2023-11-27 18:29:43,524 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 477400 2023-11-27 18:29:53,854 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3182653.3333333335, ans=0.125 2023-11-27 18:30:13,796 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.259e+01 8.652e+01 9.208e+01 1.012e+02 1.151e+02, threshold=1.842e+02, percent-clipped=0.0 2023-11-27 18:30:16,978 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3182786.6666666665, ans=0.1 2023-11-27 18:30:17,985 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3182853.3333333335, ans=0.025 2023-11-27 18:30:18,084 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3182853.3333333335, ans=0.1 2023-11-27 18:30:18,896 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 8500, loss[loss=0.06399, simple_loss=0.09091, pruned_loss=0.01331, audio_tagging_loss=0.005221, over 15443.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08975, pruned_loss=0.01244, audio_tagging_loss=0.008596, over 3053850.25 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:30:25,700 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3182853.3333333335, ans=0.125 2023-11-27 18:30:35,068 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3182920.0, ans=0.125 2023-11-27 18:30:42,041 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 477450 2023-11-27 18:30:42,193 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3182986.6666666665, ans=10.0 2023-11-27 18:30:47,051 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3182986.6666666665, ans=0.2 2023-11-27 18:30:49,368 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.80 vs. limit=15.0 2023-11-27 18:31:00,863 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 18:31:03,063 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3183053.3333333335, ans=0.0 2023-11-27 18:31:05,019 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3183120.0, ans=0.0 2023-11-27 18:31:16,559 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 8550, loss[loss=0.0457, simple_loss=0.05929, pruned_loss=0.006473, audio_tagging_loss=0.009581, over 15369.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08929, pruned_loss=0.0124, audio_tagging_loss=0.008621, over 3048860.86 frames. 
], batch size: 58, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:31:16,883 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3183186.6666666665, ans=0.5 2023-11-27 18:31:20,436 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3183186.6666666665, ans=0.0 2023-11-27 18:31:29,324 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3183253.3333333335, ans=0.125 2023-11-27 18:31:31,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3183253.3333333335, ans=0.1 2023-11-27 18:31:39,302 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 477500 2023-11-27 18:31:51,465 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.54 vs. limit=15.0 2023-11-27 18:31:54,206 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3183386.6666666665, ans=0.1 2023-11-27 18:32:09,081 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.354e+01 8.725e+01 9.304e+01 1.021e+02 1.373e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-27 18:32:09,454 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3183453.3333333335, ans=0.125 2023-11-27 18:32:13,966 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 8600, loss[loss=0.07863, simple_loss=0.1159, pruned_loss=0.01411, audio_tagging_loss=0.00657, over 15885.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.09077, pruned_loss=0.01257, audio_tagging_loss=0.008639, over 3048235.46 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:32:25,482 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=3183586.6666666665, ans=15.0 2023-11-27 18:32:36,466 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 477550 2023-11-27 18:32:41,140 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3183653.3333333335, ans=0.2 2023-11-27 18:32:47,065 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3183720.0, ans=0.0 2023-11-27 18:32:48,719 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3183720.0, ans=0.1 2023-11-27 18:33:06,681 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3183786.6666666665, ans=0.1 2023-11-27 18:33:09,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3183786.6666666665, ans=0.125 2023-11-27 18:33:11,375 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 8650, loss[loss=0.07552, simple_loss=0.09946, pruned_loss=0.01746, audio_tagging_loss=0.008324, over 14999.00 frames. ], tot_loss[loss=0.06698, simple_loss=0.09131, pruned_loss=0.01266, audio_tagging_loss=0.008663, over 3045662.01 frames. 
], batch size: 54, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:33:11,674 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3183853.3333333335, ans=0.2 2023-11-27 18:33:34,148 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 477600 2023-11-27 18:33:38,251 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.31 vs. limit=10.0 2023-11-27 18:33:48,565 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3184053.3333333335, ans=0.0 2023-11-27 18:34:04,085 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.582e+01 8.946e+01 9.500e+01 1.005e+02 1.406e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-27 18:34:08,467 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 8700, loss[loss=0.08583, simple_loss=0.1226, pruned_loss=0.01704, audio_tagging_loss=0.007477, over 15546.00 frames. ], tot_loss[loss=0.06705, simple_loss=0.0911, pruned_loss=0.01269, audio_tagging_loss=0.008811, over 3042017.27 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:34:13,241 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.23 vs. limit=15.0 2023-11-27 18:34:30,994 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3184320.0, ans=0.0 2023-11-27 18:34:31,887 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 477650 2023-11-27 18:35:05,939 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 8750, loss[loss=0.05937, simple_loss=0.07362, pruned_loss=0.01044, audio_tagging_loss=0.01212, over 14818.00 frames. ], tot_loss[loss=0.06736, simple_loss=0.09149, pruned_loss=0.01274, audio_tagging_loss=0.008876, over 3042058.35 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:35:06,203 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3184520.0, ans=0.125 2023-11-27 18:35:15,144 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 18:35:18,501 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3184586.6666666665, ans=0.1 2023-11-27 18:35:25,711 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3184586.6666666665, ans=0.1 2023-11-27 18:35:28,789 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 477700 2023-11-27 18:35:47,588 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3184720.0, ans=0.125 2023-11-27 18:35:54,611 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 18:35:58,774 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.523e+01 8.815e+01 9.414e+01 9.987e+01 1.374e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-27 18:36:03,875 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 8800, loss[loss=0.07897, simple_loss=0.1095, pruned_loss=0.01631, audio_tagging_loss=0.007921, over 15291.00 frames. 
], tot_loss[loss=0.0674, simple_loss=0.09157, pruned_loss=0.01266, audio_tagging_loss=0.008957, over 3044838.43 frames. ], batch size: 54, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 18:36:13,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3184920.0, ans=0.125 2023-11-27 18:36:26,064 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 477750 2023-11-27 18:36:34,361 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3184986.6666666665, ans=0.125 2023-11-27 18:36:51,657 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 18:36:51,720 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3185120.0, ans=0.125 2023-11-27 18:36:52,783 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3185120.0, ans=0.0 2023-11-27 18:37:00,088 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 8850, loss[loss=0.06335, simple_loss=0.08905, pruned_loss=0.01189, audio_tagging_loss=0.006936, over 14006.00 frames. ], tot_loss[loss=0.06761, simple_loss=0.0919, pruned_loss=0.01274, audio_tagging_loss=0.008921, over 3045075.98 frames. ], batch size: 53, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:37:14,845 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 18:37:20,421 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 18:37:23,592 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 477800 2023-11-27 18:37:29,860 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.82 vs. limit=6.0 2023-11-27 18:37:45,438 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3185453.3333333335, ans=0.0 2023-11-27 18:37:54,288 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.612e+01 8.632e+01 9.430e+01 1.040e+02 1.292e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-27 18:37:56,114 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.87 vs. limit=12.0 2023-11-27 18:37:57,543 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 8900, loss[loss=0.08208, simple_loss=0.1111, pruned_loss=0.01831, audio_tagging_loss=0.008206, over 15500.00 frames. ], tot_loss[loss=0.06735, simple_loss=0.09162, pruned_loss=0.01269, audio_tagging_loss=0.00885, over 3041220.86 frames. 
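
The scaling.py:1022 Whitening lines compare a per-module statistic against a fixed limit ("metric=2.82 vs. limit=6.0") and, on this evidence, only intervene when the metric exceeds the limit. One plausible reading is a whiteness measure on the channel covariance of activations, sketched below; this formula is an illustration of the idea, not scaling.py's actual computation:

    import torch

    def whiteness_metric(x: torch.Tensor) -> float:
        # x: (frames, channels).  Perfectly "white" features (equal covariance
        # eigenvalues) give 1.0; larger values mean a more lopsided spectrum.
        x = x - x.mean(dim=0)
        cov = (x.T @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        return float((eigs ** 2).mean() / eigs.mean() ** 2)
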
], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:38:07,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3185520.0, ans=0.2 2023-11-27 18:38:20,418 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 477850 2023-11-27 18:38:32,373 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3185720.0, ans=0.2 2023-11-27 18:38:43,257 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.58 vs. limit=6.0 2023-11-27 18:38:54,432 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 8950, loss[loss=0.07085, simple_loss=0.09671, pruned_loss=0.01601, audio_tagging_loss=0.006489, over 15213.00 frames. ], tot_loss[loss=0.0679, simple_loss=0.09251, pruned_loss=0.013, audio_tagging_loss=0.008638, over 3053940.50 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:39:16,889 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 477900 2023-11-27 18:39:36,023 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3186053.3333333335, ans=0.0 2023-11-27 18:39:49,550 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.830e+01 8.925e+01 9.376e+01 9.837e+01 1.193e+02, threshold=1.875e+02, percent-clipped=0.0 2023-11-27 18:39:51,774 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 9000, loss[loss=0.07217, simple_loss=0.09983, pruned_loss=0.01473, audio_tagging_loss=0.00753, over 15935.00 frames. ], tot_loss[loss=0.06774, simple_loss=0.09223, pruned_loss=0.01298, audio_tagging_loss=0.008643, over 3052968.84 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 8.0 2023-11-27 18:39:51,776 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-27 18:40:27,266 INFO [train_asr.py:1267] (0/4) Epoch 40, validation: loss=0.05837, simple_loss=0.05058, pruned_loss=0.005173, audio_tagging_loss=0.02791, over 4681554.00 frames. 2023-11-27 18:40:27,267 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-27 18:40:42,709 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3186253.3333333335, ans=0.125 2023-11-27 18:40:50,088 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 477950 2023-11-27 18:40:58,073 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3186320.0, ans=0.04949747468305833 2023-11-27 18:41:05,261 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3186386.6666666665, ans=0.125 2023-11-27 18:41:09,446 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3186386.6666666665, ans=0.2 2023-11-27 18:41:14,444 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3186453.3333333335, ans=0.125 2023-11-27 18:41:16,892 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.68 vs. 
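
The validation block above fires at Epoch 40, batch 9000, i.e. when the batch index within the epoch hits a multiple of valid_interval=3000 from the setup, and reports a frame-weighted validation loss alongside peak memory. A minimal sketch of that cadence (the helper names are illustrative):

    import torch

    def maybe_validate(model, valid_dl, batch_idx, compute_loss, valid_interval=3000):
        if batch_idx == 0 or batch_idx % valid_interval != 0:
            return None
        model.eval()
        weighted_sum, frames = 0.0, 0.0
        with torch.no_grad():
            for batch in valid_dl:
                loss, num_frames = compute_loss(model, batch)
                weighted_sum += float(loss) * num_frames
                frames += num_frames
        model.train()
        return weighted_sum / frames  # the "validation: loss=..." value
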
limit=22.5 2023-11-27 18:41:17,725 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3186453.3333333335, ans=0.125 2023-11-27 18:41:25,258 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 9050, loss[loss=0.06296, simple_loss=0.08343, pruned_loss=0.01238, audio_tagging_loss=0.008865, over 15561.00 frames. ], tot_loss[loss=0.06781, simple_loss=0.09228, pruned_loss=0.01303, audio_tagging_loss=0.008635, over 3051966.73 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 4.0 2023-11-27 18:41:27,675 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3186520.0, ans=0.125 2023-11-27 18:41:34,662 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.10 vs. limit=15.0 2023-11-27 18:41:39,898 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3186586.6666666665, ans=0.125 2023-11-27 18:41:47,792 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 478000 2023-11-27 18:42:00,794 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3186720.0, ans=0.0 2023-11-27 18:42:07,420 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3186720.0, ans=0.125 2023-11-27 18:42:16,274 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3186786.6666666665, ans=0.125 2023-11-27 18:42:19,867 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.28 vs. limit=15.0 2023-11-27 18:42:21,506 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.670e+01 8.889e+01 9.370e+01 1.013e+02 1.191e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-27 18:42:22,733 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 9100, loss[loss=0.0507, simple_loss=0.07184, pruned_loss=0.006557, audio_tagging_loss=0.008221, over 15315.00 frames. ], tot_loss[loss=0.06747, simple_loss=0.09183, pruned_loss=0.01296, audio_tagging_loss=0.008597, over 3049791.38 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 8.0 2023-11-27 18:42:40,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3186920.0, ans=0.0 2023-11-27 18:42:45,868 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 478050 2023-11-27 18:42:52,745 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3186986.6666666665, ans=0.1 2023-11-27 18:42:53,901 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3186986.6666666665, ans=0.125 2023-11-27 18:42:55,107 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3186986.6666666665, ans=0.0 2023-11-27 18:43:15,070 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.07 vs. 
limit=15.0 2023-11-27 18:43:20,519 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 9150, loss[loss=0.0684, simple_loss=0.08994, pruned_loss=0.0157, audio_tagging_loss=0.007726, over 14777.00 frames. ], tot_loss[loss=0.06754, simple_loss=0.09153, pruned_loss=0.0131, audio_tagging_loss=0.008677, over 3048297.49 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 8.0 2023-11-27 18:43:24,663 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.95 vs. limit=22.5 2023-11-27 18:43:44,078 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 478100 2023-11-27 18:43:51,861 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3187320.0, ans=0.125 2023-11-27 18:43:59,874 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3187386.6666666665, ans=0.125 2023-11-27 18:44:03,471 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.46 vs. limit=15.0 2023-11-27 18:44:11,432 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3187453.3333333335, ans=0.1 2023-11-27 18:44:15,757 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 18:44:17,239 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.460e+01 8.554e+01 9.287e+01 9.975e+01 1.548e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-27 18:44:18,391 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 9200, loss[loss=0.05056, simple_loss=0.06867, pruned_loss=0.009496, audio_tagging_loss=0.006728, over 15480.00 frames. ], tot_loss[loss=0.06745, simple_loss=0.09155, pruned_loss=0.01303, audio_tagging_loss=0.008648, over 3050895.25 frames. ], batch size: 60, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 18:44:28,087 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3187520.0, ans=0.1 2023-11-27 18:44:41,023 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 478150 2023-11-27 18:44:59,743 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3187720.0, ans=0.0 2023-11-27 18:45:15,788 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 9250, loss[loss=0.07017, simple_loss=0.09494, pruned_loss=0.01503, audio_tagging_loss=0.007667, over 15693.00 frames. ], tot_loss[loss=0.06713, simple_loss=0.09114, pruned_loss=0.01286, audio_tagging_loss=0.008705, over 3057968.75 frames. 
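The Clipping_scale=2.0 entries above summarize the quartiles of recent gradient norms and the clipping threshold derived from them. Throughout this section the threshold tracks twice the logged median (here 2 x 9.287e+01 = 1.857e+02), so the threshold appears to be clipping_scale times a running median of gradient norms rather than a fixed constant. A minimal sketch of that idea, assuming a simple history buffer; the class and method names are invented, and icefall's ScaledAdam folds this logic into the optimizer itself:

```python
from collections import deque
import statistics

import torch


class MedianClipper:
    """Clip gradients against clipping_scale * median of recent grad norms.

    A sketch of the behaviour suggested by the log lines; not the actual
    ScaledAdam implementation.
    """

    def __init__(self, clipping_scale: float = 2.0, history: int = 1000):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=history)
        self.num_batches = 0
        self.num_clipped = 0

    def clip_(self, parameters) -> float:
        """Scale gradients in place; return the threshold used."""
        params = [p for p in parameters if p.grad is not None]
        norm = torch.norm(
            torch.stack([p.grad.detach().norm() for p in params])
        ).item()
        self.norms.append(norm)
        if len(self.norms) >= 4:
            # The three cut points are the quartiles printed in the log.
            _q1, median, _q3 = statistics.quantiles(self.norms, n=4)
        else:
            median = norm
        threshold = self.clipping_scale * median
        self.num_batches += 1
        if norm > threshold:
            self.num_clipped += 1
            for p in params:
                p.grad.mul_(threshold / norm)
        return threshold
```

percent-clipped=0.0 in these lines then just means no batch in this stretch exceeded twice the median norm.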
], batch size: 59, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 18:45:29,639 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3187920.0, ans=0.0 2023-11-27 18:45:39,011 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 478200 2023-11-27 18:45:56,026 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3188053.3333333335, ans=0.125 2023-11-27 18:46:02,644 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3188120.0, ans=0.0 2023-11-27 18:46:02,917 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.12 vs. limit=6.0 2023-11-27 18:46:05,158 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.78 vs. limit=15.0 2023-11-27 18:46:11,997 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.14 vs. limit=6.0 2023-11-27 18:46:12,564 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.297e+01 8.841e+01 9.296e+01 9.979e+01 1.330e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-27 18:46:13,730 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 9300, loss[loss=0.06188, simple_loss=0.08136, pruned_loss=0.01159, audio_tagging_loss=0.009601, over 14584.00 frames. ], tot_loss[loss=0.06717, simple_loss=0.0912, pruned_loss=0.01286, audio_tagging_loss=0.00871, over 3064626.46 frames. ], batch size: 55, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 18:46:34,788 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3188253.3333333335, ans=0.2 2023-11-27 18:46:37,504 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 478250 2023-11-27 18:47:01,835 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.24 vs. limit=22.5 2023-11-27 18:47:11,356 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 9350, loss[loss=0.04594, simple_loss=0.05435, pruned_loss=0.005835, audio_tagging_loss=0.01293, over 16211.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.09035, pruned_loss=0.01256, audio_tagging_loss=0.008812, over 3059543.58 frames. ], batch size: 64, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 18:47:34,523 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 478300 2023-11-27 18:47:37,902 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3188653.3333333335, ans=0.0 2023-11-27 18:47:44,186 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.30 vs. 
limit=15.0 2023-11-27 18:47:46,372 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3188720.0, ans=0.125 2023-11-27 18:48:02,603 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3188786.6666666665, ans=0.04949747468305833 2023-11-27 18:48:08,494 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.439e+01 8.625e+01 9.314e+01 1.018e+02 1.859e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-27 18:48:09,690 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 9400, loss[loss=0.08092, simple_loss=0.1044, pruned_loss=0.01897, audio_tagging_loss=0.009744, over 15150.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.08991, pruned_loss=0.01251, audio_tagging_loss=0.008865, over 3052600.09 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 18:48:10,846 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3188853.3333333335, ans=0.1 2023-11-27 18:48:13,162 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3188853.3333333335, ans=0.1 2023-11-27 18:48:32,785 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 478350 2023-11-27 18:48:57,210 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3189120.0, ans=0.0 2023-11-27 18:49:07,275 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 9450, loss[loss=0.06741, simple_loss=0.09148, pruned_loss=0.01376, audio_tagging_loss=0.007908, over 14993.00 frames. ], tot_loss[loss=0.06694, simple_loss=0.09078, pruned_loss=0.01272, audio_tagging_loss=0.008832, over 3048117.22 frames. ], batch size: 58, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 18:49:08,443 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 18:49:15,727 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3189186.6666666665, ans=0.125 2023-11-27 18:49:30,395 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 478400 2023-11-27 18:49:33,020 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3189320.0, ans=0.0 2023-11-27 18:49:39,050 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3189320.0, ans=0.125 2023-11-27 18:49:43,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3189386.6666666665, ans=0.125 2023-11-27 18:49:49,263 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.89 vs. 
limit=15.0 2023-11-27 18:50:04,752 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.642e+01 8.634e+01 9.375e+01 9.974e+01 1.335e+02, threshold=1.875e+02, percent-clipped=0.0 2023-11-27 18:50:04,778 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 9500, loss[loss=0.0714, simple_loss=0.09945, pruned_loss=0.01342, audio_tagging_loss=0.008253, over 14687.00 frames. ], tot_loss[loss=0.0672, simple_loss=0.09122, pruned_loss=0.01273, audio_tagging_loss=0.008858, over 3051544.17 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 8.0 2023-11-27 18:50:06,103 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3189520.0, ans=0.1 2023-11-27 18:50:28,192 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 478450 2023-11-27 18:50:42,456 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3189720.0, ans=0.125 2023-11-27 18:50:49,897 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3189786.6666666665, ans=0.0 2023-11-27 18:51:02,308 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 9550, loss[loss=0.07126, simple_loss=0.08912, pruned_loss=0.01719, audio_tagging_loss=0.009512, over 14342.00 frames. ], tot_loss[loss=0.068, simple_loss=0.0925, pruned_loss=0.01288, audio_tagging_loss=0.008862, over 3052991.29 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 8.0 2023-11-27 18:51:07,553 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3189853.3333333335, ans=0.0 2023-11-27 18:51:21,542 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3189920.0, ans=0.0 2023-11-27 18:51:26,124 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 478500 2023-11-27 18:51:30,676 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3189986.6666666665, ans=0.0 2023-11-27 18:51:52,937 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.18 vs. limit=15.0 2023-11-27 18:51:59,905 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.041e+01 8.341e+01 9.036e+01 9.952e+01 1.407e+02, threshold=1.807e+02, percent-clipped=0.0 2023-11-27 18:51:59,932 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 9600, loss[loss=0.08331, simple_loss=0.1159, pruned_loss=0.01758, audio_tagging_loss=0.007755, over 15871.00 frames. ], tot_loss[loss=0.06776, simple_loss=0.09203, pruned_loss=0.01283, audio_tagging_loss=0.008914, over 3056607.13 frames. ], batch size: 58, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 18:52:15,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3190253.3333333335, ans=0.0 2023-11-27 18:52:15,832 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.62 vs. limit=6.0 2023-11-27 18:52:23,577 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 478550 2023-11-27 18:52:38,181 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.32 vs. 
limit=15.0 2023-11-27 18:52:39,963 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3190386.6666666665, ans=0.125 2023-11-27 18:52:44,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3190386.6666666665, ans=0.1 2023-11-27 18:52:58,204 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 9650, loss[loss=0.07064, simple_loss=0.1042, pruned_loss=0.01161, audio_tagging_loss=0.006936, over 16028.00 frames. ], tot_loss[loss=0.06728, simple_loss=0.09116, pruned_loss=0.01273, audio_tagging_loss=0.008973, over 3062228.47 frames. ], batch size: 63, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 18:53:16,631 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3190586.6666666665, ans=0.0 2023-11-27 18:53:20,779 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 478600 2023-11-27 18:53:28,661 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.75 vs. limit=10.0 2023-11-27 18:53:33,307 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3190720.0, ans=0.125 2023-11-27 18:53:36,626 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3190720.0, ans=0.125 2023-11-27 18:53:55,994 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.441e+01 8.654e+01 9.418e+01 1.007e+02 1.330e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-27 18:53:56,025 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 9700, loss[loss=0.06852, simple_loss=0.1023, pruned_loss=0.008234, audio_tagging_loss=0.009142, over 14694.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.09015, pruned_loss=0.01255, audio_tagging_loss=0.008888, over 3060440.98 frames. ], batch size: 55, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 18:53:57,340 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3190853.3333333335, ans=0.125 2023-11-27 18:54:06,235 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3190920.0, ans=0.0 2023-11-27 18:54:09,399 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.75 vs. limit=12.0 2023-11-27 18:54:19,010 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 478650 2023-11-27 18:54:26,216 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3190986.6666666665, ans=0.1 2023-11-27 18:54:27,363 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3190986.6666666665, ans=0.1 2023-11-27 18:54:45,560 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3191120.0, ans=0.125 2023-11-27 18:54:52,952 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 9750, loss[loss=0.07866, simple_loss=0.1031, pruned_loss=0.01826, audio_tagging_loss=0.008864, over 15238.00 frames. 
], tot_loss[loss=0.06611, simple_loss=0.08982, pruned_loss=0.01245, audio_tagging_loss=0.008748, over 3055489.86 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 18:54:55,032 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3191186.6666666665, ans=0.1 2023-11-27 18:55:03,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3191186.6666666665, ans=0.1 2023-11-27 18:55:04,053 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3191253.3333333335, ans=0.125 2023-11-27 18:55:16,975 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 478700 2023-11-27 18:55:20,990 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.37 vs. limit=10.0 2023-11-27 18:55:29,253 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3191386.6666666665, ans=0.0 2023-11-27 18:55:31,428 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3191386.6666666665, ans=0.0 2023-11-27 18:55:44,396 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3191453.3333333335, ans=0.125 2023-11-27 18:55:51,102 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.383e+01 8.742e+01 9.201e+01 9.783e+01 1.182e+02, threshold=1.840e+02, percent-clipped=0.0 2023-11-27 18:55:51,128 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 9800, loss[loss=0.06592, simple_loss=0.09039, pruned_loss=0.01405, audio_tagging_loss=0.006678, over 14853.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.09011, pruned_loss=0.01244, audio_tagging_loss=0.008587, over 3045261.12 frames. ], batch size: 54, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 18:56:13,845 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 478750 2023-11-27 18:56:31,258 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3191720.0, ans=0.1 2023-11-27 18:56:44,789 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.92 vs. limit=15.0 2023-11-27 18:56:45,444 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 18:56:48,640 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 9850, loss[loss=0.07442, simple_loss=0.102, pruned_loss=0.01442, audio_tagging_loss=0.009004, over 14910.00 frames. ], tot_loss[loss=0.0667, simple_loss=0.09096, pruned_loss=0.0127, audio_tagging_loss=0.008523, over 3047968.14 frames. 
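The WARNING entries above show why the AudioSet placeholder cuts are dropped: a 1-second cut has 100 feature frames, subsampling leaves 23 output frames (consistent with ((100 - 7) // 2 + 1) // 2 for the configured subsampling factor of 4), and the dummy transcript encodes to 24 BPE tokens. A transducer cannot emit more tokens than it has encoder frames, so the cut is unalignable and excluded. A hedged sketch of such a filter; the function name and the exact subsampling arithmetic are inferred from the logged numbers:

```python
def keep_cut(num_frames: int, num_tokens: int) -> bool:
    """Return False for cuts a transducer cannot align.

    num_frames: feature frames before subsampling (100 in the warnings).
    num_tokens: BPE tokens in the transcript (24 for the dummy text).
    """
    # Subsampling arithmetic consistent with the logged 100 -> 23
    # (overall factor 4 via two stride-2 stages).
    frames_after = ((num_frames - 7) // 2 + 1) // 2
    return frames_after >= num_tokens


assert keep_cut(100, 24) is False  # the excluded placeholder cuts
assert keep_cut(1500, 24) is True  # a typical 15 s utterance passes
```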
], batch size: 56, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 18:57:11,563 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 478800 2023-11-27 18:57:13,564 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.52 vs. limit=6.0 2023-11-27 18:57:16,863 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3191986.6666666665, ans=0.125 2023-11-27 18:57:21,235 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 18:57:45,660 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.221e+01 8.871e+01 9.511e+01 1.009e+02 1.336e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-27 18:57:45,686 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 9900, loss[loss=0.06296, simple_loss=0.0889, pruned_loss=0.008989, audio_tagging_loss=0.009516, over 14741.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.09081, pruned_loss=0.01271, audio_tagging_loss=0.008572, over 3046977.27 frames. ], batch size: 55, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 18:58:09,246 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 478850 2023-11-27 18:58:27,258 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3192386.6666666665, ans=0.125 2023-11-27 18:58:29,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3192386.6666666665, ans=0.125 2023-11-27 18:58:42,802 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3192520.0, ans=0.1 2023-11-27 18:58:43,970 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 9950, loss[loss=0.06289, simple_loss=0.09086, pruned_loss=0.009648, audio_tagging_loss=0.007811, over 15379.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.09054, pruned_loss=0.01268, audio_tagging_loss=0.008637, over 3037736.32 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 18:59:06,670 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 478900 2023-11-27 18:59:17,615 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3192720.0, ans=0.2 2023-11-27 18:59:21,664 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.47 vs. limit=15.0 2023-11-27 18:59:23,647 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3192720.0, ans=0.125 2023-11-27 18:59:41,471 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.137e+01 8.506e+01 9.259e+01 9.823e+01 1.115e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-27 18:59:41,498 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 10000, loss[loss=0.06299, simple_loss=0.09373, pruned_loss=0.00995, audio_tagging_loss=0.006179, over 13599.00 frames. ], tot_loss[loss=0.06663, simple_loss=0.09074, pruned_loss=0.01264, audio_tagging_loss=0.00861, over 3041129.15 frames. 
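grad_scale in these progress lines is the fp16 loss scale (the run has use_fp16 enabled and loads a grad scaler state dict at startup): it is halved when scaled gradients overflow and doubled after a long enough run of clean steps, which is why it moves 16.0 -> 8.0 -> 4.0 earlier in this section and steps up to 32.0 at batch 10000 just below. The generic torch.cuda.amp loop behaves the same way; this is an illustrative equivalent, not the script's own scaler, and the growth_interval value is simply torch's default:

```python
import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=16.0,       # matches the grad_scale seen in the log
    growth_factor=2.0,     # 16 -> 32 after enough overflow-free steps
    backoff_factor=0.5,    # 16 -> 8 -> 4 on overflow
    growth_interval=2000,  # assumption: torch's default interval
)


def train_step(model, optimizer, batch, criterion):
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():
        loss = criterion(model(batch["inputs"]), batch["targets"])
    scaler.scale(loss).backward()  # backward on the scaled loss
    scaler.step(optimizer)         # unscales; skips the step on inf/nan
    scaler.update()                # shrink or grow the scale
    return scaler.get_scale()      # the value logged as grad_scale
```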
], batch size: 54, lr: 1.68e-03, grad_scale: 32.0 2023-11-27 18:59:49,324 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3192853.3333333335, ans=0.2 2023-11-27 18:59:52,690 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3192920.0, ans=0.07 2023-11-27 18:59:58,849 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.21 vs. limit=22.5 2023-11-27 19:00:04,080 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 478950 2023-11-27 19:00:04,316 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3192986.6666666665, ans=0.1 2023-11-27 19:00:05,373 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3192986.6666666665, ans=0.125 2023-11-27 19:00:08,671 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3192986.6666666665, ans=0.125 2023-11-27 19:00:27,235 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3193120.0, ans=0.125 2023-11-27 19:00:38,025 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 10050, loss[loss=0.08412, simple_loss=0.1161, pruned_loss=0.01749, audio_tagging_loss=0.008593, over 15003.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08928, pruned_loss=0.01223, audio_tagging_loss=0.008741, over 3037461.57 frames. ], batch size: 54, lr: 1.68e-03, grad_scale: 32.0 2023-11-27 19:01:01,603 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 479000 2023-11-27 19:01:25,763 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3193453.3333333335, ans=0.125 2023-11-27 19:01:35,904 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 10100, loss[loss=0.07086, simple_loss=0.09223, pruned_loss=0.01479, audio_tagging_loss=0.009953, over 14976.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.0885, pruned_loss=0.0121, audio_tagging_loss=0.008885, over 3035796.43 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:01:36,987 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.570e+01 8.749e+01 9.301e+01 1.017e+02 1.197e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-27 19:01:59,538 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 479050 2023-11-27 19:02:01,939 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3193653.3333333335, ans=0.0 2023-11-27 19:02:07,216 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3193653.3333333335, ans=0.125 2023-11-27 19:02:07,642 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.36 vs. limit=15.0 2023-11-27 19:02:19,569 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3193720.0, ans=0.125 2023-11-27 19:02:25,422 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. 
Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 19:02:29,508 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3193786.6666666665, ans=0.1 2023-11-27 19:02:33,755 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 10150, loss[loss=0.05753, simple_loss=0.07652, pruned_loss=0.009507, audio_tagging_loss=0.009765, over 15619.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08918, pruned_loss=0.01233, audio_tagging_loss=0.008817, over 3038370.83 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:02:33,914 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3193853.3333333335, ans=0.125 2023-11-27 19:02:56,513 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 479100 2023-11-27 19:03:02,522 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3193986.6666666665, ans=0.125 2023-11-27 19:03:03,410 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 19:03:25,425 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.22 vs. limit=6.0 2023-11-27 19:03:31,557 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 10200, loss[loss=0.09139, simple_loss=0.1198, pruned_loss=0.02269, audio_tagging_loss=0.008782, over 14847.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.09028, pruned_loss=0.01257, audio_tagging_loss=0.008823, over 3046927.17 frames. ], batch size: 54, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:03:32,634 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.102e+01 8.681e+01 9.288e+01 9.961e+01 1.325e+02, threshold=1.858e+02, percent-clipped=0.0 2023-11-27 19:03:32,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3194186.6666666665, ans=0.125 2023-11-27 19:03:38,885 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.45 vs. limit=6.0 2023-11-27 19:03:54,399 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 479150 2023-11-27 19:03:57,179 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 19:04:12,025 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3194386.6666666665, ans=0.04949747468305833 2023-11-27 19:04:24,062 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3194453.3333333335, ans=0.0 2023-11-27 19:04:28,884 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 10250, loss[loss=0.05642, simple_loss=0.07488, pruned_loss=0.009804, audio_tagging_loss=0.009171, over 15318.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.08912, pruned_loss=0.01234, audio_tagging_loss=0.008995, over 3051366.51 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:04:36,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3194520.0, ans=0.125 2023-11-27 19:04:44,547 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3194586.6666666665, ans=0.1 2023-11-27 19:04:44,930 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.06 vs. limit=15.0 2023-11-27 19:04:49,632 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3194586.6666666665, ans=0.0 2023-11-27 19:04:52,756 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 479200 2023-11-27 19:04:59,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3194653.3333333335, ans=0.125 2023-11-27 19:04:59,936 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3194653.3333333335, ans=0.0 2023-11-27 19:05:18,253 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3194786.6666666665, ans=0.2 2023-11-27 19:05:20,516 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3194786.6666666665, ans=0.125 2023-11-27 19:05:27,508 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 10300, loss[loss=0.07134, simple_loss=0.1011, pruned_loss=0.01279, audio_tagging_loss=0.007995, over 15570.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08895, pruned_loss=0.0123, audio_tagging_loss=0.009073, over 3053244.04 frames. 
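Each loss[...] entry decomposes the multi-task objective. The printed totals are consistent with loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss, matching the configured simple_loss_scale of 0.5 and audio_tagging_loss_scale of 1.0 with the CTC term disabled; for the batch 10250 entry above, 0.5 * 0.07488 + 0.009804 + 0.009171 ≈ 0.05642, the printed loss. A small reconstruction of that combination, inferred from the logged numbers rather than quoted from train_asr.py:

```python
def combine_losses(
    simple_loss: float,
    pruned_loss: float,
    audio_tagging_loss: float,
    simple_loss_scale: float = 0.5,
    audio_tagging_loss_scale: float = 1.0,
) -> float:
    """Weighted sum consistent with the logged loss[...] entries."""
    return (
        simple_loss_scale * simple_loss
        + pruned_loss
        + audio_tagging_loss_scale * audio_tagging_loss
    )


# Batch 10250 entry from the log: reproduces loss=0.05642.
print(combine_losses(0.07488, 0.009804, 0.009171))  # ~0.056415
```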
], batch size: 59, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:05:28,536 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.569e+01 8.814e+01 9.491e+01 9.959e+01 1.329e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-27 19:05:28,827 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3194853.3333333335, ans=0.1 2023-11-27 19:05:31,671 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3194853.3333333335, ans=0.0 2023-11-27 19:05:34,910 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3194853.3333333335, ans=0.0 2023-11-27 19:05:41,520 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3194920.0, ans=0.125 2023-11-27 19:05:45,834 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3194920.0, ans=0.125 2023-11-27 19:05:50,043 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 479250 2023-11-27 19:06:00,463 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3195053.3333333335, ans=0.125 2023-11-27 19:06:00,533 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3195053.3333333335, ans=0.5 2023-11-27 19:06:00,625 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3195053.3333333335, ans=0.125 2023-11-27 19:06:10,761 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3195053.3333333335, ans=0.2 2023-11-27 19:06:12,475 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3195120.0, ans=0.0 2023-11-27 19:06:23,505 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3195186.6666666665, ans=0.125 2023-11-27 19:06:24,330 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 10350, loss[loss=0.06204, simple_loss=0.07053, pruned_loss=0.01214, audio_tagging_loss=0.01463, over 15355.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.08943, pruned_loss=0.01239, audio_tagging_loss=0.009111, over 3047381.09 frames. ], batch size: 61, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:06:47,579 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 479300 2023-11-27 19:07:21,730 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 10400, loss[loss=0.06491, simple_loss=0.09107, pruned_loss=0.01037, audio_tagging_loss=0.009007, over 15238.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.09027, pruned_loss=0.01238, audio_tagging_loss=0.009092, over 3048374.67 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:07:24,452 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.346e+01 8.831e+01 9.287e+01 1.004e+02 1.358e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-27 19:07:41,550 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.26 vs. 
limit=12.0 2023-11-27 19:07:44,869 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3195653.3333333335, ans=0.125 2023-11-27 19:07:45,877 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 479350 2023-11-27 19:08:09,771 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3195786.6666666665, ans=0.125 2023-11-27 19:08:19,758 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 10450, loss[loss=0.05898, simple_loss=0.0736, pruned_loss=0.01229, audio_tagging_loss=0.009896, over 15726.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.08974, pruned_loss=0.0125, audio_tagging_loss=0.009083, over 3041721.08 frames. ], batch size: 62, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:08:22,295 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3195853.3333333335, ans=0.125 2023-11-27 19:08:30,168 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3195853.3333333335, ans=0.2 2023-11-27 19:08:34,471 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3195920.0, ans=0.0 2023-11-27 19:08:43,083 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 479400 2023-11-27 19:09:03,422 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3196053.3333333335, ans=0.125 2023-11-27 19:09:18,581 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 10500, loss[loss=0.06226, simple_loss=0.08777, pruned_loss=0.009568, audio_tagging_loss=0.008805, over 15356.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.0892, pruned_loss=0.01245, audio_tagging_loss=0.008997, over 3041906.65 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:09:20,764 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.608e+01 8.582e+01 9.246e+01 1.004e+02 1.274e+02, threshold=1.849e+02, percent-clipped=0.0 2023-11-27 19:09:21,036 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3196186.6666666665, ans=0.1 2023-11-27 19:09:25,879 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.23 vs. limit=15.0 2023-11-27 19:09:26,565 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3196186.6666666665, ans=0.125 2023-11-27 19:09:35,909 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3196253.3333333335, ans=0.2 2023-11-27 19:09:38,184 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3196253.3333333335, ans=0.0 2023-11-27 19:09:41,856 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 479450 2023-11-27 19:09:43,402 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.15 vs. 
limit=15.0 2023-11-27 19:09:48,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3196320.0, ans=0.0 2023-11-27 19:10:01,941 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3196386.6666666665, ans=0.125 2023-11-27 19:10:02,050 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3196386.6666666665, ans=0.07 2023-11-27 19:10:05,221 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3196453.3333333335, ans=0.125 2023-11-27 19:10:06,408 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3196453.3333333335, ans=0.125 2023-11-27 19:10:16,047 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 10550, loss[loss=0.07881, simple_loss=0.1188, pruned_loss=0.01488, audio_tagging_loss=0.00453, over 15001.00 frames. ], tot_loss[loss=0.06704, simple_loss=0.09066, pruned_loss=0.01286, audio_tagging_loss=0.008854, over 3043381.55 frames. ], batch size: 54, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:10:24,489 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3196520.0, ans=0.125 2023-11-27 19:10:39,731 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 479500 2023-11-27 19:10:44,616 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.59 vs. limit=15.0 2023-11-27 19:10:50,346 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3196720.0, ans=0.5 2023-11-27 19:10:54,423 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3196720.0, ans=0.125 2023-11-27 19:10:58,863 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3196720.0, ans=0.125 2023-11-27 19:10:59,912 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3196720.0, ans=0.0 2023-11-27 19:11:00,989 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3196786.6666666665, ans=0.125 2023-11-27 19:11:03,755 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3196786.6666666665, ans=0.07 2023-11-27 19:11:13,702 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 10600, loss[loss=0.05801, simple_loss=0.08192, pruned_loss=0.008733, audio_tagging_loss=0.008319, over 15674.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.0898, pruned_loss=0.01271, audio_tagging_loss=0.008929, over 3041010.31 frames. 
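The scaling.py:213 entries print ScheduledFloat values: regularization hyperparameters such as balancer probabilities, skip rates, dropout p, and scale_min that are functions of batch_count rather than constants. By batch_count around 3.2e6 they have long since settled at their final values (ans=0.125, 0.1, 0.0, and so on). A minimal sketch of a piecewise-linear schedule of this kind; the breakpoints below are illustrative only, not the ones this run uses, and this is the concept rather than zipformer's actual class:

```python
class ScheduledFloat:
    """A float that is a piecewise-linear function of the batch count."""

    def __init__(self, *points):  # points: (batch_count, value), sorted
        self.points = list(points)

    def value(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        # Linear interpolation between the surrounding breakpoints.
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)


# Illustrative: a skip rate decaying from 0.5 to 0.0 over 20k batches;
# at batch_count=3.19e6 it would be logged as ans=0.0.
skip_rate = ScheduledFloat((0.0, 0.5), (20000.0, 0.0))
print(skip_rate.value(3186053.0))  # 0.0
```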
], batch size: 58, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:11:15,884 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.642e+01 8.628e+01 9.441e+01 1.014e+02 1.251e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-27 19:11:36,974 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 479550 2023-11-27 19:11:43,762 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3196986.6666666665, ans=0.0 2023-11-27 19:11:46,087 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3196986.6666666665, ans=0.0 2023-11-27 19:12:07,739 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.75 vs. limit=15.0 2023-11-27 19:12:11,329 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 10650, loss[loss=0.06201, simple_loss=0.08788, pruned_loss=0.01071, audio_tagging_loss=0.007354, over 16157.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.09012, pruned_loss=0.01274, audio_tagging_loss=0.008775, over 3044856.88 frames. ], batch size: 59, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:12:11,764 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.87 vs. limit=15.0 2023-11-27 19:12:32,454 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3197253.3333333335, ans=0.5 2023-11-27 19:12:34,481 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 479600 2023-11-27 19:12:57,689 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3197453.3333333335, ans=0.025 2023-11-27 19:12:59,828 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3197453.3333333335, ans=0.125 2023-11-27 19:13:08,752 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.20 vs. limit=10.0 2023-11-27 19:13:09,081 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 10700, loss[loss=0.094, simple_loss=0.1213, pruned_loss=0.0264, audio_tagging_loss=0.006963, over 14841.00 frames. ], tot_loss[loss=0.06693, simple_loss=0.09099, pruned_loss=0.01276, audio_tagging_loss=0.008671, over 3044633.90 frames. ], batch size: 53, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:13:11,197 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.023e+01 8.717e+01 9.252e+01 9.839e+01 1.176e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-27 19:13:16,528 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3197520.0, ans=0.0 2023-11-27 19:13:32,700 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 479650 2023-11-27 19:13:55,691 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.66 vs. limit=15.0 2023-11-27 19:14:06,955 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 10750, loss[loss=0.06168, simple_loss=0.08467, pruned_loss=0.009814, audio_tagging_loss=0.009528, over 15574.00 frames. ], tot_loss[loss=0.06688, simple_loss=0.09102, pruned_loss=0.01268, audio_tagging_loss=0.008688, over 3045472.00 frames. 
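The Whitening entries compare a per-module statistic against a limit (which can itself be scheduled): the metric measures how far the covariance of a module's activations is from a multiple of the identity, so it is 1.0 for perfectly "white" features and grows with anisotropy, and a penalty gradient is applied only once it crosses the limit; lines like metric=10.75 vs. limit=15.0 are therefore purely informational. One way to compute such a ratio, offered as a hedged reconstruction rather than zipformer's exact scaling.py code:

```python
import torch


def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    """How far the feature covariance is from a multiple of identity.

    x: (num_frames, num_channels). Returns >= 1.0; equals 1.0 iff the
    covariance within each channel group is proportional to identity.
    """
    num_frames, num_channels = x.shape
    assert num_channels % num_groups == 0
    x = x.reshape(num_frames, num_groups, num_channels // num_groups)
    x = x.transpose(0, 1)                     # (groups, frames, chans)
    cov = torch.matmul(x.transpose(1, 2), x)  # per-group covariance
    n = cov.shape[-1]
    trace_c = cov.diagonal(dim1=-2, dim2=-1).sum(-1)
    trace_c2 = (cov * cov).sum(dim=(-2, -1))  # trace(C @ C), C symmetric
    # n * trace(C @ C) / trace(C)^2 >= 1 by Cauchy-Schwarz on eigenvalues.
    metric = (n * trace_c2 / (trace_c ** 2 + 1e-20)).mean()
    return metric.item()


white = torch.randn(10000, 256)
print(whitening_metric(white))  # close to 1.0 for white Gaussian features
```

The ratio equals 1 exactly when all eigenvalues of the covariance are equal, which is what "whitened" means here.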
], batch size: 59, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:14:12,697 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3197853.3333333335, ans=0.0 2023-11-27 19:14:29,522 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 479700 2023-11-27 19:14:29,794 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3197986.6666666665, ans=0.125 2023-11-27 19:14:46,290 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3198053.3333333335, ans=0.125 2023-11-27 19:14:46,673 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.32 vs. limit=15.0 2023-11-27 19:14:49,295 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.58 vs. limit=15.0 2023-11-27 19:15:04,570 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 10800, loss[loss=0.06333, simple_loss=0.08499, pruned_loss=0.01273, audio_tagging_loss=0.008109, over 15694.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.09071, pruned_loss=0.01265, audio_tagging_loss=0.008647, over 3045629.84 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 32.0 2023-11-27 19:15:06,822 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.572e+01 8.494e+01 9.274e+01 9.978e+01 1.190e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-27 19:15:07,715 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.61 vs. limit=15.0 2023-11-27 19:15:27,739 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 479750 2023-11-27 19:15:36,126 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3198320.0, ans=0.1 2023-11-27 19:16:02,285 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 10850, loss[loss=0.06642, simple_loss=0.09821, pruned_loss=0.01052, audio_tagging_loss=0.006794, over 15775.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.09028, pruned_loss=0.01254, audio_tagging_loss=0.008685, over 3049787.76 frames. 
], batch size: 57, lr: 1.68e-03, grad_scale: 32.0 2023-11-27 19:16:07,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3198520.0, ans=0.0 2023-11-27 19:16:25,215 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 479800 2023-11-27 19:16:28,278 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3198653.3333333335, ans=0.125 2023-11-27 19:16:30,410 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3198653.3333333335, ans=0.0 2023-11-27 19:16:32,459 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3198653.3333333335, ans=0.04949747468305833 2023-11-27 19:16:39,048 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3198720.0, ans=0.125 2023-11-27 19:16:40,172 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3198720.0, ans=0.125 2023-11-27 19:17:00,004 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 10900, loss[loss=0.04569, simple_loss=0.05472, pruned_loss=0.009961, audio_tagging_loss=0.008373, over 14370.00 frames. ], tot_loss[loss=0.06677, simple_loss=0.09089, pruned_loss=0.01255, audio_tagging_loss=0.008775, over 3051709.82 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 32.0 2023-11-27 19:17:00,038 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 19:17:02,225 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.580e+01 8.922e+01 9.500e+01 1.014e+02 1.176e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-27 19:17:18,587 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.36 vs. limit=22.5 2023-11-27 19:17:22,635 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 479850 2023-11-27 19:17:27,063 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3198986.6666666665, ans=0.1 2023-11-27 19:17:57,559 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 10950, loss[loss=0.0591, simple_loss=0.08408, pruned_loss=0.009699, audio_tagging_loss=0.007363, over 15107.00 frames. ], tot_loss[loss=0.06685, simple_loss=0.0909, pruned_loss=0.01272, audio_tagging_loss=0.008684, over 3048135.36 frames. 
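loss[...] is the current batch, weighted by its roughly 15k frames, while tot_loss[...] is a running aggregate: its frame count hovers near 3.05e6, about 200 times the per-batch count, matching the configured reset_interval of 200. That is consistent with a decayed sum of frame-weighted statistics, tot <- tot * (1 - 1/200) + current, whose steady-state frame total is reset_interval times the per-batch frames. A sketch of that bookkeeping; the class name is invented and the update rule is inferred from the logged totals, with icefall's MetricsTracker handling this in the real script:

```python
class RunningLoss:
    """Frame-weighted loss average with an effective window ~reset_interval."""

    def __init__(self, reset_interval: int = 200):
        self.decay = 1.0 - 1.0 / reset_interval
        self.loss_sum = 0.0  # decayed sum of loss * frames
        self.frames = 0.0    # decayed sum of frames

    def update(self, loss: float, num_frames: float) -> None:
        self.loss_sum = self.loss_sum * self.decay + loss * num_frames
        self.frames = self.frames * self.decay + num_frames

    @property
    def value(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)


tracker = RunningLoss()
for _ in range(1000):
    tracker.update(loss=0.066, num_frames=15250)
print(f"{tracker.value:.3f}, over {tracker.frames:.0f} frames")
# steady state: frames -> 200 * 15250 = 3.05e6, as in the tot_loss lines
```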
], batch size: 59, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:18:02,190 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3199186.6666666665, ans=0.125 2023-11-27 19:18:06,633 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3199186.6666666665, ans=0.0 2023-11-27 19:18:09,985 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3199253.3333333335, ans=0.125 2023-11-27 19:18:15,469 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3199253.3333333335, ans=0.125 2023-11-27 19:18:15,537 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3199253.3333333335, ans=0.125 2023-11-27 19:18:20,266 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 479900 2023-11-27 19:18:27,675 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3199320.0, ans=0.1 2023-11-27 19:18:54,493 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 11000, loss[loss=0.06682, simple_loss=0.08147, pruned_loss=0.01663, audio_tagging_loss=0.009456, over 14596.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.09022, pruned_loss=0.01251, audio_tagging_loss=0.008811, over 3046330.07 frames. ], batch size: 55, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:18:57,745 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.058e+01 8.669e+01 9.375e+01 1.024e+02 1.386e+02, threshold=1.875e+02, percent-clipped=0.0 2023-11-27 19:19:07,844 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 19:19:08,311 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.87 vs. limit=6.0 2023-11-27 19:19:13,868 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3199586.6666666665, ans=0.125 2023-11-27 19:19:18,274 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 479950 2023-11-27 19:19:36,498 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.66 vs. limit=15.0 2023-11-27 19:19:51,841 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 11050, loss[loss=0.04677, simple_loss=0.06028, pruned_loss=0.007424, audio_tagging_loss=0.009207, over 14365.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.09022, pruned_loss=0.01243, audio_tagging_loss=0.008866, over 3050166.18 frames. 
], batch size: 56, lr: 1.68e-03, grad_scale: 8.0 2023-11-27 19:19:55,277 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3199853.3333333335, ans=0.2 2023-11-27 19:20:15,029 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 480000 2023-11-27 19:20:16,412 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-480000.pt 2023-11-27 19:20:24,759 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.48 vs. limit=15.0 2023-11-27 19:20:25,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3199986.6666666665, ans=0.125 2023-11-27 19:20:28,951 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.36 vs. limit=15.0 2023-11-27 19:20:38,265 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3200053.3333333335, ans=0.125 2023-11-27 19:20:49,643 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.50 vs. limit=22.5 2023-11-27 19:20:51,374 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 11100, loss[loss=0.05266, simple_loss=0.0721, pruned_loss=0.008157, audio_tagging_loss=0.008458, over 15917.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.08988, pruned_loss=0.01256, audio_tagging_loss=0.008996, over 3050489.03 frames. ], batch size: 61, lr: 1.68e-03, grad_scale: 8.0 2023-11-27 19:20:51,662 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3200186.6666666665, ans=0.125 2023-11-27 19:20:56,276 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.567e+01 8.813e+01 9.363e+01 1.015e+02 1.283e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-27 19:21:07,540 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3200253.3333333335, ans=0.05 2023-11-27 19:21:13,934 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 480050 2023-11-27 19:21:25,372 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.78 vs. limit=10.0 2023-11-27 19:21:28,218 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=3200386.6666666665, ans=22.5 2023-11-27 19:21:49,151 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 11150, loss[loss=0.07124, simple_loss=0.09413, pruned_loss=0.01313, audio_tagging_loss=0.01105, over 14780.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.08959, pruned_loss=0.01244, audio_tagging_loss=0.009074, over 3046210.69 frames. 
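The checkpoint.py:75 entry above fires exactly when the global batch index reaches 480000, a multiple of the configured save_every_n of 4000, and writes checkpoint-<batch_idx>.pt into the experiment directory; with keep_last_k set to 30, older batch checkpoints are presumably pruned as new ones arrive. A sketch of that policy; the helper name is invented and the pruning detail is an assumption from the config values:

```python
from pathlib import Path

import torch


def maybe_save_checkpoint(
    model,
    optimizer,
    batch_idx_train: int,
    exp_dir: Path,
    save_every_n: int = 4000,
    keep_last_k: int = 30,
) -> None:
    """Save checkpoint-<batch>.pt every save_every_n batches, keep newest k."""
    if batch_idx_train % save_every_n != 0:
        return
    path = exp_dir / f"checkpoint-{batch_idx_train}.pt"
    torch.save(
        {
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "batch_idx_train": batch_idx_train,
        },
        path,
    )
    # Prune older batch checkpoints, keeping only the keep_last_k newest.
    ckpts = sorted(
        exp_dir.glob("checkpoint-*.pt"),
        key=lambda p: int(p.stem.split("-")[1]),
    )
    for old in ckpts[:-keep_last_k]:
        old.unlink()
```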
], batch size: 56, lr: 1.68e-03, grad_scale: 8.0 2023-11-27 19:22:07,712 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3200586.6666666665, ans=0.1 2023-11-27 19:22:10,445 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3200586.6666666665, ans=0.125 2023-11-27 19:22:10,911 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.82 vs. limit=12.0 2023-11-27 19:22:11,580 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3200653.3333333335, ans=0.95 2023-11-27 19:22:12,481 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 480100 2023-11-27 19:22:19,685 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.90 vs. limit=22.5 2023-11-27 19:22:35,234 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3200786.6666666665, ans=0.0 2023-11-27 19:22:40,984 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.04 vs. limit=6.0 2023-11-27 19:22:41,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3200786.6666666665, ans=0.0 2023-11-27 19:22:46,468 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 11200, loss[loss=0.07216, simple_loss=0.1019, pruned_loss=0.01163, audio_tagging_loss=0.00959, over 14453.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.08968, pruned_loss=0.01255, audio_tagging_loss=0.00911, over 3050558.31 frames. 
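Each ScheduledFloat record above reports a module hyperparameter (a skip rate, a balancer prob, a dropout_p) whose current value is looked up from batch_count. A minimal sketch of such a schedule, assuming piecewise-linear interpolation between (batch_count, value) breakpoints; the breakpoints below are illustrative only, but the clamp-at-the-last-point behaviour is consistent with skip rates sitting at 0.0 and dropout_p at 0.1 this late in training:

    # Sketch: a float hyperparameter scheduled on batch_count.
    # Piecewise-linear interpolation between breakpoints is an assumption;
    # the breakpoints here are illustrative.
    from bisect import bisect_right

    class ScheduledFloatSketch:
        def __init__(self, *points):
            self.points = sorted(points)  # (batch_count, value) pairs

        def value(self, batch_count: float) -> float:
            xs = [p[0] for p in self.points]
            i = bisect_right(xs, batch_count)
            if i == 0:
                return self.points[0][1]
            if i == len(self.points):
                return self.points[-1][1]
            (x0, y0), (x1, y1) = self.points[i - 1], self.points[i]
            return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)

    skip_rate = ScheduledFloatSketch((0.0, 0.5), (50_000.0, 0.0))
    print(skip_rate.value(3_199_186.67))  # 0.0, as in the ff3_skip_rate records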
], batch size: 54, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:22:46,630 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3200853.3333333335, ans=0.0 2023-11-27 19:22:51,510 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.792e+01 8.755e+01 9.520e+01 1.002e+02 1.290e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-27 19:22:51,873 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3200853.3333333335, ans=0.0 2023-11-27 19:22:54,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3200853.3333333335, ans=0.125 2023-11-27 19:23:10,216 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 480150 2023-11-27 19:23:15,896 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3200986.6666666665, ans=0.125 2023-11-27 19:23:24,608 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3201053.3333333335, ans=0.09899494936611666 2023-11-27 19:23:24,651 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3201053.3333333335, ans=0.125 2023-11-27 19:23:24,691 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3201053.3333333335, ans=0.125 2023-11-27 19:23:41,469 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3201120.0, ans=0.07 2023-11-27 19:23:44,367 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 11250, loss[loss=0.05791, simple_loss=0.0811, pruned_loss=0.007972, audio_tagging_loss=0.009391, over 15278.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.08866, pruned_loss=0.01246, audio_tagging_loss=0.009158, over 3048561.22 frames. ], batch size: 58, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:23:49,739 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3201186.6666666665, ans=0.1 2023-11-27 19:23:57,962 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.20 vs. limit=22.5 2023-11-27 19:24:07,143 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 480200 2023-11-27 19:24:24,574 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3201386.6666666665, ans=0.0 2023-11-27 19:24:32,781 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3201453.3333333335, ans=0.0 2023-11-27 19:24:42,377 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 11300, loss[loss=0.05065, simple_loss=0.06227, pruned_loss=0.009093, audio_tagging_loss=0.01042, over 15192.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.08906, pruned_loss=0.01242, audio_tagging_loss=0.009062, over 3047539.64 frames. 
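In every Clipping_scale=2.0 record in this section the reported threshold is exactly twice the middle quartile of the grad-norm distribution (above: 2 * 9.520e+01 = 1.904e+02), which suggests the clip threshold is clipping_scale times a running median of recent gradient norms. A sketch under that reading; the window length is an assumption:

    # Sketch: derive the clip threshold from a running window of gradient
    # norms, as the "quartiles ... threshold" records suggest
    # (threshold == clipping_scale * median in every record above).
    from collections import deque
    from statistics import median

    class GradNormClipper:
        def __init__(self, clipping_scale: float = 2.0, window: int = 1000):
            self.clipping_scale = clipping_scale
            self.norms = deque(maxlen=window)  # recent per-step grad norms

        def threshold(self, grad_norm: float) -> float:
            self.norms.append(grad_norm)
            return self.clipping_scale * median(self.norms)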
], batch size: 59, lr: 1.68e-03, grad_scale: 8.0 2023-11-27 19:24:43,708 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3201520.0, ans=0.125 2023-11-27 19:24:45,075 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.00 vs. limit=22.5 2023-11-27 19:24:47,805 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.989e+01 8.774e+01 9.523e+01 1.010e+02 1.222e+02, threshold=1.905e+02, percent-clipped=0.0 2023-11-27 19:24:47,974 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3201520.0, ans=0.125 2023-11-27 19:25:05,803 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 480250 2023-11-27 19:25:06,324 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.76 vs. limit=15.0 2023-11-27 19:25:14,668 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3201653.3333333335, ans=0.0 2023-11-27 19:25:17,725 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys.whitening_limit, batch_count=3201720.0, ans=6.0 2023-11-27 19:25:21,693 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3201720.0, ans=0.125 2023-11-27 19:25:22,906 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3201720.0, ans=0.07 2023-11-27 19:25:32,823 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3201786.6666666665, ans=0.0 2023-11-27 19:25:39,715 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 11350, loss[loss=0.05636, simple_loss=0.08113, pruned_loss=0.007747, audio_tagging_loss=0.00805, over 16412.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.08966, pruned_loss=0.01255, audio_tagging_loss=0.008889, over 3044437.75 frames. ], batch size: 60, lr: 1.68e-03, grad_scale: 8.0 2023-11-27 19:25:40,415 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.69 vs. limit=22.5 2023-11-27 19:25:47,225 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3201853.3333333335, ans=0.125 2023-11-27 19:25:51,878 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3201920.0, ans=0.125 2023-11-27 19:26:03,225 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 480300 2023-11-27 19:26:15,774 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.23 vs. limit=22.5 2023-11-27 19:26:37,133 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.50 vs. limit=15.0 2023-11-27 19:26:37,700 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 11400, loss[loss=0.05976, simple_loss=0.07927, pruned_loss=0.01195, audio_tagging_loss=0.008174, over 15506.00 frames. 
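The Whitening records above compare a per-module whiteness metric against a limit (e.g. metric=17.00 vs. limit=22.5 for self_attn2.whiten). One plausible reading, sketched below as an assumption rather than the exact formulation in scaling.py: the metric measures how unevenly the feature covariance spreads across eigenvalues, scoring 1.0 for perfectly white features and higher as the covariance grows spikier, with a penalty applied only while the metric exceeds its limit:

    # Sketch: an eigenvalue-spread whiteness metric. This formulation is
    # an assumption: metric = mean(lambda^2) / mean(lambda)^2 over the
    # eigenvalues of the channel covariance, so white features score
    # near 1.0 and strongly correlated ones score much higher.
    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        # x: (num_frames, num_channels)
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.T @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        return (eigs ** 2).mean() / eigs.mean() ** 2

    x = torch.randn(1000, 128)
    print(float(whitening_metric(x)))  # slightly above 1.0 for white noise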
], tot_loss[loss=0.06645, simple_loss=0.09001, pruned_loss=0.01267, audio_tagging_loss=0.008772, over 3039785.10 frames. ], batch size: 60, lr: 1.68e-03, grad_scale: 8.0 2023-11-27 19:26:43,657 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.557e+01 8.795e+01 9.431e+01 1.004e+02 1.426e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-27 19:26:53,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3202253.3333333335, ans=0.0 2023-11-27 19:27:00,044 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 480350 2023-11-27 19:27:01,218 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3202320.0, ans=0.125 2023-11-27 19:27:04,399 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3202320.0, ans=0.0 2023-11-27 19:27:14,833 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3202386.6666666665, ans=0.07 2023-11-27 19:27:27,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3202453.3333333335, ans=0.0 2023-11-27 19:27:29,228 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.77 vs. limit=10.0 2023-11-27 19:27:33,310 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3202453.3333333335, ans=0.125 2023-11-27 19:27:34,434 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3202520.0, ans=0.125 2023-11-27 19:27:35,261 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 11450, loss[loss=0.05774, simple_loss=0.08472, pruned_loss=0.005508, audio_tagging_loss=0.00987, over 15538.00 frames. ], tot_loss[loss=0.06683, simple_loss=0.09106, pruned_loss=0.01272, audio_tagging_loss=0.008577, over 3045565.83 frames. ], batch size: 59, lr: 1.68e-03, grad_scale: 8.0 2023-11-27 19:27:55,789 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3202586.6666666665, ans=0.0 2023-11-27 19:27:56,960 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3202653.3333333335, ans=0.125 2023-11-27 19:27:57,796 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 480400 2023-11-27 19:28:01,787 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3202653.3333333335, ans=0.125 2023-11-27 19:28:16,274 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3202720.0, ans=0.0 2023-11-27 19:28:19,671 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3202720.0, ans=0.125 2023-11-27 19:28:32,567 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 11500, loss[loss=0.05769, simple_loss=0.08098, pruned_loss=0.008226, audio_tagging_loss=0.008972, over 14558.00 frames. ], tot_loss[loss=0.0666, simple_loss=0.09091, pruned_loss=0.01253, audio_tagging_loss=0.008613, over 3043042.18 frames. 
], batch size: 57, lr: 1.68e-03, grad_scale: 8.0 2023-11-27 19:28:32,820 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3202853.3333333335, ans=0.1 2023-11-27 19:28:38,531 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.326e+01 9.056e+01 9.508e+01 1.039e+02 1.307e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-27 19:28:38,879 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3202853.3333333335, ans=0.125 2023-11-27 19:28:51,995 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3202920.0, ans=0.125 2023-11-27 19:28:56,705 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 480450 2023-11-27 19:29:23,737 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3203120.0, ans=0.125 2023-11-27 19:29:24,673 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3203120.0, ans=0.125 2023-11-27 19:29:25,715 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3203120.0, ans=0.125 2023-11-27 19:29:30,435 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 11550, loss[loss=0.0745, simple_loss=0.09946, pruned_loss=0.01799, audio_tagging_loss=0.006787, over 15800.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.08998, pruned_loss=0.01246, audio_tagging_loss=0.008695, over 3036423.83 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 8.0 2023-11-27 19:29:32,889 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3203186.6666666665, ans=0.0 2023-11-27 19:29:51,710 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3203253.3333333335, ans=10.0 2023-11-27 19:29:53,583 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 480500 2023-11-27 19:29:54,833 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3203320.0, ans=0.07 2023-11-27 19:30:08,577 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3203386.6666666665, ans=0.125 2023-11-27 19:30:09,436 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 19:30:12,178 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3203386.6666666665, ans=0.0 2023-11-27 19:30:13,352 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3203386.6666666665, ans=0.0 2023-11-27 19:30:18,845 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 19:30:20,932 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.47 vs. limit=15.0 2023-11-27 19:30:28,602 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 11600, loss[loss=0.06215, simple_loss=0.08807, pruned_loss=0.009788, audio_tagging_loss=0.008322, over 14472.00 frames. ], tot_loss[loss=0.06671, simple_loss=0.09065, pruned_loss=0.01267, audio_tagging_loss=0.008718, over 3036180.95 frames. ], batch size: 55, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:30:33,946 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.482e+01 8.753e+01 9.625e+01 1.023e+02 1.677e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-27 19:30:36,502 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=3203520.0, ans=0.2 2023-11-27 19:30:40,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3203586.6666666665, ans=0.0 2023-11-27 19:30:50,952 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 480550 2023-11-27 19:31:00,765 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.90 vs. limit=15.0 2023-11-27 19:31:10,822 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3203720.0, ans=0.125 2023-11-27 19:31:24,677 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 11650, loss[loss=0.06244, simple_loss=0.08291, pruned_loss=0.01297, audio_tagging_loss=0.008017, over 15474.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.08996, pruned_loss=0.01243, audio_tagging_loss=0.008759, over 3040958.88 frames. 
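The "Exclude cut" warnings in this stretch all drop 1-second AudioSet clips whose 100 input frames subsample to 23, one fewer than their 24 placeholder tokens; a cut is unusable for the transducer loss once it has fewer encoder frames than output tokens. A sketch of the implied filter; the front-end reduction T -> (T - 8) // 4 is an assumption chosen to match the logged 100 -> 23, and the at-least-one-frame-per-token rule matches 23 frames vs. 24 tokens being excluded:

    # Sketch: the short-cut filter implied by the warnings above.
    def frames_after_subsampling(num_frames: int) -> int:
        # Assumed conv front-end reduction; reproduces the logged 100 -> 23.
        return (num_frames - 8) // 4

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        return frames_after_subsampling(num_frames) >= num_tokens

    print(frames_after_subsampling(100))  # 23, as logged
    print(keep_cut(100, 24))              # False -> cut is excluded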
], batch size: 59, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:31:33,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3203853.3333333335, ans=0.09899494936611666 2023-11-27 19:31:39,085 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3203920.0, ans=0.125 2023-11-27 19:31:47,491 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 480600 2023-11-27 19:31:58,032 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3203986.6666666665, ans=0.1 2023-11-27 19:32:04,694 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3204053.3333333335, ans=0.0 2023-11-27 19:32:20,367 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=3204120.0, ans=0.05 2023-11-27 19:32:21,373 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3204186.6666666665, ans=0.1 2023-11-27 19:32:22,261 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 11700, loss[loss=0.04694, simple_loss=0.06211, pruned_loss=0.004857, audio_tagging_loss=0.01103, over 15455.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.09, pruned_loss=0.01245, audio_tagging_loss=0.008732, over 3041990.64 frames. ], batch size: 58, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:32:24,380 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.54 vs. limit=15.0 2023-11-27 19:32:27,669 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.22 vs. limit=15.0 2023-11-27 19:32:28,153 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.912e+01 8.724e+01 9.258e+01 1.003e+02 1.518e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-27 19:32:45,842 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 480650 2023-11-27 19:32:55,856 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3204386.6666666665, ans=0.1 2023-11-27 19:33:20,260 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 11750, loss[loss=0.07614, simple_loss=0.1027, pruned_loss=0.01592, audio_tagging_loss=0.008874, over 15631.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.09075, pruned_loss=0.01253, audio_tagging_loss=0.00871, over 3041300.34 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:33:41,872 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.05 vs. limit=15.0 2023-11-27 19:33:43,533 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 480700 2023-11-27 19:34:18,053 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 11800, loss[loss=0.05442, simple_loss=0.06786, pruned_loss=0.01061, audio_tagging_loss=0.009873, over 15659.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.09041, pruned_loss=0.0125, audio_tagging_loss=0.008768, over 3047841.21 frames. 
], batch size: 60, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:34:21,528 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3204853.3333333335, ans=0.0 2023-11-27 19:34:23,475 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.513e+01 8.512e+01 9.140e+01 9.806e+01 1.375e+02, threshold=1.828e+02, percent-clipped=0.0 2023-11-27 19:34:23,857 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3204853.3333333335, ans=0.2 2023-11-27 19:34:27,470 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3204853.3333333335, ans=0.0 2023-11-27 19:34:37,720 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=3204920.0, ans=10.0 2023-11-27 19:34:40,939 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 480750 2023-11-27 19:34:41,181 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 19:34:46,068 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3204986.6666666665, ans=0.0 2023-11-27 19:34:49,606 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.36 vs. limit=22.5 2023-11-27 19:35:00,899 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 19:35:10,058 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3205120.0, ans=0.125 2023-11-27 19:35:15,451 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 11850, loss[loss=0.07067, simple_loss=0.09441, pruned_loss=0.01385, audio_tagging_loss=0.009618, over 15699.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.09029, pruned_loss=0.01244, audio_tagging_loss=0.008838, over 3049895.81 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:35:21,721 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3205186.6666666665, ans=0.125 2023-11-27 19:35:34,256 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3205253.3333333335, ans=0.125 2023-11-27 19:35:39,086 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 480800 2023-11-27 19:35:44,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3205320.0, ans=0.2 2023-11-27 19:36:13,851 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 11900, loss[loss=0.06375, simple_loss=0.08173, pruned_loss=0.01174, audio_tagging_loss=0.01114, over 14733.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.0903, pruned_loss=0.01235, audio_tagging_loss=0.008898, over 3045850.83 frames. 
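The grad_scale field in these records walks through powers of two (16.0 at batch 11000, 8.0 at 11050, back to 16.0 at 11200, 32.0 at 12000). The halving is standard AMP behaviour after an inf/nan step; notably, both doublings land on batch indices divisible by 400 (11200 and 12000), which suggests a periodic doubling check layered on top of torch.cuda.amp.GradScaler rather than its default consecutive-clean-step counter. A sketch under that reading; the 400-batch interval and the 32.0 cap are assumptions read off this section alone:

    # Sketch: the loss-scale dynamics implied by the log. GradScaler
    # halves the scale itself on overflow; this adds the periodic
    # doubling check suggested by the 400-aligned batch indices above.
    def maybe_grow_grad_scale(scaler, batch_idx: int, cap: float = 32.0,
                              interval: int = 400) -> None:
        cur = scaler.get_scale()
        if batch_idx % interval == 0 and cur < cap:
            scaler.update(new_scale=cur * 2.0)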
], batch size: 56, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:36:19,244 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.275e+01 8.961e+01 9.645e+01 1.029e+02 1.669e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-27 19:36:30,130 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3205586.6666666665, ans=0.95 2023-11-27 19:36:32,277 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3205586.6666666665, ans=0.125 2023-11-27 19:36:37,024 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 480850 2023-11-27 19:36:38,087 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3205653.3333333335, ans=0.125 2023-11-27 19:36:41,987 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.37 vs. limit=15.0 2023-11-27 19:36:51,674 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3205720.0, ans=0.2 2023-11-27 19:37:03,410 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3205786.6666666665, ans=0.1 2023-11-27 19:37:03,438 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3205786.6666666665, ans=0.125 2023-11-27 19:37:05,612 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3205786.6666666665, ans=0.1 2023-11-27 19:37:11,381 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 11950, loss[loss=0.07939, simple_loss=0.115, pruned_loss=0.01319, audio_tagging_loss=0.008716, over 16135.00 frames. ], tot_loss[loss=0.06674, simple_loss=0.09069, pruned_loss=0.01244, audio_tagging_loss=0.008952, over 3042729.74 frames. ], batch size: 59, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:37:16,576 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.39 vs. limit=22.5 2023-11-27 19:37:16,647 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.93 vs. limit=10.0 2023-11-27 19:37:34,403 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 480900 2023-11-27 19:37:36,846 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3205986.6666666665, ans=0.0 2023-11-27 19:38:05,346 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3206120.0, ans=0.125 2023-11-27 19:38:07,392 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 12000, loss[loss=0.06234, simple_loss=0.07832, pruned_loss=0.01113, audio_tagging_loss=0.01204, over 15102.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.08992, pruned_loss=0.01228, audio_tagging_loss=0.009106, over 3049018.45 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 32.0 2023-11-27 19:38:07,395 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-27 19:38:41,936 INFO [train_asr.py:1267] (0/4) Epoch 40, validation: loss=0.05781, simple_loss=0.05069, pruned_loss=0.005234, audio_tagging_loss=0.02723, over 4681554.00 frames. 
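The validation block above runs a full pass over the ~4.68M-frame dev set between training batches and then reports peak device memory. A minimal sketch of that pattern; the loader and compute_loss are placeholders, and the per-frame normalisation mirrors how the validation loss above is much smaller than the raw frame counts:

    # Sketch: periodic validation plus the peak-memory report.
    # compute_loss is a placeholder returning (summed loss, num frames).
    import torch

    @torch.no_grad()
    def validate(model, valid_loader, compute_loss, device) -> float:
        model.eval()
        tot_loss, tot_frames = 0.0, 0.0
        for batch in valid_loader:
            loss, num_frames = compute_loss(model, batch, device)
            tot_loss += float(loss)
            tot_frames += num_frames
        model.train()
        print(f"Maximum memory allocated so far is "
              f"{torch.cuda.max_memory_allocated(device) // 2**20}MB")
        return tot_loss / tot_frames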
2023-11-27 19:38:41,936 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-27 19:38:44,332 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3206186.6666666665, ans=0.0 2023-11-27 19:38:45,309 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3206186.6666666665, ans=0.1 2023-11-27 19:38:47,256 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.817e+01 8.885e+01 9.490e+01 1.034e+02 1.237e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-27 19:38:52,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3206253.3333333335, ans=0.0 2023-11-27 19:39:02,873 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 480950 2023-11-27 19:39:10,572 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/epoch-40.pt 2023-11-27 19:39:26,325 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 0, loss[loss=0.07173, simple_loss=0.08525, pruned_loss=0.00713, audio_tagging_loss=0.02197, over 14985.00 frames. ], tot_loss[loss=0.07173, simple_loss=0.08525, pruned_loss=0.00713, audio_tagging_loss=0.02197, over 14985.00 frames. ], batch size: 55, lr: 1.66e-03, grad_scale: 32.0 2023-11-27 19:39:26,330 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-27 19:40:00,218 INFO [train_asr.py:1267] (0/4) Epoch 41, validation: loss=0.05782, simple_loss=0.05064, pruned_loss=0.005197, audio_tagging_loss=0.0273, over 4681554.00 frames. 2023-11-27 19:40:00,219 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-27 19:40:18,740 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3206426.6666666665, ans=0.125 2023-11-27 19:40:22,074 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3206493.3333333335, ans=0.125 2023-11-27 19:40:35,858 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 19:40:50,943 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 481000 2023-11-27 19:40:57,202 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.37 vs. limit=15.0 2023-11-27 19:40:57,793 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 50, loss[loss=0.06519, simple_loss=0.07058, pruned_loss=0.01246, audio_tagging_loss=0.01744, over 14763.00 frames. ], tot_loss[loss=0.07421, simple_loss=0.08941, pruned_loss=0.01228, audio_tagging_loss=0.01723, over 683992.78 frames. 
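Across the epoch-40 -> epoch-41 boundary above, the learning rate steps from 1.68e-03 to 1.66e-03 while the batch index barely moves, i.e. only the epoch term of the schedule ticked. Both values are reproduced by an Eden-style schedule under assumed hyperparameters (base_lr=0.045, lr_batches=7500, lr_epochs=3.5, with the schedule fed the zero-based epoch number); these constants are assumptions checked against the log, not values read from it:

    # Sketch: Eden-style LR = base_lr * batch_factor * epoch_factor.
    # All constants are assumptions that happen to reproduce the logged
    # 1.68e-03 (epoch 40) and 1.66e-03 (epoch 41) at batch ~480000.
    def eden_lr(batch: int, epoch: int, base_lr: float = 0.045,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        bf = ((batch**2 + lr_batches**2) / lr_batches**2) ** -0.25
        ef = ((epoch**2 + lr_epochs**2) / lr_epochs**2) ** -0.25
        return base_lr * bf * ef

    print(f"{eden_lr(480000, 39):.2e}")  # 1.68e-03, logged during epoch 40
    print(f"{eden_lr(480000, 40):.2e}")  # 1.66e-03, logged during epoch 41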
], batch size: 57, lr: 1.66e-03, grad_scale: 32.0 2023-11-27 19:41:05,232 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 19:41:13,372 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3206760.0, ans=0.125 2023-11-27 19:41:29,107 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3206826.6666666665, ans=0.0 2023-11-27 19:41:31,553 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.740e+01 9.368e+01 1.003e+02 1.103e+02 1.548e+02, threshold=2.006e+02, percent-clipped=0.0 2023-11-27 19:41:33,914 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3206893.3333333335, ans=0.125 2023-11-27 19:41:36,125 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3206893.3333333335, ans=0.0 2023-11-27 19:41:38,717 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.48 vs. limit=12.0 2023-11-27 19:41:48,707 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 481050 2023-11-27 19:41:55,742 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 100, loss[loss=0.06504, simple_loss=0.08169, pruned_loss=0.00937, audio_tagging_loss=0.01482, over 15581.00 frames. ], tot_loss[loss=0.07205, simple_loss=0.08763, pruned_loss=0.01181, audio_tagging_loss=0.01642, over 1205098.42 frames. ], batch size: 57, lr: 1.66e-03, grad_scale: 32.0 2023-11-27 19:42:00,522 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3207026.6666666665, ans=0.2 2023-11-27 19:42:01,631 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3207026.6666666665, ans=0.0 2023-11-27 19:42:06,819 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3207093.3333333335, ans=0.1 2023-11-27 19:42:07,802 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3207093.3333333335, ans=0.1 2023-11-27 19:42:36,889 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3207226.6666666665, ans=0.1 2023-11-27 19:42:38,957 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3207226.6666666665, ans=0.125 2023-11-27 19:42:46,589 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 481100 2023-11-27 19:42:49,497 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 19:42:53,754 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 150, loss[loss=0.06255, simple_loss=0.07726, pruned_loss=0.01134, audio_tagging_loss=0.01258, over 15056.00 frames. ], tot_loss[loss=0.0704, simple_loss=0.08804, pruned_loss=0.01175, audio_tagging_loss=0.01463, over 1612504.94 frames. 
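The fractional cumulative frame counts in the tot_loss fields (683992.78 at batch 50, 1205098.42 at batch 100, 1612504.94 at batch 150, saturating near 3.0e6 in the epoch-40 records) behave like an exponentially decayed running sum rather than a plain total: with decay 1 - 1/200 and roughly 15k frames per batch, the steady state is about 200 * 15k = 3.0M frames, matching the plateau above. A sketch under those assumed constants:

    # Sketch: tot_loss as a decayed running sum. The 1 - 1/200 decay and
    # ~15k frames/batch are assumptions; together they give the ~3.0e6
    # steady-state frame count seen in the tot_loss records above.
    class DecayedTracker:
        def __init__(self, decay: float = 1.0 - 1.0 / 200):
            self.decay = decay
            self.loss_sum = 0.0
            self.frames = 0.0

        def update(self, batch_loss: float, batch_frames: float) -> None:
            self.loss_sum = self.loss_sum * self.decay + batch_loss * batch_frames
            self.frames = self.frames * self.decay + batch_frames

        @property
        def tot_loss(self) -> float:
            return self.loss_sum / self.frames

    t = DecayedTracker()
    for _ in range(10_000):
        t.update(0.066, 15_000)
    print(round(t.frames))  # ~3.0e6, the plateau seen in the epoch-40 records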
], batch size: 59, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 19:42:57,389 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3207360.0, ans=0.125 2023-11-27 19:43:12,092 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3207426.6666666665, ans=0.0 2023-11-27 19:43:28,183 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.150e+01 9.036e+01 9.587e+01 1.014e+02 1.345e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-27 19:43:35,777 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3207560.0, ans=0.05 2023-11-27 19:43:44,395 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 481150 2023-11-27 19:43:48,688 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.10 vs. limit=6.0 2023-11-27 19:43:51,441 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 200, loss[loss=0.08621, simple_loss=0.1088, pruned_loss=0.02451, audio_tagging_loss=0.007311, over 16130.00 frames. ], tot_loss[loss=0.06968, simple_loss=0.08953, pruned_loss=0.01206, audio_tagging_loss=0.01285, over 1929350.75 frames. ], batch size: 59, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 19:43:56,107 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3207693.3333333335, ans=0.0 2023-11-27 19:43:59,754 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3207693.3333333335, ans=0.05 2023-11-27 19:44:01,071 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=3207693.3333333335, ans=15.0 2023-11-27 19:44:09,365 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.60 vs. limit=22.5 2023-11-27 19:44:17,223 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3207826.6666666665, ans=0.125 2023-11-27 19:44:29,352 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3207893.3333333335, ans=0.1 2023-11-27 19:44:42,435 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 481200 2023-11-27 19:44:42,851 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.16 vs. limit=22.5 2023-11-27 19:44:49,820 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 250, loss[loss=0.07157, simple_loss=0.103, pruned_loss=0.01135, audio_tagging_loss=0.008725, over 14857.00 frames. ], tot_loss[loss=0.06945, simple_loss=0.09106, pruned_loss=0.01244, audio_tagging_loss=0.01148, over 2176538.09 frames. ], batch size: 55, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 19:45:09,013 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3208093.3333333335, ans=0.125 2023-11-27 19:45:19,850 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.23 vs. 
limit=10.0 2023-11-27 19:45:23,711 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.580e+01 9.216e+01 9.866e+01 1.064e+02 1.717e+02, threshold=1.973e+02, percent-clipped=0.0 2023-11-27 19:45:28,868 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3208226.6666666665, ans=0.0 2023-11-27 19:45:40,408 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 481250 2023-11-27 19:45:44,366 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.55 vs. limit=15.0 2023-11-27 19:45:47,466 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 300, loss[loss=0.05694, simple_loss=0.06745, pruned_loss=0.01161, audio_tagging_loss=0.01161, over 14697.00 frames. ], tot_loss[loss=0.06898, simple_loss=0.09097, pruned_loss=0.01274, audio_tagging_loss=0.01076, over 2376256.48 frames. ], batch size: 56, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 19:46:22,378 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3208560.0, ans=0.125 2023-11-27 19:46:26,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3208560.0, ans=0.2 2023-11-27 19:46:38,139 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 481300 2023-11-27 19:46:44,690 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 350, loss[loss=0.06542, simple_loss=0.09012, pruned_loss=0.01278, audio_tagging_loss=0.007584, over 15076.00 frames. ], tot_loss[loss=0.06905, simple_loss=0.09191, pruned_loss=0.01291, audio_tagging_loss=0.01019, over 2525159.41 frames. ], batch size: 57, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 19:46:54,703 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3208693.3333333335, ans=0.125 2023-11-27 19:46:57,975 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3208760.0, ans=0.125 2023-11-27 19:47:10,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3208826.6666666665, ans=0.125 2023-11-27 19:47:15,454 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3208826.6666666665, ans=0.125 2023-11-27 19:47:16,892 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.50 vs. limit=22.5 2023-11-27 19:47:17,411 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3208826.6666666665, ans=0.0 2023-11-27 19:47:19,469 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.679e+01 8.707e+01 9.326e+01 9.986e+01 1.163e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-27 19:47:24,213 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3208893.3333333335, ans=0.125 2023-11-27 19:47:35,522 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 481350 2023-11-27 19:47:43,120 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 400, loss[loss=0.05667, simple_loss=0.0745, pruned_loss=0.01282, audio_tagging_loss=0.006598, over 14578.00 frames. 
], tot_loss[loss=0.06809, simple_loss=0.09072, pruned_loss=0.01279, audio_tagging_loss=0.009944, over 2638036.85 frames. ], batch size: 55, lr: 1.66e-03, grad_scale: 32.0 2023-11-27 19:47:43,404 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3209026.6666666665, ans=0.09899494936611666 2023-11-27 19:47:44,414 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3209026.6666666665, ans=0.0 2023-11-27 19:48:03,660 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3209093.3333333335, ans=0.0 2023-11-27 19:48:04,790 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3209160.0, ans=0.125 2023-11-27 19:48:28,275 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.26 vs. limit=10.0 2023-11-27 19:48:33,318 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 481400 2023-11-27 19:48:34,951 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3209293.3333333335, ans=0.125 2023-11-27 19:48:40,637 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 450, loss[loss=0.06284, simple_loss=0.08087, pruned_loss=0.01221, audio_tagging_loss=0.0102, over 14475.00 frames. ], tot_loss[loss=0.06829, simple_loss=0.09181, pruned_loss=0.0128, audio_tagging_loss=0.009591, over 2723504.25 frames. ], batch size: 56, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 19:49:02,940 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3209493.3333333335, ans=10.0 2023-11-27 19:49:16,486 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.647e+01 8.566e+01 9.069e+01 9.742e+01 1.634e+02, threshold=1.814e+02, percent-clipped=0.0 2023-11-27 19:49:16,794 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3209560.0, ans=0.0 2023-11-27 19:49:18,949 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3209560.0, ans=0.1 2023-11-27 19:49:31,317 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 481450 2023-11-27 19:49:37,846 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 500, loss[loss=0.07381, simple_loss=0.1092, pruned_loss=0.0131, audio_tagging_loss=0.006108, over 15882.00 frames. ], tot_loss[loss=0.06745, simple_loss=0.09087, pruned_loss=0.01268, audio_tagging_loss=0.009334, over 2798745.46 frames. ], batch size: 57, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 19:49:39,618 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.02 vs. 
limit=15.0 2023-11-27 19:49:42,592 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3209693.3333333335, ans=0.1 2023-11-27 19:49:47,090 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3209693.3333333335, ans=0.125 2023-11-27 19:49:53,526 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3209760.0, ans=0.125 2023-11-27 19:49:57,435 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3209760.0, ans=0.0 2023-11-27 19:50:05,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3209826.6666666665, ans=0.125 2023-11-27 19:50:09,929 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3209826.6666666665, ans=0.125 2023-11-27 19:50:10,397 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.17 vs. limit=10.0 2023-11-27 19:50:13,134 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3209893.3333333335, ans=0.125 2023-11-27 19:50:24,236 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3209960.0, ans=0.05 2023-11-27 19:50:28,331 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 481500 2023-11-27 19:50:36,102 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 550, loss[loss=0.05206, simple_loss=0.06803, pruned_loss=0.008885, audio_tagging_loss=0.009161, over 15225.00 frames. ], tot_loss[loss=0.06735, simple_loss=0.09097, pruned_loss=0.01273, audio_tagging_loss=0.009126, over 2858838.50 frames. ], batch size: 59, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 19:50:36,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3210026.6666666665, ans=0.0 2023-11-27 19:51:00,822 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3210160.0, ans=0.125 2023-11-27 19:51:11,649 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.210e+01 8.776e+01 9.400e+01 1.030e+02 1.375e+02, threshold=1.880e+02, percent-clipped=0.0 2023-11-27 19:51:26,406 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.62 vs. limit=6.0 2023-11-27 19:51:27,024 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 481550 2023-11-27 19:51:33,516 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 600, loss[loss=0.0687, simple_loss=0.09541, pruned_loss=0.01271, audio_tagging_loss=0.008279, over 17208.00 frames. ], tot_loss[loss=0.0673, simple_loss=0.09123, pruned_loss=0.01276, audio_tagging_loss=0.008918, over 2896626.57 frames. 
], batch size: 67, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 19:51:38,778 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3210360.0, ans=0.125 2023-11-27 19:51:45,614 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3210426.6666666665, ans=0.125 2023-11-27 19:51:45,633 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3210426.6666666665, ans=0.125 2023-11-27 19:51:56,684 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3210493.3333333335, ans=0.1 2023-11-27 19:52:01,506 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3210493.3333333335, ans=0.125 2023-11-27 19:52:25,184 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 481600 2023-11-27 19:52:32,056 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 650, loss[loss=0.06872, simple_loss=0.08274, pruned_loss=0.01832, audio_tagging_loss=0.009038, over 13968.00 frames. ], tot_loss[loss=0.06742, simple_loss=0.0913, pruned_loss=0.0129, audio_tagging_loss=0.008875, over 2929405.89 frames. ], batch size: 53, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 19:52:45,289 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.10 vs. limit=15.0 2023-11-27 19:53:09,341 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.635e+01 8.855e+01 9.490e+01 1.019e+02 1.294e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-27 19:53:21,914 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3210960.0, ans=0.125 2023-11-27 19:53:22,848 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 481650 2023-11-27 19:53:25,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3210960.0, ans=0.2 2023-11-27 19:53:29,910 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 700, loss[loss=0.05668, simple_loss=0.07556, pruned_loss=0.01064, audio_tagging_loss=0.008257, over 14783.00 frames. ], tot_loss[loss=0.06681, simple_loss=0.09027, pruned_loss=0.01272, audio_tagging_loss=0.008946, over 2962207.73 frames. ], batch size: 56, lr: 1.66e-03, grad_scale: 8.0 2023-11-27 19:53:32,401 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3211026.6666666665, ans=0.125 2023-11-27 19:53:36,138 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3211026.6666666665, ans=0.125 2023-11-27 19:53:48,311 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3211093.3333333335, ans=0.1 2023-11-27 19:54:01,240 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3211160.0, ans=0.125 2023-11-27 19:54:20,576 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 481700 2023-11-27 19:54:27,678 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 750, loss[loss=0.06427, simple_loss=0.08766, pruned_loss=0.01009, audio_tagging_loss=0.01036, over 14946.00 frames. 
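The model.py:807 status line ("Freeze_encoder: False; Current batch idx: N") recurs at exactly 50-batch intervals throughout this section (481500, 481550, 481600, ...), i.e. a status print gated on the global batch index. A trivial sketch; the 50-batch interval is read off the log, and the function name is hypothetical:

    # Sketch: the periodic status line seen every 50 batches above.
    def log_status(batch_idx_train: int, freeze_encoder: bool,
                   interval: int = 50) -> None:
        if batch_idx_train % interval == 0:
            print(f"Freeze_encoder: {freeze_encoder}; "
                  f"Current batch idx: {batch_idx_train}")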
], tot_loss[loss=0.06632, simple_loss=0.08982, pruned_loss=0.01246, audio_tagging_loss=0.008951, over 2980850.88 frames. ], batch size: 56, lr: 1.66e-03, grad_scale: 8.0 2023-11-27 19:54:43,015 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.41 vs. limit=22.5 2023-11-27 19:54:48,664 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.62 vs. limit=22.5 2023-11-27 19:54:51,460 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3211493.3333333335, ans=0.0 2023-11-27 19:55:03,662 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3211560.0, ans=0.125 2023-11-27 19:55:03,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3211560.0, ans=0.0 2023-11-27 19:55:04,491 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.339e+01 8.938e+01 9.552e+01 1.040e+02 1.357e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-27 19:55:09,708 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3211560.0, ans=0.09899494936611666 2023-11-27 19:55:10,615 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3211560.0, ans=0.125 2023-11-27 19:55:19,156 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 481750 2023-11-27 19:55:25,749 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 800, loss[loss=0.07655, simple_loss=0.09934, pruned_loss=0.01767, audio_tagging_loss=0.009219, over 15440.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.09047, pruned_loss=0.01249, audio_tagging_loss=0.00897, over 2995048.40 frames. ], batch size: 56, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 19:55:41,671 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 19:56:12,133 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3211960.0, ans=0.05 2023-11-27 19:56:16,542 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 481800 2023-11-27 19:56:23,306 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 850, loss[loss=0.0483, simple_loss=0.06483, pruned_loss=0.005382, audio_tagging_loss=0.0105, over 14427.00 frames. ], tot_loss[loss=0.06671, simple_loss=0.09066, pruned_loss=0.01235, audio_tagging_loss=0.009029, over 3010051.26 frames. 
], batch size: 54, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 19:56:23,542 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3212026.6666666665, ans=0.1 2023-11-27 19:56:31,048 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3212026.6666666665, ans=0.1 2023-11-27 19:56:35,665 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3212093.3333333335, ans=0.04949747468305833 2023-11-27 19:56:43,735 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3212093.3333333335, ans=0.125 2023-11-27 19:56:59,805 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3212226.6666666665, ans=0.0 2023-11-27 19:57:00,640 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.378e+01 8.782e+01 9.230e+01 1.009e+02 1.508e+02, threshold=1.846e+02, percent-clipped=0.0 2023-11-27 19:57:07,537 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3212226.6666666665, ans=0.125 2023-11-27 19:57:07,696 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3212226.6666666665, ans=0.1 2023-11-27 19:57:14,863 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 481850 2023-11-27 19:57:17,272 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3212293.3333333335, ans=0.0 2023-11-27 19:57:21,341 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 900, loss[loss=0.05649, simple_loss=0.07376, pruned_loss=0.008571, audio_tagging_loss=0.01104, over 14834.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.09008, pruned_loss=0.01232, audio_tagging_loss=0.009059, over 3021030.72 frames. ], batch size: 56, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 19:57:35,915 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3212426.6666666665, ans=0.125 2023-11-27 19:57:49,514 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.15 vs. limit=12.0 2023-11-27 19:58:08,592 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3212626.6666666665, ans=0.1 2023-11-27 19:58:12,822 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 481900 2023-11-27 19:58:19,366 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 950, loss[loss=0.06582, simple_loss=0.09438, pruned_loss=0.01021, audio_tagging_loss=0.008421, over 14871.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.09047, pruned_loss=0.01245, audio_tagging_loss=0.00893, over 3032880.14 frames. ], batch size: 55, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 19:58:23,301 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.22 vs. 
limit=15.0 2023-11-27 19:58:26,330 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3212693.3333333335, ans=0.0 2023-11-27 19:58:47,453 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.31 vs. limit=15.0 2023-11-27 19:58:56,311 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.764e+01 9.001e+01 9.830e+01 1.057e+02 1.367e+02, threshold=1.966e+02, percent-clipped=0.0 2023-11-27 19:59:10,369 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 481950 2023-11-27 19:59:12,075 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.84 vs. limit=15.0 2023-11-27 19:59:12,833 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3212960.0, ans=0.125 2023-11-27 19:59:16,915 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 1000, loss[loss=0.07364, simple_loss=0.09865, pruned_loss=0.01399, audio_tagging_loss=0.01033, over 13960.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.0903, pruned_loss=0.01235, audio_tagging_loss=0.008847, over 3031387.51 frames. ], batch size: 53, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 19:59:25,655 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.68 vs. limit=15.0 2023-11-27 19:59:44,562 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 20:00:02,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3213293.3333333335, ans=0.125 2023-11-27 20:00:06,214 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3213293.3333333335, ans=0.125 2023-11-27 20:00:08,228 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 482000 2023-11-27 20:00:08,339 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3213293.3333333335, ans=0.125 2023-11-27 20:00:10,947 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3213293.3333333335, ans=0.125 2023-11-27 20:00:15,083 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 1050, loss[loss=0.09147, simple_loss=0.1269, pruned_loss=0.0228, audio_tagging_loss=0.005204, over 15561.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.09084, pruned_loss=0.01252, audio_tagging_loss=0.008669, over 3037836.29 frames. ], batch size: 55, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 20:00:46,552 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.44 vs. 
2023-11-27 20:00:50,975 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3213560.0, ans=0.2
2023-11-27 20:00:51,872 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.321e+01 8.496e+01 9.150e+01 1.002e+02 1.300e+02, threshold=1.830e+02, percent-clipped=0.0
2023-11-27 20:00:52,147 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3213560.0, ans=0.0
2023-11-27 20:01:04,292 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.22 vs. limit=22.5
2023-11-27 20:01:05,898 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 482050
2023-11-27 20:01:05,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3213626.6666666665, ans=0.125
2023-11-27 20:01:13,677 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 1100, loss[loss=0.06429, simple_loss=0.09643, pruned_loss=0.01113, audio_tagging_loss=0.004939, over 14939.00 frames. ], tot_loss[loss=0.06649, simple_loss=0.09057, pruned_loss=0.01256, audio_tagging_loss=0.008638, over 3038829.39 frames. ], batch size: 56, lr: 1.66e-03, grad_scale: 16.0
2023-11-27 20:01:18,150 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 20:01:23,140 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.17 vs. limit=15.0
2023-11-27 20:01:48,761 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3213893.3333333335, ans=0.125
2023-11-27 20:01:48,847 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3213893.3333333335, ans=0.125
2023-11-27 20:02:04,202 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 482100
2023-11-27 20:02:10,943 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 1150, loss[loss=0.04886, simple_loss=0.0666, pruned_loss=0.00734, audio_tagging_loss=0.00822, over 15566.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.09018, pruned_loss=0.01256, audio_tagging_loss=0.008702, over 3038151.46 frames. ], batch size: 60, lr: 1.66e-03, grad_scale: 16.0
2023-11-27 20:02:26,661 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3214093.3333333335, ans=0.0
2023-11-27 20:02:32,300 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3214093.3333333335, ans=0.125
2023-11-27 20:02:34,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3214160.0, ans=0.0
2023-11-27 20:02:37,647 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.59 vs. limit=10.0
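The ScheduledFloat records print the current value ("ans") of float hyperparameters such as dropout_p and skip_rate as a function of batch_count; in icefall's scaling.py these follow a piecewise-linear schedule over batch count. A self-contained sketch of that idea (the schedule points below are made up; by batch_count ~3.2e6 any such schedule has long since flattened to its final value, matching the constant ans values here):

    def scheduled_float(batch_count: float, points) -> float:
        # points: ((batch_0, value_0), (batch_1, value_1), ...), sorted by batch.
        (b0, v0) = points[0]
        if batch_count <= b0:
            return v0
        for (b1, v1) in points[1:]:
            if batch_count <= b1:
                # Linear interpolation between the surrounding schedule points.
                return v0 + (v1 - v0) * (batch_count - b0) / (b1 - b0)
            b0, v0 = b1, v1
        return v0  # past the last point the value stays constant

    dropout_p = scheduled_float(3214226.0, ((0.0, 0.3), (20000.0, 0.1)))  # -> 0.1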
2023-11-27 20:02:48,773 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.691e+01 8.534e+01 9.220e+01 9.959e+01 1.599e+02, threshold=1.844e+02, percent-clipped=0.0
2023-11-27 20:02:53,296 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3214226.6666666665, ans=0.1
2023-11-27 20:02:59,028 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3214293.3333333335, ans=0.0
2023-11-27 20:03:02,495 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 482150
2023-11-27 20:03:09,505 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 1200, loss[loss=0.06154, simple_loss=0.07761, pruned_loss=0.01271, audio_tagging_loss=0.01002, over 14752.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.08997, pruned_loss=0.01243, audio_tagging_loss=0.008648, over 3034909.97 frames. ], batch size: 56, lr: 1.66e-03, grad_scale: 32.0
2023-11-27 20:03:10,221 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.82 vs. limit=15.0
2023-11-27 20:03:16,763 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.75 vs. limit=15.0
2023-11-27 20:03:17,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3214360.0, ans=0.0
2023-11-27 20:03:21,683 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3214426.6666666665, ans=0.125
2023-11-27 20:03:24,898 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3214426.6666666665, ans=0.125
2023-11-27 20:03:25,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3214426.6666666665, ans=0.125
2023-11-27 20:03:34,262 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3214493.3333333335, ans=0.2
2023-11-27 20:03:41,813 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3214493.3333333335, ans=0.0
2023-11-27 20:04:00,220 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 482200
2023-11-27 20:04:00,333 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3214626.6666666665, ans=0.125
2023-11-27 20:04:07,631 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 1250, loss[loss=0.0597, simple_loss=0.08435, pruned_loss=0.008661, audio_tagging_loss=0.008858, over 15431.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.08982, pruned_loss=0.01234, audio_tagging_loss=0.00867, over 3034240.85 frames. ], batch size: 59, lr: 1.66e-03, grad_scale: 32.0
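The per-batch loss components reproduce the headline loss as 0.5 * simple_loss + pruned_loss + audio_tagging_loss; e.g. the batch 900 record gives 0.5 * 0.07376 + 0.008571 + 0.01104 = 0.05649. A sketch of that combination, with the 0.5 and 1.0 weights inferred from the logged numbers rather than read from the recipe:

    def combined_loss(simple_loss, pruned_loss, audio_tagging_loss,
                      simple_loss_scale=0.5, audio_tagging_loss_scale=1.0):
        return (simple_loss_scale * simple_loss
                + pruned_loss
                + audio_tagging_loss_scale * audio_tagging_loss)

    # Checks out against the batch 900 record above:
    assert abs(combined_loss(0.07376, 0.008571, 0.01104) - 0.05649) < 1e-4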
2023-11-27 20:04:44,178 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.423e+01 8.597e+01 9.538e+01 1.019e+02 1.522e+02, threshold=1.908e+02, percent-clipped=0.0
2023-11-27 20:04:58,114 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 482250
2023-11-27 20:05:03,195 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3214960.0, ans=0.125
2023-11-27 20:05:05,194 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 1300, loss[loss=0.07398, simple_loss=0.1091, pruned_loss=0.009934, audio_tagging_loss=0.009475, over 13877.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.09026, pruned_loss=0.01244, audio_tagging_loss=0.008596, over 3032510.41 frames. ], batch size: 54, lr: 1.66e-03, grad_scale: 32.0
2023-11-27 20:05:14,493 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3215026.6666666665, ans=0.125
2023-11-27 20:05:25,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3215093.3333333335, ans=0.125
2023-11-27 20:05:27,148 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3215160.0, ans=0.125
2023-11-27 20:05:27,166 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3215160.0, ans=0.125
2023-11-27 20:05:30,487 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3215160.0, ans=0.0
2023-11-27 20:05:37,482 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3215160.0, ans=0.0
2023-11-27 20:05:42,070 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3215226.6666666665, ans=0.2
2023-11-27 20:05:54,007 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.67 vs. limit=15.0
2023-11-27 20:05:55,767 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 482300
2023-11-27 20:05:56,972 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3215293.3333333335, ans=0.2
2023-11-27 20:06:03,058 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 1350, loss[loss=0.06069, simple_loss=0.07987, pruned_loss=0.01144, audio_tagging_loss=0.009321, over 14897.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.09006, pruned_loss=0.01242, audio_tagging_loss=0.00868, over 3039074.23 frames. ], batch size: 55, lr: 1.66e-03, grad_scale: 16.0
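grad_scale doubles from 16.0 to 32.0 at batch 1200 and falls back to 16.0 by batch 1350, the signature of dynamic loss scaling in fp16 training: the scale grows after a run of overflow-free steps and is halved when a gradient overflows. A sketch with PyTorch's GradScaler; the constructor arguments are illustrative, not values read from this recipe:

    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=16.0,      # the scale seen through batch ~1150
        growth_factor=2.0,    # 16.0 -> 32.0 after enough finite steps
        backoff_factor=0.5,   # 32.0 -> 16.0 on an overflowing gradient
        growth_interval=2000,
    )
    # Typical step: scaler.scale(loss).backward(); scaler.step(optimizer); scaler.update()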
2023-11-27 20:06:23,607 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3215426.6666666665, ans=0.125
2023-11-27 20:06:27,041 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-27 20:06:40,515 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.255e+01 8.606e+01 9.244e+01 9.716e+01 1.166e+02, threshold=1.849e+02, percent-clipped=0.0
2023-11-27 20:06:41,280 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=3215560.0, ans=15.0
2023-11-27 20:06:47,032 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 20:06:53,645 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 482350
2023-11-27 20:06:55,903 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3215626.6666666665, ans=0.125
2023-11-27 20:06:55,959 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3215626.6666666665, ans=0.125
2023-11-27 20:07:00,782 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 1400, loss[loss=0.07727, simple_loss=0.09204, pruned_loss=0.02134, audio_tagging_loss=0.00991, over 14760.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.08994, pruned_loss=0.0125, audio_tagging_loss=0.008717, over 3041457.09 frames. ], batch size: 57, lr: 1.66e-03, grad_scale: 16.0
2023-11-27 20:07:15,329 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3215760.0, ans=0.125
2023-11-27 20:07:33,626 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.05 vs. limit=15.0
2023-11-27 20:07:34,438 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-27 20:07:44,171 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3215893.3333333335, ans=0.04949747468305833
2023-11-27 20:07:45,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3215893.3333333335, ans=0.125
2023-11-27 20:07:51,646 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 482400
2023-11-27 20:07:58,327 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 1450, loss[loss=0.06167, simple_loss=0.08491, pruned_loss=0.007688, audio_tagging_loss=0.01153, over 14296.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08887, pruned_loss=0.01216, audio_tagging_loss=0.008755, over 3039558.19 frames. ], batch size: 55, lr: 1.66e-03, grad_scale: 16.0
2023-11-27 20:08:05,501 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3216026.6666666665, ans=0.125
2023-11-27 20:08:25,041 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3216160.0, ans=0.0
2023-11-27 20:08:27,856 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3216160.0, ans=0.0
2023-11-27 20:08:36,399 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.289e+01 8.746e+01 9.275e+01 1.017e+02 1.401e+02, threshold=1.855e+02, percent-clipped=0.0
2023-11-27 20:08:43,390 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3216293.3333333335, ans=0.2
2023-11-27 20:08:49,289 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 482450
2023-11-27 20:08:51,776 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3216293.3333333335, ans=0.125
2023-11-27 20:08:56,206 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 1500, loss[loss=0.06431, simple_loss=0.08331, pruned_loss=0.008658, audio_tagging_loss=0.01399, over 15328.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.08964, pruned_loss=0.01238, audio_tagging_loss=0.008832, over 3037892.97 frames. ], batch size: 57, lr: 1.66e-03, grad_scale: 16.0
2023-11-27 20:09:01,174 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3216360.0, ans=0.95
2023-11-27 20:09:13,859 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3216426.6666666665, ans=0.125
2023-11-27 20:09:25,754 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3216493.3333333335, ans=0.125
2023-11-27 20:09:25,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3216493.3333333335, ans=0.1
2023-11-27 20:09:29,239 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3216560.0, ans=0.125
2023-11-27 20:09:34,137 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3216560.0, ans=0.125
2023-11-27 20:09:42,735 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3216626.6666666665, ans=0.125
2023-11-27 20:09:45,068 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3216626.6666666665, ans=0.125
2023-11-27 20:09:47,305 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 482500
2023-11-27 20:09:53,908 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 1550, loss[loss=0.0693, simple_loss=0.09235, pruned_loss=0.01373, audio_tagging_loss=0.009392, over 14720.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.0896, pruned_loss=0.01256, audio_tagging_loss=0.009037, over 3031449.83 frames. ], batch size: 55, lr: 1.66e-03, grad_scale: 16.0
2023-11-27 20:09:57,174 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3216693.3333333335, ans=0.125
2023-11-27 20:10:03,991 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3216693.3333333335, ans=0.1
2023-11-27 20:10:17,783 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3216826.6666666665, ans=0.125
2023-11-27 20:10:32,571 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.717e+01 8.858e+01 9.389e+01 9.907e+01 1.182e+02, threshold=1.878e+02, percent-clipped=0.0
2023-11-27 20:10:45,466 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 482550
2023-11-27 20:10:45,537 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3216960.0, ans=0.125
2023-11-27 20:10:51,984 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 1600, loss[loss=0.06599, simple_loss=0.0926, pruned_loss=0.01043, audio_tagging_loss=0.009251, over 15293.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.08985, pruned_loss=0.01256, audio_tagging_loss=0.009047, over 3039655.21 frames. ], batch size: 59, lr: 1.66e-03, grad_scale: 32.0
2023-11-27 20:11:08,697 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3217093.3333333335, ans=0.0
2023-11-27 20:11:17,230 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.00 vs. limit=15.0
2023-11-27 20:11:26,121 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3217226.6666666665, ans=0.0
2023-11-27 20:11:27,674 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.46 vs. limit=22.5
2023-11-27 20:11:42,465 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 482600
2023-11-27 20:11:49,951 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 1650, loss[loss=0.06882, simple_loss=0.09269, pruned_loss=0.01082, audio_tagging_loss=0.01166, over 17202.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.09004, pruned_loss=0.0125, audio_tagging_loss=0.008999, over 3046661.56 frames. ], batch size: 62, lr: 1.66e-03, grad_scale: 32.0
2023-11-27 20:12:07,000 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.60 vs. limit=22.5
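The Whitening records compare a per-module metric against a limit; the Whiten modules in scaling.py penalize activations only while the metric exceeds the limit, nudging the feature covariance toward white. The metric below is an illustrative proxy (eigenvalue dispersion of the covariance, 1.0 when fully white), not the exact formula in scaling.py:

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        x = x.reshape(-1, x.shape[-1])            # (frames, num_channels)
        cov = (x.T @ x) / x.shape[0]
        eig = torch.linalg.eigvalsh(cov)
        n = eig.numel()
        # 1.0 if cov is proportional to the identity; grows with anisotropy.
        return (n * (eig ** 2).sum() / eig.sum() ** 2).item()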
2023-11-27 20:12:09,896 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3217426.6666666665, ans=0.125
2023-11-27 20:12:27,444 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.418e+01 8.730e+01 9.445e+01 1.002e+02 1.391e+02, threshold=1.889e+02, percent-clipped=0.0
2023-11-27 20:12:36,095 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3217626.6666666665, ans=0.125
2023-11-27 20:12:38,799 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3217626.6666666665, ans=0.2
2023-11-27 20:12:40,728 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 482650
2023-11-27 20:12:47,281 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 1700, loss[loss=0.06542, simple_loss=0.09297, pruned_loss=0.009143, audio_tagging_loss=0.009794, over 15871.00 frames. ], tot_loss[loss=0.06727, simple_loss=0.09107, pruned_loss=0.01273, audio_tagging_loss=0.009004, over 3049314.28 frames. ], batch size: 59, lr: 1.66e-03, grad_scale: 32.0
2023-11-27 20:12:52,511 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3217693.3333333335, ans=0.1
2023-11-27 20:13:05,687 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3217760.0, ans=0.1
2023-11-27 20:13:22,798 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3217893.3333333335, ans=0.0
2023-11-27 20:13:29,320 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3217893.3333333335, ans=0.1
2023-11-27 20:13:36,594 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3217960.0, ans=0.035
2023-11-27 20:13:36,662 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3217960.0, ans=0.0
2023-11-27 20:13:38,651 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 482700
2023-11-27 20:13:45,083 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 1750, loss[loss=0.03943, simple_loss=0.05342, pruned_loss=0.004084, audio_tagging_loss=0.00863, over 13771.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.09056, pruned_loss=0.01247, audio_tagging_loss=0.008944, over 3050130.00 frames. ], batch size: 54, lr: 1.66e-03, grad_scale: 32.0
2023-11-27 20:14:08,815 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3218160.0, ans=0.04949747468305833
2023-11-27 20:14:23,110 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=3218226.6666666665, ans=22.5
2023-11-27 20:14:23,551 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.432e+01 8.743e+01 9.232e+01 9.959e+01 1.189e+02, threshold=1.846e+02, percent-clipped=0.0
2023-11-27 20:14:33,613 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3218293.3333333335, ans=0.04949747468305833
2023-11-27 20:14:35,682 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 482750
2023-11-27 20:14:42,295 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 1800, loss[loss=0.05934, simple_loss=0.07845, pruned_loss=0.01001, audio_tagging_loss=0.01011, over 15857.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.09025, pruned_loss=0.01241, audio_tagging_loss=0.00888, over 3052304.47 frames. ], batch size: 58, lr: 1.66e-03, grad_scale: 32.0
2023-11-27 20:14:58,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3218426.6666666665, ans=0.0
2023-11-27 20:14:58,303 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3218426.6666666665, ans=0.125
2023-11-27 20:15:09,775 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3218493.3333333335, ans=0.2
2023-11-27 20:15:17,150 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3218560.0, ans=0.125
2023-11-27 20:15:32,548 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3218626.6666666665, ans=0.1
2023-11-27 20:15:33,392 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 482800
2023-11-27 20:15:34,321 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3218626.6666666665, ans=0.95
2023-11-27 20:15:36,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3218626.6666666665, ans=0.0
2023-11-27 20:15:38,791 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3218626.6666666665, ans=0.0
2023-11-27 20:15:40,754 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 1850, loss[loss=0.07284, simple_loss=0.1084, pruned_loss=0.01135, audio_tagging_loss=0.007284, over 15528.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.08921, pruned_loss=0.01227, audio_tagging_loss=0.008906, over 3044961.08 frames. ], batch size: 59, lr: 1.66e-03, grad_scale: 32.0
2023-11-27 20:15:51,523 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3218760.0, ans=0.125
2023-11-27 20:15:54,680 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3218760.0, ans=0.0
2023-11-27 20:16:11,707 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=3218826.6666666665, ans=0.1
2023-11-27 20:16:15,025 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3218893.3333333335, ans=0.125
2023-11-27 20:16:18,491 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.212e+01 8.712e+01 9.397e+01 9.825e+01 1.168e+02, threshold=1.879e+02, percent-clipped=0.0
2023-11-27 20:16:24,815 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3218893.3333333335, ans=0.2
2023-11-27 20:16:28,201 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3218960.0, ans=0.0
2023-11-27 20:16:29,906 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3218960.0, ans=0.0
2023-11-27 20:16:31,982 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 482850
2023-11-27 20:16:38,531 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 1900, loss[loss=0.066, simple_loss=0.09124, pruned_loss=0.01322, audio_tagging_loss=0.007165, over 15488.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.08917, pruned_loss=0.01238, audio_tagging_loss=0.00883, over 3051682.62 frames. ], batch size: 57, lr: 1.66e-03, grad_scale: 32.0
2023-11-27 20:17:02,043 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.45 vs. limit=15.0
2023-11-27 20:17:02,152 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.85 vs. limit=15.0
2023-11-27 20:17:14,022 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3219226.6666666665, ans=0.0
2023-11-27 20:17:14,111 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3219226.6666666665, ans=0.0
2023-11-27 20:17:21,933 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3219226.6666666665, ans=0.125
2023-11-27 20:17:23,084 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.03 vs. limit=12.0
2023-11-27 20:17:29,287 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 482900
2023-11-27 20:17:33,906 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3219293.3333333335, ans=0.1
2023-11-27 20:17:35,837 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 1950, loss[loss=0.07524, simple_loss=0.09292, pruned_loss=0.01673, audio_tagging_loss=0.01205, over 14289.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.08947, pruned_loss=0.01249, audio_tagging_loss=0.00883, over 3054152.58 frames. ], batch size: 55, lr: 1.66e-03, grad_scale: 32.0
2023-11-27 20:17:42,133 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3219360.0, ans=0.0
2023-11-27 20:17:57,547 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3219426.6666666665, ans=0.125
2023-11-27 20:18:15,538 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.488e+01 8.669e+01 9.288e+01 9.966e+01 1.212e+02, threshold=1.858e+02, percent-clipped=0.0
2023-11-27 20:18:22,486 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3219626.6666666665, ans=0.09899494936611666
2023-11-27 20:18:27,135 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 482950
2023-11-27 20:18:33,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3219693.3333333335, ans=0.125
2023-11-27 20:18:34,222 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 2000, loss[loss=0.07453, simple_loss=0.1098, pruned_loss=0.01356, audio_tagging_loss=0.006071, over 15425.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08892, pruned_loss=0.01234, audio_tagging_loss=0.008821, over 3047608.27 frames. ], batch size: 56, lr: 1.66e-03, grad_scale: 32.0
2023-11-27 20:18:36,283 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3219693.3333333335, ans=0.125
2023-11-27 20:18:41,037 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.65 vs. limit=15.0
2023-11-27 20:18:57,825 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3219826.6666666665, ans=0.125
2023-11-27 20:18:59,426 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.94 vs. limit=15.0
2023-11-27 20:19:02,212 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3219826.6666666665, ans=0.2
2023-11-27 20:19:04,694 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=3219826.6666666665, ans=6.0
2023-11-27 20:19:25,584 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 483000
2023-11-27 20:19:32,546 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 2050, loss[loss=0.07095, simple_loss=0.09713, pruned_loss=0.01483, audio_tagging_loss=0.007559, over 14845.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.08988, pruned_loss=0.01265, audio_tagging_loss=0.008788, over 3041684.17 frames. ], batch size: 53, lr: 1.66e-03, grad_scale: 32.0
2023-11-27 20:19:53,119 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3220093.3333333335, ans=0.0
2023-11-27 20:20:03,397 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3220160.0, ans=0.0
2023-11-27 20:20:12,034 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.951e+01 8.893e+01 9.583e+01 1.011e+02 1.256e+02, threshold=1.917e+02, percent-clipped=0.0
2023-11-27 20:20:14,380 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3220226.6666666665, ans=0.0
2023-11-27 20:20:14,525 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3220226.6666666665, ans=0.125
2023-11-27 20:20:22,995 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 483050
2023-11-27 20:20:26,527 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3220293.3333333335, ans=0.125
2023-11-27 20:20:29,583 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 2100, loss[loss=0.06051, simple_loss=0.08309, pruned_loss=0.00823, audio_tagging_loss=0.01073, over 14594.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.08998, pruned_loss=0.01264, audio_tagging_loss=0.008731, over 3040932.38 frames. ], batch size: 55, lr: 1.66e-03, grad_scale: 16.0
2023-11-27 20:20:51,569 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3220493.3333333335, ans=0.0
2023-11-27 20:20:52,750 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3220493.3333333335, ans=0.07
2023-11-27 20:20:58,352 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3220493.3333333335, ans=0.125
2023-11-27 20:21:14,865 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3220626.6666666665, ans=0.0
2023-11-27 20:21:19,702 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3220626.6666666665, ans=0.125
2023-11-27 20:21:20,574 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 483100
2023-11-27 20:21:27,429 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 2150, loss[loss=0.08077, simple_loss=0.1154, pruned_loss=0.015, audio_tagging_loss=0.00806, over 15793.00 frames. ], tot_loss[loss=0.06692, simple_loss=0.09062, pruned_loss=0.01277, audio_tagging_loss=0.008837, over 3041442.76 frames. ], batch size: 58, lr: 1.66e-03, grad_scale: 16.0
2023-11-27 20:21:30,915 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3220693.3333333335, ans=0.125
2023-11-27 20:21:33,208 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3220693.3333333335, ans=0.0
2023-11-27 20:21:54,120 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.26 vs. limit=22.5
2023-11-27 20:22:03,883 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 20:22:05,204 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3220893.3333333335, ans=0.0
2023-11-27 20:22:07,611 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.323e+01 8.704e+01 9.254e+01 9.792e+01 1.378e+02, threshold=1.851e+02, percent-clipped=0.0
2023-11-27 20:22:17,574 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 483150
2023-11-27 20:22:17,714 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3220960.0, ans=0.1
2023-11-27 20:22:25,333 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 2200, loss[loss=0.08283, simple_loss=0.1124, pruned_loss=0.01718, audio_tagging_loss=0.009426, over 15320.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.08984, pruned_loss=0.0126, audio_tagging_loss=0.008839, over 3029841.69 frames. ], batch size: 58, lr: 1.66e-03, grad_scale: 16.0
2023-11-27 20:22:46,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3221160.0, ans=0.0
2023-11-27 20:22:47,141 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=3221160.0, ans=0.2
2023-11-27 20:23:06,824 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3221226.6666666665, ans=0.07
2023-11-27 20:23:12,152 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.22 vs. limit=15.0
2023-11-27 20:23:16,036 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 483200
2023-11-27 20:23:16,733 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.20 vs. limit=15.0
2023-11-27 20:23:23,025 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 2250, loss[loss=0.06624, simple_loss=0.09535, pruned_loss=0.01102, audio_tagging_loss=0.00754, over 14941.00 frames. ], tot_loss[loss=0.06671, simple_loss=0.0906, pruned_loss=0.01265, audio_tagging_loss=0.008754, over 3036516.74 frames. ], batch size: 58, lr: 1.66e-03, grad_scale: 16.0
2023-11-27 20:23:49,786 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3221493.3333333335, ans=0.0
2023-11-27 20:23:51,489 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3221493.3333333335, ans=0.025
2023-11-27 20:24:03,954 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.981e+01 8.930e+01 9.422e+01 1.015e+02 1.618e+02, threshold=1.884e+02, percent-clipped=0.0
2023-11-27 20:24:08,733 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3221626.6666666665, ans=0.125
2023-11-27 20:24:13,190 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3221626.6666666665, ans=0.125
2023-11-27 20:24:14,047 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 483250
2023-11-27 20:24:17,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3221626.6666666665, ans=0.0
2023-11-27 20:24:21,366 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 2300, loss[loss=0.08064, simple_loss=0.1021, pruned_loss=0.01874, audio_tagging_loss=0.01085, over 15567.00 frames. ], tot_loss[loss=0.06695, simple_loss=0.09122, pruned_loss=0.01261, audio_tagging_loss=0.008739, over 3039114.93 frames. ], batch size: 58, lr: 1.66e-03, grad_scale: 16.0
2023-11-27 20:24:55,927 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-27 20:24:56,837 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3221893.3333333335, ans=0.0
2023-11-27 20:25:11,982 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 483300
2023-11-27 20:25:14,177 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 20:25:19,107 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 2350, loss[loss=0.0671, simple_loss=0.08113, pruned_loss=0.01566, audio_tagging_loss=0.01087, over 15389.00 frames. ], tot_loss[loss=0.06737, simple_loss=0.09196, pruned_loss=0.01262, audio_tagging_loss=0.008771, over 3037486.83 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 16.0
2023-11-27 20:25:35,275 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3222093.3333333335, ans=0.125
2023-11-27 20:25:58,847 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.350e+01 8.811e+01 9.279e+01 1.007e+02 1.436e+02, threshold=1.856e+02, percent-clipped=0.0
2023-11-27 20:26:09,507 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 483350
2023-11-27 20:26:16,850 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 2400, loss[loss=0.08167, simple_loss=0.1155, pruned_loss=0.01915, audio_tagging_loss=0.004746, over 15574.00 frames. ], tot_loss[loss=0.06729, simple_loss=0.09192, pruned_loss=0.01257, audio_tagging_loss=0.008761, over 3043361.44 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 32.0
2023-11-27 20:26:17,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3222360.0, ans=0.0
2023-11-27 20:26:18,162 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3222360.0, ans=0.1
2023-11-27 20:26:29,543 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3222426.6666666665, ans=0.125
2023-11-27 20:26:43,746 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3222493.3333333335, ans=0.0
2023-11-27 20:26:49,795 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3222493.3333333335, ans=0.2
2023-11-27 20:26:53,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3222560.0, ans=0.09899494936611666
2023-11-27 20:27:05,859 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3222626.6666666665, ans=0.0
2023-11-27 20:27:07,844 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 483400
2023-11-27 20:27:10,603 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3222626.6666666665, ans=0.1
2023-11-27 20:27:15,135 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 2450, loss[loss=0.04275, simple_loss=0.05404, pruned_loss=0.006547, audio_tagging_loss=0.009186, over 16029.00 frames. ], tot_loss[loss=0.06707, simple_loss=0.09122, pruned_loss=0.01263, audio_tagging_loss=0.008836, over 3052767.20 frames. ], batch size: 62, lr: 1.65e-03, grad_scale: 32.0
2023-11-27 20:27:21,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3222693.3333333335, ans=0.0
2023-11-27 20:27:30,176 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-27 20:27:34,120 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3222760.0, ans=0.0
2023-11-27 20:27:35,264 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-27 20:27:47,771 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3222826.6666666665, ans=0.0
2023-11-27 20:27:56,801 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.146e+01 8.599e+01 9.201e+01 9.948e+01 1.437e+02, threshold=1.840e+02, percent-clipped=0.0
2023-11-27 20:28:04,253 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3222960.0, ans=0.125
2023-11-27 20:28:05,288 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3222960.0, ans=0.95
2023-11-27 20:28:06,191 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 483450
2023-11-27 20:28:12,699 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 2500, loss[loss=0.09006, simple_loss=0.1302, pruned_loss=0.01919, audio_tagging_loss=0.005792, over 15248.00 frames. ], tot_loss[loss=0.06754, simple_loss=0.09176, pruned_loss=0.01281, audio_tagging_loss=0.008853, over 3051477.82 frames. ], batch size: 54, lr: 1.65e-03, grad_scale: 16.0
2023-11-27 20:28:13,041 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3223026.6666666665, ans=0.0
2023-11-27 20:28:58,425 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3223293.3333333335, ans=0.125
2023-11-27 20:29:04,290 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 483500
2023-11-27 20:29:04,384 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3223293.3333333335, ans=0.125
2023-11-27 20:29:10,720 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 2550, loss[loss=0.0619, simple_loss=0.07767, pruned_loss=0.01423, audio_tagging_loss=0.008839, over 14871.00 frames. ], tot_loss[loss=0.06683, simple_loss=0.09081, pruned_loss=0.01259, audio_tagging_loss=0.008827, over 3047171.56 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 16.0
2023-11-27 20:29:32,064 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3223426.6666666665, ans=0.1
2023-11-27 20:29:40,090 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.42 vs. limit=15.0
2023-11-27 20:29:46,253 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.10 vs. limit=6.0
2023-11-27 20:29:50,327 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3223560.0, ans=0.125
2023-11-27 20:29:51,432 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=3223560.0, ans=0.025
2023-11-27 20:29:52,375 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.239e+01 8.657e+01 9.326e+01 1.025e+02 1.204e+02, threshold=1.865e+02, percent-clipped=0.0
2023-11-27 20:29:58,856 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3223626.6666666665, ans=0.1
2023-11-27 20:29:59,876 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3223626.6666666665, ans=0.1
2023-11-27 20:30:01,951 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 483550
2023-11-27 20:30:08,859 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 2600, loss[loss=0.06096, simple_loss=0.07427, pruned_loss=0.01342, audio_tagging_loss=0.01041, over 14928.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.0897, pruned_loss=0.01242, audio_tagging_loss=0.008781, over 3050049.82 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0
2023-11-27 20:30:20,452 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3223760.0, ans=0.0
2023-11-27 20:30:22,755 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3223760.0, ans=0.125
2023-11-27 20:30:51,381 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3223893.3333333335, ans=0.125
2023-11-27 20:30:59,564 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 483600
2023-11-27 20:31:01,257 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3223960.0, ans=0.2
2023-11-27 20:31:06,368 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 2650, loss[loss=0.08228, simple_loss=0.1193, pruned_loss=0.0158, audio_tagging_loss=0.006834, over 15218.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.09003, pruned_loss=0.01253, audio_tagging_loss=0.008665, over 3049454.01 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0
2023-11-27 20:31:37,140 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3224160.0, ans=0.0
2023-11-27 20:31:48,425 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.355e+01 8.676e+01 9.510e+01 9.992e+01 1.225e+02, threshold=1.902e+02, percent-clipped=0.0
2023-11-27 20:31:50,235 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.57 vs. limit=10.0
2023-11-27 20:31:57,873 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 483650
2023-11-27 20:32:00,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3224293.3333333335, ans=0.0
2023-11-27 20:32:04,335 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 2700, loss[loss=0.06311, simple_loss=0.08582, pruned_loss=0.01248, audio_tagging_loss=0.007723, over 15165.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.08984, pruned_loss=0.01248, audio_tagging_loss=0.00866, over 3055458.38 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 16.0
2023-11-27 20:32:11,157 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3224360.0, ans=0.125
2023-11-27 20:32:11,188 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3224360.0, ans=0.2
2023-11-27 20:32:30,359 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3224493.3333333335, ans=0.0
2023-11-27 20:32:52,135 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3224626.6666666665, ans=0.125
2023-11-27 20:32:55,168 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 483700
2023-11-27 20:33:02,273 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 2750, loss[loss=0.06822, simple_loss=0.09509, pruned_loss=0.01155, audio_tagging_loss=0.00913, over 16426.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.09021, pruned_loss=0.01258, audio_tagging_loss=0.008692, over 3058823.57 frames. ], batch size: 61, lr: 1.65e-03, grad_scale: 16.0
2023-11-27 20:33:43,554 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.983e+01 8.556e+01 9.189e+01 9.890e+01 1.172e+02, threshold=1.838e+02, percent-clipped=0.0
2023-11-27 20:33:53,802 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 20:33:53,839 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 483750
2023-11-27 20:34:00,309 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 2800, loss[loss=0.05328, simple_loss=0.07492, pruned_loss=0.006387, audio_tagging_loss=0.009428, over 14800.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.08988, pruned_loss=0.01232, audio_tagging_loss=0.008621, over 3053543.83 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 32.0
2023-11-27 20:34:17,863 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.89 vs. limit=10.0
2023-11-27 20:34:18,673 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3225093.3333333335, ans=0.0
2023-11-27 20:34:19,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3225093.3333333335, ans=10.0
2023-11-27 20:34:23,052 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3225160.0, ans=0.2
2023-11-27 20:34:25,265 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3225160.0, ans=0.125
2023-11-27 20:34:36,605 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3225226.6666666665, ans=0.1
2023-11-27 20:34:39,833 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3225226.6666666665, ans=0.125
2023-11-27 20:34:42,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3225226.6666666665, ans=0.125
2023-11-27 20:34:51,466 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 483800
2023-11-27 20:34:58,498 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 2850, loss[loss=0.06876, simple_loss=0.08683, pruned_loss=0.01591, audio_tagging_loss=0.009431, over 15100.00 frames. ], tot_loss[loss=0.06511, simple_loss=0.08875, pruned_loss=0.01216, audio_tagging_loss=0.008579, over 3040988.67 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0
2023-11-27 20:35:19,241 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3225426.6666666665, ans=0.125
2023-11-27 20:35:21,188 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.46 vs. limit=22.5
2023-11-27 20:35:38,990 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3225560.0, ans=0.0
2023-11-27 20:35:41,037 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.142e+01 8.834e+01 9.311e+01 1.027e+02 1.174e+02, threshold=1.862e+02, percent-clipped=0.0
2023-11-27 20:35:48,749 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 483850
2023-11-27 20:35:53,653 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.19 vs. limit=15.0
2023-11-27 20:35:55,323 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 2900, loss[loss=0.06386, simple_loss=0.07864, pruned_loss=0.01377, audio_tagging_loss=0.01077, over 16696.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.08981, pruned_loss=0.01234, audio_tagging_loss=0.008556, over 3047281.37 frames. ], batch size: 64, lr: 1.65e-03, grad_scale: 16.0
2023-11-27 20:36:13,098 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.84 vs. limit=22.5
2023-11-27 20:36:46,560 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 483900
2023-11-27 20:36:53,782 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 2950, loss[loss=0.05558, simple_loss=0.07465, pruned_loss=0.01015, audio_tagging_loss=0.008109, over 15643.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.09016, pruned_loss=0.01251, audio_tagging_loss=0.008597, over 3048095.56 frames. ], batch size: 60, lr: 1.65e-03, grad_scale: 16.0
2023-11-27 20:37:00,661 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3226026.6666666665, ans=0.0
2023-11-27 20:37:07,088 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.52 vs. limit=22.5
2023-11-27 20:37:10,153 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3226093.3333333335, ans=0.125
2023-11-27 20:37:15,664 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3226160.0, ans=0.0
2023-11-27 20:37:17,786 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3226160.0, ans=0.2
2023-11-27 20:37:23,250 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3226160.0, ans=0.125
2023-11-27 20:37:24,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3226160.0, ans=0.125
2023-11-27 20:37:36,762 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.737e+01 8.664e+01 9.410e+01 9.930e+01 1.488e+02, threshold=1.882e+02, percent-clipped=0.0
2023-11-27 20:37:43,565 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3226293.3333333335, ans=0.0
2023-11-27 20:37:44,515 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 483950
2023-11-27 20:37:51,790 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 3000, loss[loss=0.04179, simple_loss=0.04891, pruned_loss=0.006306, audio_tagging_loss=0.01103, over 14803.00 frames. ], tot_loss[loss=0.06596, simple_loss=0.0896, pruned_loss=0.01242, audio_tagging_loss=0.00874, over 3044183.47 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 16.0
2023-11-27 20:37:51,792 INFO [train_asr.py:1258] (0/4) Computing validation loss
2023-11-27 20:38:26,088 INFO [train_asr.py:1267] (0/4) Epoch 41, validation: loss=0.0572, simple_loss=0.05061, pruned_loss=0.005192, audio_tagging_loss=0.0267, over 4681554.00 frames.
2023-11-27 20:38:26,089 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB
2023-11-27 20:38:45,422 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3226426.6666666665, ans=0.125
2023-11-27 20:38:59,149 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.71 vs. limit=15.0
2023-11-27 20:39:17,168 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 484000
2023-11-27 20:39:18,506 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-484000.pt
2023-11-27 20:39:21,333 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-27 20:39:26,490 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 3050, loss[loss=0.08936, simple_loss=0.1228, pruned_loss=0.02149, audio_tagging_loss=0.006496, over 15513.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.08987, pruned_loss=0.0125, audio_tagging_loss=0.00876, over 3044538.42 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 16.0
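The validation block above reports the CUDA allocator's high-water mark. A sketch of how such a line is typically produced (the exact call in train_asr.py is an assumption):

    import torch

    mb = torch.cuda.max_memory_allocated(device=0) // (1024 * 1024)
    print(f"Maximum memory allocated so far is {mb}MB")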
2023-11-27 20:39:26,490 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 3050, loss[loss=0.08936, simple_loss=0.1228, pruned_loss=0.02149, audio_tagging_loss=0.006496, over 15513.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.08987, pruned_loss=0.0125, audio_tagging_loss=0.00876, over 3044538.42 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 16.0
2023-11-27 20:39:31,724 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3226693.3333333335, ans=0.125
2023-11-27 20:39:34,349 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.97 vs. limit=22.5
2023-11-27 20:39:47,331 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3226760.0, ans=0.1
2023-11-27 20:39:48,894 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.50 vs. limit=22.5
2023-11-27 20:40:01,253 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 20:40:09,336 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.019e+01 8.867e+01 9.400e+01 1.012e+02 1.240e+02, threshold=1.880e+02, percent-clipped=0.0
2023-11-27 20:40:09,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3226893.3333333335, ans=0.2
2023-11-27 20:40:12,897 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3226960.0, ans=0.1
2023-11-27 20:40:17,847 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 484050
2023-11-27 20:40:24,356 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 3100, loss[loss=0.06654, simple_loss=0.09545, pruned_loss=0.01175, audio_tagging_loss=0.007067, over 15873.00 frames. ], tot_loss[loss=0.06683, simple_loss=0.09081, pruned_loss=0.01272, audio_tagging_loss=0.008696, over 3047921.68 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 16.0
2023-11-27 20:40:26,786 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3227026.6666666665, ans=0.125
2023-11-27 20:40:31,179 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3227026.6666666665, ans=0.2
2023-11-27 20:40:31,193 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3227026.6666666665, ans=0.125
2023-11-27 20:40:41,623 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3227093.3333333335, ans=0.125
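The WARNING above is the short-utterance guard at work: this one-second AudioSet clip yields only 23 encoder frames after the roughly 4x subsampling, fewer than its 24 BPE tokens (the dummy placeholder transcript), so the transducer loss has no valid alignment and the cut is dropped. A sketch of the check, assuming a Conv2dSubsampling-style frame calculation, which reproduces the logged 100 -> 23:

    # Sketch of the exclusion rule implied by the WARNING, assuming the
    # usual icefall convention: after subsampling, an utterance needs at
    # least as many encoder frames as BPE tokens.
    def keep_cut(num_frames_before_subsampling: int, num_tokens: int) -> bool:
        # Conv2dSubsampling-style 4x reduction (an assumption; it matches
        # the logged 100 frames -> 23 frames).
        t = ((num_frames_before_subsampling - 7) // 2 + 1) // 2
        return t >= num_tokens

    assert keep_cut(100, 24) is False  # 23 frames < 24 tokens -> excluded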
2023-11-27 20:40:50,772 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.85 vs. limit=6.0
2023-11-27 20:41:03,131 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3227226.6666666665, ans=10.0
2023-11-27 20:41:07,348 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3227226.6666666665, ans=0.2
2023-11-27 20:41:14,729 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 484100
2023-11-27 20:41:17,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_na.min_abs, batch_count=3227293.3333333335, ans=0.02
2023-11-27 20:41:21,310 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 3150, loss[loss=0.06747, simple_loss=0.09605, pruned_loss=0.01098, audio_tagging_loss=0.008468, over 15815.00 frames. ], tot_loss[loss=0.06706, simple_loss=0.09107, pruned_loss=0.01273, audio_tagging_loss=0.008794, over 3049655.05 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0
2023-11-27 20:41:51,496 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3227493.3333333335, ans=0.0
2023-11-27 20:41:53,524 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3227493.3333333335, ans=0.0
2023-11-27 20:41:57,937 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3227560.0, ans=0.0
2023-11-27 20:42:04,121 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.473e+01 8.840e+01 9.395e+01 9.954e+01 1.405e+02, threshold=1.879e+02, percent-clipped=0.0
2023-11-27 20:42:06,481 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3227626.6666666665, ans=0.1
2023-11-27 20:42:12,799 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 484150
2023-11-27 20:42:16,183 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3227626.6666666665, ans=0.125
2023-11-27 20:42:19,254 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 3200, loss[loss=0.07424, simple_loss=0.1115, pruned_loss=0.01064, audio_tagging_loss=0.00784, over 16093.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.09052, pruned_loss=0.01241, audio_tagging_loss=0.008918, over 3056864.27 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 32.0
2023-11-27 20:42:22,437 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3227693.3333333335, ans=0.0
2023-11-27 20:42:49,385 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3227826.6666666665, ans=0.125
2023-11-27 20:43:02,440 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3227893.3333333335, ans=0.125
2023-11-27 20:43:10,589 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 484200
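In the optim.py:476 records, the five values after "grad-norm quartiles" read as min/25%/median/75%/max of recently observed gradient norms, and in every record in this section the threshold equals Clipping_scale times the median, e.g. 2.0 * 9.395e+01 = 1.879e+02 just above; percent-clipped reports how often the norm exceeded it (0.0 here). The separate grad_scale field in the loss records is the fp16 gradient scaler's current loss scale, which is why it steps among 8.0, 16.0 and 32.0 elsewhere in this section. A sketch of the threshold arithmetic; the real ScaledAdam bookkeeping in optim.py is more involved:

    import statistics

    # Assumed rule: the clip threshold tracks the median of recent grad
    # norms, scaled by clipping_scale (=2.0 in this run).
    def clip_threshold(recent_grad_norms: list, clipping_scale: float = 2.0) -> float:
        return clipping_scale * statistics.median(recent_grad_norms)

    quantiles = [74.73, 88.40, 93.95, 99.54, 140.5]  # from the record above
    print(clip_threshold(quantiles))  # 187.9, i.e. the logged 1.879e+02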
2023-11-27 20:43:18,071 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 3250, loss[loss=0.06188, simple_loss=0.07693, pruned_loss=0.008984, audio_tagging_loss=0.01443, over 15419.00 frames. ], tot_loss[loss=0.06671, simple_loss=0.09051, pruned_loss=0.01236, audio_tagging_loss=0.009089, over 3053330.66 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 32.0
2023-11-27 20:43:27,648 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.92 vs. limit=22.5
2023-11-27 20:43:36,397 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3228093.3333333335, ans=0.0
2023-11-27 20:43:44,562 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3228160.0, ans=0.1
2023-11-27 20:43:57,620 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3228226.6666666665, ans=0.1
2023-11-27 20:44:00,691 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.473e+01 8.543e+01 9.307e+01 1.025e+02 1.528e+02, threshold=1.861e+02, percent-clipped=0.0
2023-11-27 20:44:00,989 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3228226.6666666665, ans=0.0
2023-11-27 20:44:04,258 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3228293.3333333335, ans=0.125
2023-11-27 20:44:04,289 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3228293.3333333335, ans=10.0
2023-11-27 20:44:08,482 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 484250
2023-11-27 20:44:14,947 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 3300, loss[loss=0.05957, simple_loss=0.08168, pruned_loss=0.008908, audio_tagging_loss=0.009823, over 15512.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.09024, pruned_loss=0.01237, audio_tagging_loss=0.009088, over 3051050.73 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 32.0
2023-11-27 20:44:23,999 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3228360.0, ans=0.125
2023-11-27 20:44:41,600 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3228493.3333333335, ans=0.125
2023-11-27 20:44:42,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3228493.3333333335, ans=0.2
2023-11-27 20:44:48,353 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3228493.3333333335, ans=0.125
2023-11-27 20:45:06,286 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 484300
2023-11-27 20:45:08,683 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3228626.6666666665, ans=0.125
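Each scaling.py:213 line reports one ScheduledFloat: a scalar hyperparameter (attention/conv/ff skip rates, balancer probs, dropout_p, bypass scale_min, and so on) whose value "ans" is a function of batch_count rather than a constant. By batch_count around 3.23e6 these schedules have long since flattened at their end values: skip rates at 0.0, dropout at 0.1, balancer probs at 0.125, scale_min at 0.2. A sketch of such a schedule as piecewise-linear interpolation between (batch_count, value) breakpoints; the breakpoints below are invented, not the ones in scaling.py:

    import bisect

    # Piecewise-linear schedule over batch_count, flat outside the
    # breakpoint range (a sketch of ScheduledFloat-like behaviour).
    def scheduled_float(batch_count: float, points: list) -> float:
        xs = [x for x, _ in points]
        i = bisect.bisect_right(xs, batch_count)
        if i == 0:
            return points[0][1]
        if i == len(points):
            return points[-1][1]
        (x0, y0), (x1, y1) = points[i - 1], points[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # e.g. a conv_skip_rate annealed 0.2 -> 0.0 over the first 16k batches
    # would have reached 0.0 long before batch_count=3.2e6:
    print(scheduled_float(3228626.67, [(0.0, 0.2), (16000.0, 0.0)]))  # 0.0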
2023-11-27 20:45:12,789 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 3350, loss[loss=0.06087, simple_loss=0.07768, pruned_loss=0.01104, audio_tagging_loss=0.011, over 15539.00 frames. ], tot_loss[loss=0.06724, simple_loss=0.09131, pruned_loss=0.01267, audio_tagging_loss=0.00892, over 3057398.15 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 16.0
2023-11-27 20:45:38,692 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3228826.6666666665, ans=0.125
2023-11-27 20:45:44,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3228826.6666666665, ans=0.0
2023-11-27 20:45:55,623 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.846e+01 8.657e+01 9.246e+01 1.011e+02 1.317e+02, threshold=1.849e+02, percent-clipped=0.0
2023-11-27 20:45:55,828 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3228893.3333333335, ans=0.125
2023-11-27 20:45:55,830 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3228893.3333333335, ans=0.125
2023-11-27 20:45:58,026 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3228960.0, ans=0.125
2023-11-27 20:46:02,358 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 484350
2023-11-27 20:46:10,128 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 3400, loss[loss=0.06726, simple_loss=0.09071, pruned_loss=0.01229, audio_tagging_loss=0.009611, over 14785.00 frames. ], tot_loss[loss=0.06712, simple_loss=0.09133, pruned_loss=0.01265, audio_tagging_loss=0.008808, over 3055648.77 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0
2023-11-27 20:46:20,457 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.76 vs. limit=12.0
2023-11-27 20:46:29,493 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.73 vs. limit=22.5
2023-11-27 20:46:31,741 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3229160.0, ans=0.0
2023-11-27 20:47:00,256 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 484400
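The scaling.py:1022 lines are Whitening diagnostics: for the named activation, a scalar metric summarizing how far its (optionally grouped) feature covariance is from isotropic is compared with a limit, e.g. metric=7.76 vs. limit=12.0 above; a metric under the limit means no corrective pressure is applied. The exact metric is defined in scaling.py; the sketch below uses an assumed proxy, E[lambda^2] / (E[lambda])^2 over covariance eigenvalues, which equals 1.0 for perfectly white features and grows as variance concentrates in a few directions:

    import torch

    # Assumed proxy for a whitening metric (not the scaling.py formula):
    # second moment of covariance eigenvalues over the squared first
    # moment; equals 1.0 iff the covariance is isotropic.
    def whitening_metric(feats: torch.Tensor) -> float:
        x = feats - feats.mean(dim=0, keepdim=True)   # (frames, channels)
        cov = (x.T @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        return float((eigs ** 2).mean() / eigs.mean() ** 2)

    x = torch.randn(1000, 384)                                  # near-white
    print(whitening_metric(x))                                  # close to 1.0
    print(whitening_metric(x * torch.linspace(0.1, 3.0, 384)))  # noticeably larger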
2023-11-27 20:47:07,150 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 3450, loss[loss=0.07951, simple_loss=0.1162, pruned_loss=0.0143, audio_tagging_loss=0.007127, over 14991.00 frames. ], tot_loss[loss=0.0667, simple_loss=0.091, pruned_loss=0.0125, audio_tagging_loss=0.008692, over 3044913.55 frames. ], batch size: 54, lr: 1.65e-03, grad_scale: 16.0
2023-11-27 20:47:11,837 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-27 20:47:34,018 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3229493.3333333335, ans=0.0
2023-11-27 20:47:42,209 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3229560.0, ans=0.1
2023-11-27 20:47:50,694 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.123e+01 8.393e+01 9.066e+01 9.893e+01 1.377e+02, threshold=1.813e+02, percent-clipped=0.0
2023-11-27 20:47:50,910 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3229560.0, ans=0.0
2023-11-27 20:47:57,390 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 484450
2023-11-27 20:48:04,375 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 3500, loss[loss=0.07313, simple_loss=0.09953, pruned_loss=0.0135, audio_tagging_loss=0.009865, over 15012.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.09071, pruned_loss=0.01252, audio_tagging_loss=0.008687, over 3047001.73 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0
2023-11-27 20:48:16,015 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.50 vs. limit=15.0
2023-11-27 20:48:30,339 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.61 vs. limit=15.0
2023-11-27 20:48:35,208 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 20:48:45,621 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.12 vs. limit=15.0
2023-11-27 20:48:48,184 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3229893.3333333335, ans=0.1
2023-11-27 20:48:50,605 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3229960.0, ans=0.1
2023-11-27 20:48:54,655 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 484500
2023-11-27 20:48:58,101 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3229960.0, ans=0.125
2023-11-27 20:48:58,564 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.01 vs. limit=15.0
2023-11-27 20:48:59,292 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3229960.0, ans=0.125
2023-11-27 20:49:01,727 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 3550, loss[loss=0.05056, simple_loss=0.06657, pruned_loss=0.006634, audio_tagging_loss=0.01064, over 14966.00 frames.
], tot_loss[loss=0.06638, simple_loss=0.09042, pruned_loss=0.01254, audio_tagging_loss=0.008638, over 3046667.60 frames. ], batch size: 60, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:49:01,900 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3230026.6666666665, ans=0.1 2023-11-27 20:49:10,240 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3230026.6666666665, ans=0.07 2023-11-27 20:49:14,687 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3230093.3333333335, ans=0.125 2023-11-27 20:49:20,586 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.47 vs. limit=10.0 2023-11-27 20:49:28,846 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3230160.0, ans=0.125 2023-11-27 20:49:34,554 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.47 vs. limit=10.0 2023-11-27 20:49:45,464 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.934e+01 8.597e+01 9.146e+01 1.002e+02 1.167e+02, threshold=1.829e+02, percent-clipped=0.0 2023-11-27 20:49:46,889 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3230293.3333333335, ans=0.0 2023-11-27 20:49:52,236 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.91 vs. limit=6.0 2023-11-27 20:49:52,877 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 484550 2023-11-27 20:49:59,415 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 3600, loss[loss=0.06405, simple_loss=0.09008, pruned_loss=0.01271, audio_tagging_loss=0.006307, over 13707.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.09049, pruned_loss=0.0125, audio_tagging_loss=0.008667, over 3050145.68 frames. ], batch size: 52, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:50:01,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3230360.0, ans=0.125 2023-11-27 20:50:31,641 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3230493.3333333335, ans=0.0 2023-11-27 20:50:42,068 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3230560.0, ans=0.125 2023-11-27 20:50:42,417 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=3230560.0, ans=15.0 2023-11-27 20:50:49,724 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 484600 2023-11-27 20:50:57,230 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 3650, loss[loss=0.0614, simple_loss=0.08252, pruned_loss=0.01185, audio_tagging_loss=0.00829, over 15083.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.09065, pruned_loss=0.01259, audio_tagging_loss=0.008609, over 3052496.41 frames. 
], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:51:01,873 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3230693.3333333335, ans=0.07 2023-11-27 20:51:02,988 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3230693.3333333335, ans=0.0 2023-11-27 20:51:20,994 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3230826.6666666665, ans=0.125 2023-11-27 20:51:29,104 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3230826.6666666665, ans=0.2 2023-11-27 20:51:32,504 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 20:51:41,950 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.602e+01 8.915e+01 9.732e+01 1.035e+02 1.318e+02, threshold=1.946e+02, percent-clipped=0.0 2023-11-27 20:51:47,436 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 484650 2023-11-27 20:51:53,893 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 3700, loss[loss=0.06714, simple_loss=0.09095, pruned_loss=0.01491, audio_tagging_loss=0.006754, over 16548.00 frames. ], tot_loss[loss=0.06702, simple_loss=0.09148, pruned_loss=0.01267, audio_tagging_loss=0.008605, over 3053219.74 frames. ], batch size: 66, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:52:11,871 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3231093.3333333335, ans=0.0 2023-11-27 20:52:28,391 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3231226.6666666665, ans=0.125 2023-11-27 20:52:41,068 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3231293.3333333335, ans=0.125 2023-11-27 20:52:45,156 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 484700 2023-11-27 20:52:47,463 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3231293.3333333335, ans=0.125 2023-11-27 20:52:51,692 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 3750, loss[loss=0.06096, simple_loss=0.0864, pruned_loss=0.01001, audio_tagging_loss=0.007747, over 14479.00 frames. ], tot_loss[loss=0.06694, simple_loss=0.09156, pruned_loss=0.01256, audio_tagging_loss=0.008594, over 3054583.32 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:53:31,762 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.35 vs. limit=15.0 2023-11-27 20:53:33,166 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 20:53:36,405 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.100e+01 8.777e+01 9.406e+01 1.027e+02 1.290e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-27 20:53:38,512 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 20:53:39,578 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3231626.6666666665, ans=0.95 2023-11-27 20:53:42,616 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 484750 2023-11-27 20:53:42,762 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3231626.6666666665, ans=0.2 2023-11-27 20:53:46,008 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3231626.6666666665, ans=0.125 2023-11-27 20:53:49,563 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 3800, loss[loss=0.07457, simple_loss=0.09226, pruned_loss=0.01971, audio_tagging_loss=0.008733, over 14429.00 frames. ], tot_loss[loss=0.06734, simple_loss=0.09199, pruned_loss=0.01269, audio_tagging_loss=0.008659, over 3057581.83 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:53:49,830 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 20:54:08,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3231760.0, ans=0.125 2023-11-27 20:54:09,962 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3231760.0, ans=0.1 2023-11-27 20:54:39,468 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 484800 2023-11-27 20:54:46,257 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 3850, loss[loss=0.05508, simple_loss=0.06667, pruned_loss=0.008708, audio_tagging_loss=0.01304, over 15926.00 frames. ], tot_loss[loss=0.06678, simple_loss=0.091, pruned_loss=0.01256, audio_tagging_loss=0.008711, over 3056287.07 frames. 
], batch size: 60, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:54:53,653 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3232026.6666666665, ans=0.125 2023-11-27 20:54:54,794 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3232026.6666666665, ans=0.125 2023-11-27 20:55:14,522 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3232160.0, ans=0.0 2023-11-27 20:55:22,668 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3232226.6666666665, ans=0.1 2023-11-27 20:55:29,118 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3232226.6666666665, ans=0.125 2023-11-27 20:55:29,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3232226.6666666665, ans=0.125 2023-11-27 20:55:31,025 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.176e+01 8.660e+01 9.334e+01 1.001e+02 1.347e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-27 20:55:37,218 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 484850 2023-11-27 20:55:37,351 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 20:55:41,265 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=13.02 vs. limit=15.0 2023-11-27 20:55:43,748 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 3900, loss[loss=0.07008, simple_loss=0.09208, pruned_loss=0.01328, audio_tagging_loss=0.01076, over 14878.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.09019, pruned_loss=0.01258, audio_tagging_loss=0.008791, over 3046628.46 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:55:59,926 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3232426.6666666665, ans=0.125 2023-11-27 20:56:09,807 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3232493.3333333335, ans=0.0 2023-11-27 20:56:10,783 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3232493.3333333335, ans=0.125 2023-11-27 20:56:13,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3232493.3333333335, ans=0.125 2023-11-27 20:56:15,254 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3232493.3333333335, ans=0.125 2023-11-27 20:56:20,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3232560.0, ans=0.125 2023-11-27 20:56:21,484 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3232560.0, ans=0.0 2023-11-27 20:56:22,990 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.29 vs. 
limit=10.0 2023-11-27 20:56:24,594 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3232560.0, ans=0.0 2023-11-27 20:56:25,107 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.51 vs. limit=22.5 2023-11-27 20:56:31,426 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3232626.6666666665, ans=0.125 2023-11-27 20:56:34,465 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 484900 2023-11-27 20:56:42,120 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 3950, loss[loss=0.05992, simple_loss=0.08159, pruned_loss=0.009756, audio_tagging_loss=0.009373, over 14200.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.09011, pruned_loss=0.01251, audio_tagging_loss=0.008885, over 3039360.70 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:56:43,996 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3232693.3333333335, ans=0.125 2023-11-27 20:57:05,245 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3232826.6666666665, ans=0.1 2023-11-27 20:57:06,363 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3232826.6666666665, ans=0.1 2023-11-27 20:57:11,180 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.11 vs. limit=15.0 2023-11-27 20:57:12,994 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3232826.6666666665, ans=0.125 2023-11-27 20:57:18,387 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3232893.3333333335, ans=0.2 2023-11-27 20:57:26,949 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.353e+01 8.571e+01 9.462e+01 1.016e+02 1.341e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-27 20:57:30,824 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=3232960.0, ans=15.0 2023-11-27 20:57:32,679 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 484950 2023-11-27 20:57:39,308 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 4000, loss[loss=0.06311, simple_loss=0.09399, pruned_loss=0.006531, audio_tagging_loss=0.009589, over 15900.00 frames. ], tot_loss[loss=0.0668, simple_loss=0.09028, pruned_loss=0.01263, audio_tagging_loss=0.009026, over 3037456.87 frames. 
], batch size: 60, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 20:57:41,708 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3233026.6666666665, ans=0.1 2023-11-27 20:58:28,544 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3233293.3333333335, ans=0.125 2023-11-27 20:58:29,473 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 485000 2023-11-27 20:58:30,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3233293.3333333335, ans=0.1 2023-11-27 20:58:36,272 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 4050, loss[loss=0.066, simple_loss=0.09454, pruned_loss=0.01034, audio_tagging_loss=0.008394, over 15402.00 frames. ], tot_loss[loss=0.0671, simple_loss=0.09052, pruned_loss=0.01272, audio_tagging_loss=0.009113, over 3036924.71 frames. ], batch size: 62, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 20:58:40,691 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 20:58:44,265 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3233360.0, ans=0.0 2023-11-27 20:58:45,749 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.88 vs. limit=15.0 2023-11-27 20:58:50,632 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3233426.6666666665, ans=0.2 2023-11-27 20:59:08,190 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3233493.3333333335, ans=0.125 2023-11-27 20:59:13,075 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3233560.0, ans=0.125 2023-11-27 20:59:20,314 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.753e+01 8.859e+01 9.526e+01 1.036e+02 1.251e+02, threshold=1.905e+02, percent-clipped=0.0 2023-11-27 20:59:25,744 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 485050 2023-11-27 20:59:31,827 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.48 vs. limit=15.0 2023-11-27 20:59:32,421 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 4100, loss[loss=0.0614, simple_loss=0.08632, pruned_loss=0.009645, audio_tagging_loss=0.008594, over 16278.00 frames. ], tot_loss[loss=0.06673, simple_loss=0.09005, pruned_loss=0.01262, audio_tagging_loss=0.009092, over 3041599.51 frames. ], batch size: 61, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 20:59:50,287 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3233760.0, ans=0.1 2023-11-27 21:00:06,612 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.88 vs. 
limit=12.0 2023-11-27 21:00:17,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3233960.0, ans=0.0 2023-11-27 21:00:20,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3233960.0, ans=0.1 2023-11-27 21:00:23,224 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 485100 2023-11-27 21:00:30,323 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 4150, loss[loss=0.07764, simple_loss=0.1094, pruned_loss=0.01643, audio_tagging_loss=0.006523, over 15174.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.09041, pruned_loss=0.01252, audio_tagging_loss=0.008966, over 3038943.50 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 21:00:33,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3234026.6666666665, ans=0.125 2023-11-27 21:00:36,160 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.33 vs. limit=22.5 2023-11-27 21:01:00,384 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3234160.0, ans=0.125 2023-11-27 21:01:00,548 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3234160.0, ans=0.2 2023-11-27 21:01:13,451 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 21:01:15,583 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.836e+01 8.501e+01 9.252e+01 1.004e+02 1.216e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-27 21:01:20,622 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 485150 2023-11-27 21:01:21,120 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.03 vs. limit=6.0 2023-11-27 21:01:27,054 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 4200, loss[loss=0.07214, simple_loss=0.1072, pruned_loss=0.01249, audio_tagging_loss=0.006049, over 14927.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.08999, pruned_loss=0.01241, audio_tagging_loss=0.008782, over 3032852.91 frames. ], batch size: 54, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:01:33,979 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3234360.0, ans=0.125 2023-11-27 21:01:35,074 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3234360.0, ans=0.2 2023-11-27 21:01:38,774 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.64 vs. 
limit=12.0 2023-11-27 21:01:57,405 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3234493.3333333335, ans=0.0 2023-11-27 21:02:01,885 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3234560.0, ans=0.1 2023-11-27 21:02:14,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3234626.6666666665, ans=0.125 2023-11-27 21:02:17,471 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 485200 2023-11-27 21:02:24,409 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 4250, loss[loss=0.08165, simple_loss=0.1126, pruned_loss=0.01914, audio_tagging_loss=0.006208, over 15375.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.08988, pruned_loss=0.0125, audio_tagging_loss=0.0088, over 3041137.63 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:02:28,115 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3234693.3333333335, ans=0.1 2023-11-27 21:02:30,678 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3234693.3333333335, ans=0.1 2023-11-27 21:02:36,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3234760.0, ans=0.1 2023-11-27 21:02:37,609 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten.whitening_limit, batch_count=3234760.0, ans=15.0 2023-11-27 21:03:10,375 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.068e+01 9.065e+01 9.544e+01 1.011e+02 1.214e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-27 21:03:10,629 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3234960.0, ans=0.0 2023-11-27 21:03:15,344 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 485250 2023-11-27 21:03:15,621 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3234960.0, ans=0.05 2023-11-27 21:03:21,924 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 4300, loss[loss=0.06402, simple_loss=0.09695, pruned_loss=0.008117, audio_tagging_loss=0.007426, over 15886.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.09, pruned_loss=0.01249, audio_tagging_loss=0.008668, over 3039895.05 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:03:23,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3235026.6666666665, ans=10.0 2023-11-27 21:03:30,726 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.17 vs. limit=15.0 2023-11-27 21:03:35,737 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.30 vs. 
limit=15.0 2023-11-27 21:03:58,886 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3235226.6666666665, ans=0.125 2023-11-27 21:04:12,596 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 485300 2023-11-27 21:04:15,822 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.60 vs. limit=15.0 2023-11-27 21:04:18,260 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.43 vs. limit=12.0 2023-11-27 21:04:19,752 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 4350, loss[loss=0.06429, simple_loss=0.08506, pruned_loss=0.01306, audio_tagging_loss=0.008703, over 15289.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.09064, pruned_loss=0.0127, audio_tagging_loss=0.008591, over 3037728.97 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 8.0 2023-11-27 21:04:24,368 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3235360.0, ans=0.0 2023-11-27 21:04:31,221 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.50 vs. limit=12.0 2023-11-27 21:05:06,665 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.319e+01 9.007e+01 9.649e+01 1.037e+02 1.293e+02, threshold=1.930e+02, percent-clipped=0.0 2023-11-27 21:05:10,124 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 485350 2023-11-27 21:05:16,690 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 4400, loss[loss=0.06179, simple_loss=0.0748, pruned_loss=0.01258, audio_tagging_loss=0.01182, over 14286.00 frames. ], tot_loss[loss=0.06688, simple_loss=0.09081, pruned_loss=0.01293, audio_tagging_loss=0.008546, over 3043032.84 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:05:34,876 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3235760.0, ans=0.0 2023-11-27 21:05:35,051 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3235760.0, ans=0.1 2023-11-27 21:05:47,769 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3235826.6666666665, ans=0.1 2023-11-27 21:05:47,834 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3235826.6666666665, ans=0.0 2023-11-27 21:06:03,922 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3235960.0, ans=0.1 2023-11-27 21:06:05,523 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.48 vs. limit=15.0 2023-11-27 21:06:05,969 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 485400 2023-11-27 21:06:13,258 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 4450, loss[loss=0.06122, simple_loss=0.08077, pruned_loss=0.01143, audio_tagging_loss=0.009404, over 15050.00 frames. ], tot_loss[loss=0.06692, simple_loss=0.09092, pruned_loss=0.0129, audio_tagging_loss=0.008561, over 3050657.22 frames. 
], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:06:23,300 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3236093.3333333335, ans=0.0 2023-11-27 21:07:00,276 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.283e+01 8.705e+01 9.463e+01 1.011e+02 1.177e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-27 21:07:00,481 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3236293.3333333335, ans=0.0 2023-11-27 21:07:03,711 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 485450 2023-11-27 21:07:07,088 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3236293.3333333335, ans=0.125 2023-11-27 21:07:10,110 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3236360.0, ans=0.0 2023-11-27 21:07:11,588 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 4500, loss[loss=0.05538, simple_loss=0.06991, pruned_loss=0.008183, audio_tagging_loss=0.01224, over 15131.00 frames. ], tot_loss[loss=0.06702, simple_loss=0.09146, pruned_loss=0.01285, audio_tagging_loss=0.008442, over 3055959.48 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:07:12,050 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.55 vs. limit=15.0 2023-11-27 21:07:39,802 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3236493.3333333335, ans=0.125 2023-11-27 21:07:47,285 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3236560.0, ans=0.0 2023-11-27 21:07:49,536 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3236560.0, ans=0.125 2023-11-27 21:07:56,352 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3236626.6666666665, ans=0.0 2023-11-27 21:07:59,708 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3236626.6666666665, ans=0.125 2023-11-27 21:08:01,670 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 485500 2023-11-27 21:08:05,305 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3236626.6666666665, ans=0.1 2023-11-27 21:08:08,299 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 4550, loss[loss=0.05396, simple_loss=0.07006, pruned_loss=0.008621, audio_tagging_loss=0.01032, over 16546.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.09007, pruned_loss=0.01274, audio_tagging_loss=0.008574, over 3057231.72 frames. ], batch size: 64, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:08:10,723 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3236693.3333333335, ans=0.125 2023-11-27 21:08:14,499 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.24 vs. 
limit=22.5 2023-11-27 21:08:16,452 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=3236693.3333333335, ans=15.0 2023-11-27 21:08:18,867 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3236760.0, ans=0.0 2023-11-27 21:08:18,886 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3236760.0, ans=0.1 2023-11-27 21:08:20,977 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3236760.0, ans=0.1 2023-11-27 21:08:27,695 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3236760.0, ans=0.125 2023-11-27 21:08:32,069 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3236826.6666666665, ans=0.1 2023-11-27 21:08:38,220 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3236826.6666666665, ans=0.1 2023-11-27 21:08:39,121 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3236826.6666666665, ans=0.0 2023-11-27 21:08:41,897 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3236893.3333333335, ans=0.125 2023-11-27 21:08:52,680 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 21:08:54,021 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3236960.0, ans=0.125 2023-11-27 21:08:54,915 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.586e+01 8.582e+01 9.237e+01 9.730e+01 1.372e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-27 21:08:58,352 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 485550 2023-11-27 21:09:02,281 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3236960.0, ans=0.125 2023-11-27 21:09:05,343 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 4600, loss[loss=0.07332, simple_loss=0.09952, pruned_loss=0.0149, audio_tagging_loss=0.008661, over 16198.00 frames. ], tot_loss[loss=0.06667, simple_loss=0.09041, pruned_loss=0.01287, audio_tagging_loss=0.008593, over 3051300.48 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:09:27,678 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3237160.0, ans=0.125 2023-11-27 21:09:52,281 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.40 vs. 
limit=22.5 2023-11-27 21:09:55,274 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 485600 2023-11-27 21:10:02,139 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 4650, loss[loss=0.05079, simple_loss=0.05951, pruned_loss=0.01159, audio_tagging_loss=0.009441, over 15388.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.08973, pruned_loss=0.01258, audio_tagging_loss=0.008775, over 3050177.14 frames. ], batch size: 60, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:10:07,704 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.30 vs. limit=15.0 2023-11-27 21:10:26,653 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3237493.3333333335, ans=0.0 2023-11-27 21:10:41,083 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3237560.0, ans=0.125 2023-11-27 21:10:44,590 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3237560.0, ans=0.125 2023-11-27 21:10:49,298 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.300e+01 8.818e+01 9.409e+01 9.994e+01 1.817e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-27 21:10:52,672 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 485650 2023-11-27 21:10:52,779 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3237626.6666666665, ans=0.2 2023-11-27 21:10:59,623 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 4700, loss[loss=0.05277, simple_loss=0.07338, pruned_loss=0.005834, audio_tagging_loss=0.01025, over 14085.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.08961, pruned_loss=0.01256, audio_tagging_loss=0.008903, over 3040347.15 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:11:02,176 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3237693.3333333335, ans=0.125 2023-11-27 21:11:10,819 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3237760.0, ans=0.125 2023-11-27 21:11:12,463 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3237760.0, ans=0.0 2023-11-27 21:11:14,584 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff3.min_abs, batch_count=3237760.0, ans=0.2 2023-11-27 21:11:14,926 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.16 vs. limit=15.0 2023-11-27 21:11:26,841 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3237826.6666666665, ans=0.125 2023-11-27 21:11:40,578 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.74 vs. limit=22.5 2023-11-27 21:11:49,949 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 485700 2023-11-27 21:11:56,976 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 4750, loss[loss=0.05573, simple_loss=0.07899, pruned_loss=0.005691, audio_tagging_loss=0.01054, over 15181.00 frames. 
], tot_loss[loss=0.06606, simple_loss=0.08958, pruned_loss=0.01235, audio_tagging_loss=0.008919, over 3044830.75 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:12:07,144 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3238093.3333333335, ans=0.125 2023-11-27 21:12:23,958 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=3238160.0, ans=0.95 2023-11-27 21:12:43,625 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.636e+01 8.911e+01 9.575e+01 1.033e+02 1.448e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-27 21:12:46,975 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 485750 2023-11-27 21:12:53,378 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 4800, loss[loss=0.07329, simple_loss=0.09888, pruned_loss=0.01488, audio_tagging_loss=0.008969, over 14907.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08882, pruned_loss=0.01225, audio_tagging_loss=0.009068, over 3049454.47 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 21:13:44,075 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 485800 2023-11-27 21:13:44,346 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3238626.6666666665, ans=0.0 2023-11-27 21:13:45,633 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3238626.6666666665, ans=0.2 2023-11-27 21:13:50,918 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 4850, loss[loss=0.07403, simple_loss=0.1022, pruned_loss=0.01236, audio_tagging_loss=0.01057, over 14510.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.08963, pruned_loss=0.01249, audio_tagging_loss=0.00918, over 3054954.74 frames. ], batch size: 54, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:13:51,308 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3238693.3333333335, ans=0.125 2023-11-27 21:14:04,852 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3238760.0, ans=0.0 2023-11-27 21:14:24,188 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3238826.6666666665, ans=0.125 2023-11-27 21:14:33,588 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.72 vs. limit=15.0 2023-11-27 21:14:39,773 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.239e+01 8.891e+01 9.390e+01 1.010e+02 1.385e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-27 21:14:43,029 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 485850 2023-11-27 21:14:49,118 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3239026.6666666665, ans=0.07 2023-11-27 21:14:49,871 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 4900, loss[loss=0.0531, simple_loss=0.07256, pruned_loss=0.007601, audio_tagging_loss=0.009219, over 16517.00 frames. ], tot_loss[loss=0.06714, simple_loss=0.09087, pruned_loss=0.01262, audio_tagging_loss=0.009086, over 3057920.58 frames. 
], batch size: 64, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:15:08,694 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3239093.3333333335, ans=0.95 2023-11-27 21:15:11,750 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3239093.3333333335, ans=0.0 2023-11-27 21:15:17,594 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 21:15:23,610 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.01 vs. limit=15.0 2023-11-27 21:15:26,963 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3239226.6666666665, ans=0.0 2023-11-27 21:15:47,520 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 485900 2023-11-27 21:15:52,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3239293.3333333335, ans=0.125 2023-11-27 21:15:57,776 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 4950, loss[loss=0.06381, simple_loss=0.0883, pruned_loss=0.01127, audio_tagging_loss=0.008387, over 15111.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.09037, pruned_loss=0.01243, audio_tagging_loss=0.008923, over 3056272.53 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:16:01,113 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.84 vs. limit=22.5 2023-11-27 21:16:59,507 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3239560.0, ans=0.125 2023-11-27 21:17:02,485 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3239560.0, ans=0.2 2023-11-27 21:17:02,944 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.72 vs. limit=15.0 2023-11-27 21:17:30,942 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.417e+01 8.683e+01 9.334e+01 9.956e+01 1.191e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-27 21:17:36,607 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 485950 2023-11-27 21:17:48,937 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 5000, loss[loss=0.04776, simple_loss=0.06755, pruned_loss=0.007132, audio_tagging_loss=0.006852, over 15029.00 frames. ], tot_loss[loss=0.06649, simple_loss=0.09038, pruned_loss=0.01245, audio_tagging_loss=0.008848, over 3056298.68 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:18:10,505 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.20 vs. limit=5.0 2023-11-27 21:18:40,894 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.86 vs. limit=22.5 2023-11-27 21:19:11,133 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 486000 2023-11-27 21:19:22,899 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 5050, loss[loss=0.04261, simple_loss=0.05156, pruned_loss=0.007325, audio_tagging_loss=0.009503, over 14628.00 frames. 
], tot_loss[loss=0.06608, simple_loss=0.08995, pruned_loss=0.01236, audio_tagging_loss=0.008748, over 3049863.31 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:19:26,032 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3240026.6666666665, ans=0.125 2023-11-27 21:19:29,427 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3240026.6666666665, ans=0.0 2023-11-27 21:19:34,828 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3240026.6666666665, ans=0.125 2023-11-27 21:19:37,003 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3240026.6666666665, ans=0.0 2023-11-27 21:19:41,331 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.36 vs. limit=15.0 2023-11-27 21:19:44,325 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3240093.3333333335, ans=0.1 2023-11-27 21:19:59,436 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3240093.3333333335, ans=0.125 2023-11-27 21:21:15,254 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3240226.6666666665, ans=0.125 2023-11-27 21:21:39,013 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3240226.6666666665, ans=0.0 2023-11-27 21:22:11,321 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.444e+01 8.669e+01 9.381e+01 9.908e+01 1.305e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-27 21:22:23,718 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 486050 2023-11-27 21:22:36,397 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3240293.3333333335, ans=0.0 2023-11-27 21:22:53,585 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 5100, loss[loss=0.05543, simple_loss=0.0638, pruned_loss=0.009227, audio_tagging_loss=0.0143, over 16221.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.08945, pruned_loss=0.01237, audio_tagging_loss=0.008764, over 3046968.35 frames. ], batch size: 61, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:23:48,451 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3240426.6666666665, ans=0.0 2023-11-27 21:25:46,190 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3240626.6666666665, ans=0.125 2023-11-27 21:26:15,846 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 486100 2023-11-27 21:26:47,646 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 5150, loss[loss=0.06328, simple_loss=0.08987, pruned_loss=0.01289, audio_tagging_loss=0.005455, over 14480.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.09017, pruned_loss=0.01246, audio_tagging_loss=0.008627, over 3047579.85 frames. 
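The scaling.py:213 ScheduledFloat records track per-module hyperparameters (skip rates, balancer probabilities, dropout) whose current values are functions of a global batch count. A minimal sketch of such a schedule with a hypothetical piecewise-linear rule and made-up breakpoints; the real class and its breakpoints live in icefall's scaling.py:

```python
# Hypothetical batch-count-driven schedule in the spirit of the
# ScheduledFloat records above: the value is linearly interpolated between
# (batch_count, value) breakpoints and clamped at both ends.
class ScheduledValue:
    def __init__(self, *points):
        # points: (batch_count, value) pairs, sorted by batch_count
        self.points = sorted(points)

    def value(self, batch_count: float) -> float:
        x0, y0 = self.points[0]
        if batch_count <= x0:
            return y0
        for x1, y1 in self.points[1:]:
            if batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)
            x0, y0 = x1, y1
        return y0  # past the last breakpoint

skip_rate = ScheduledValue((0.0, 0.5), (4000.0, 0.05), (16000.0, 0.0))
print(skip_rate.value(3_240_026.0))  # long past the final breakpoint -> 0.0
```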
], batch size: 54, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:28:48,788 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3240826.6666666665, ans=0.1 2023-11-27 21:29:41,976 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3240893.3333333335, ans=0.025 2023-11-27 21:29:59,211 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.929e+01 8.896e+01 9.394e+01 1.012e+02 1.340e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-27 21:30:05,526 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 486150 2023-11-27 21:30:28,860 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 5200, loss[loss=0.07099, simple_loss=0.08866, pruned_loss=0.01453, audio_tagging_loss=0.01213, over 14402.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.09034, pruned_loss=0.0126, audio_tagging_loss=0.008645, over 3045648.92 frames. ], batch size: 54, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 21:31:10,587 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3241093.3333333335, ans=0.1 2023-11-27 21:32:09,360 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3241160.0, ans=0.0 2023-11-27 21:32:20,979 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3241226.6666666665, ans=0.05 2023-11-27 21:33:02,100 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 21:33:20,800 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 486200 2023-11-27 21:33:45,364 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 5250, loss[loss=0.06153, simple_loss=0.08123, pruned_loss=0.01258, audio_tagging_loss=0.008335, over 15646.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.09, pruned_loss=0.01268, audio_tagging_loss=0.008669, over 3047929.02 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:34:54,896 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 21:36:01,796 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3241626.6666666665, ans=0.125 2023-11-27 21:36:03,908 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.42 vs. limit=15.0 2023-11-27 21:36:09,365 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.968e+01 8.648e+01 9.300e+01 9.886e+01 1.149e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-27 21:36:11,482 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 486250 2023-11-27 21:36:31,475 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 5300, loss[loss=0.07811, simple_loss=0.1103, pruned_loss=0.01246, audio_tagging_loss=0.01048, over 14279.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.09042, pruned_loss=0.01259, audio_tagging_loss=0.008679, over 3042173.74 frames. 
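The scaling.py:1118 WithLoss records report the summed auxiliary loss attached to attention-weight tensors; loss-sum=0.000e+00 means the penalty never activated over the logging interval. One way such a hook can work, sketched with a custom autograd function that leaves the forward value untouched and only injects the penalty's gradient in backward; this is an illustration of the pattern, not the actual scaling.py code:

```python
import torch

# Illustrative sketch (assumed mechanism): attach an auxiliary penalty to a
# tensor so the forward value is unchanged, but the penalty's gradient is
# added during backward. A loss-sum of 0.000e+00, as in the WithLoss records
# above, corresponds to the penalty never firing.
class WithAuxLoss(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, limit: float):
        ctx.save_for_backward(x)
        ctx.limit = limit
        return x

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        with torch.enable_grad():
            xd = x.detach().requires_grad_(True)
            # penalize only the part of |x| that exceeds the limit
            aux = (xd.abs() - ctx.limit).clamp(min=0.0).sum()
            (aux_grad,) = torch.autograd.grad(aux, xd)
        return grad_out + aux_grad, None

attn = torch.randn(4, 8, requires_grad=True)
WithAuxLoss.apply(attn, 5.0).sum().backward()  # aux term is usually zero here
```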
], batch size: 54, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:36:31,851 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3241693.3333333335, ans=0.125 2023-11-27 21:37:19,420 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3241760.0, ans=0.2 2023-11-27 21:37:29,640 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.74 vs. limit=22.5 2023-11-27 21:38:31,667 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3241960.0, ans=0.125 2023-11-27 21:38:40,851 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 486300 2023-11-27 21:38:49,967 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3241960.0, ans=0.125 2023-11-27 21:38:57,901 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 5350, loss[loss=0.05516, simple_loss=0.07818, pruned_loss=0.009032, audio_tagging_loss=0.007042, over 16022.00 frames. ], tot_loss[loss=0.06718, simple_loss=0.09121, pruned_loss=0.01292, audio_tagging_loss=0.008652, over 3039174.78 frames. ], batch size: 60, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:39:00,320 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3242026.6666666665, ans=0.1 2023-11-27 21:39:01,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3242026.6666666665, ans=0.0 2023-11-27 21:39:08,389 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3242026.6666666665, ans=0.0 2023-11-27 21:39:41,456 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3242093.3333333335, ans=0.125 2023-11-27 21:40:30,554 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3242226.6666666665, ans=0.125 2023-11-27 21:40:57,053 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.785e+01 8.875e+01 9.269e+01 1.018e+02 1.292e+02, threshold=1.854e+02, percent-clipped=0.0 2023-11-27 21:40:59,750 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 486350 2023-11-27 21:41:13,651 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 5400, loss[loss=0.06919, simple_loss=0.09379, pruned_loss=0.01331, audio_tagging_loss=0.008981, over 15343.00 frames. ], tot_loss[loss=0.06721, simple_loss=0.09116, pruned_loss=0.01286, audio_tagging_loss=0.008763, over 3039978.30 frames. 
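The scaling.py:1022 Whitening records compare a per-module statistic against a limit (15.0 or 22.5 in the records above), presumably intervening only when the metric exceeds it. A hedged sketch of one plausible metric of this kind, built from the feature covariance; the assumed formula is not necessarily icefall's exact one:

```python
import torch

# Assumed form of a whitening metric: with feature covariance C of
# dimension D, the statistic D * tr(C @ C) / tr(C)**2 equals 1.0 for
# perfectly "white" features (C proportional to I) and grows as variance
# concentrates in a few directions; the log compares such a metric against
# a limit like 15.0 or 22.5.
def whitening_metric(x: torch.Tensor) -> float:
    x = x.reshape(-1, x.shape[-1])             # (frames, channels)
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.t() @ x) / x.shape[0]             # (D, D) covariance
    d = cov.shape[0]
    return (d * (cov * cov).sum() / cov.trace() ** 2).item()

feats = torch.randn(1000, 512)                 # near-white -> metric near 1.0
print(whitening_metric(feats))
```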
], batch size: 58, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:41:50,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3242426.6666666665, ans=0.125 2023-11-27 21:41:52,749 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3242426.6666666665, ans=0.125 2023-11-27 21:42:38,418 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3242560.0, ans=0.04949747468305833 2023-11-27 21:42:47,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3242560.0, ans=0.125 2023-11-27 21:43:16,420 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 486400 2023-11-27 21:43:34,549 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 5450, loss[loss=0.06477, simple_loss=0.08638, pruned_loss=0.0116, audio_tagging_loss=0.009974, over 15021.00 frames. ], tot_loss[loss=0.06742, simple_loss=0.09184, pruned_loss=0.01284, audio_tagging_loss=0.008665, over 3045534.45 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:44:30,955 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.82 vs. limit=22.5 2023-11-27 21:44:48,445 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3242826.6666666665, ans=0.09899494936611666 2023-11-27 21:45:00,432 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3242893.3333333335, ans=0.035 2023-11-27 21:45:25,869 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.612e+01 8.703e+01 9.302e+01 9.943e+01 1.219e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-27 21:45:27,753 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 486450 2023-11-27 21:45:31,884 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3242960.0, ans=0.125 2023-11-27 21:45:40,839 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 5500, loss[loss=0.07323, simple_loss=0.1088, pruned_loss=0.01212, audio_tagging_loss=0.006732, over 16600.00 frames. ], tot_loss[loss=0.06721, simple_loss=0.09141, pruned_loss=0.01283, audio_tagging_loss=0.008676, over 3046165.86 frames. ], batch size: 62, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:46:49,967 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.30 vs. limit=15.0 2023-11-27 21:46:50,405 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.80 vs. limit=15.0 2023-11-27 21:47:03,113 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3243226.6666666665, ans=0.125 2023-11-27 21:47:23,958 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3243226.6666666665, ans=0.125 2023-11-27 21:47:32,967 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.86 vs. 
limit=22.5 2023-11-27 21:47:36,632 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 486500 2023-11-27 21:47:37,459 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.69 vs. limit=22.5 2023-11-27 21:47:53,144 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 5550, loss[loss=0.07862, simple_loss=0.1127, pruned_loss=0.01589, audio_tagging_loss=0.006367, over 15090.00 frames. ], tot_loss[loss=0.06779, simple_loss=0.09232, pruned_loss=0.01292, audio_tagging_loss=0.008711, over 3050262.26 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 8.0 2023-11-27 21:48:27,894 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.54 vs. limit=15.0 2023-11-27 21:48:34,350 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3243426.6666666665, ans=0.2 2023-11-27 21:48:39,512 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3243493.3333333335, ans=0.0 2023-11-27 21:48:39,605 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3243493.3333333335, ans=0.1 2023-11-27 21:49:08,629 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.22 vs. limit=10.0 2023-11-27 21:49:40,809 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.124e+01 8.844e+01 9.360e+01 9.886e+01 1.640e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-27 21:49:41,176 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 486550 2023-11-27 21:49:53,616 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 5600, loss[loss=0.06007, simple_loss=0.09079, pruned_loss=0.006285, audio_tagging_loss=0.00839, over 15372.00 frames. ], tot_loss[loss=0.06746, simple_loss=0.09164, pruned_loss=0.01278, audio_tagging_loss=0.008856, over 3047107.07 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:49:57,392 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3243693.3333333335, ans=10.0 2023-11-27 21:50:09,474 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3243693.3333333335, ans=0.125 2023-11-27 21:51:04,017 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.95 vs. limit=22.5 2023-11-27 21:51:16,495 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3243893.3333333335, ans=0.95 2023-11-27 21:51:25,988 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 21:51:32,822 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3243960.0, ans=0.0 2023-11-27 21:51:43,163 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 486600 2023-11-27 21:51:55,051 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3243960.0, ans=0.0 2023-11-27 21:51:58,893 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 5650, loss[loss=0.04745, simple_loss=0.06267, pruned_loss=0.006974, audio_tagging_loss=0.009138, over 13813.00 frames. ], tot_loss[loss=0.06805, simple_loss=0.0928, pruned_loss=0.01286, audio_tagging_loss=0.008786, over 3051828.62 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:52:29,398 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3244093.3333333335, ans=0.2 2023-11-27 21:52:52,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3244160.0, ans=0.125 2023-11-27 21:53:32,348 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.852e+01 8.720e+01 9.211e+01 9.882e+01 1.405e+02, threshold=1.842e+02, percent-clipped=0.0 2023-11-27 21:53:32,635 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 486650 2023-11-27 21:53:42,048 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 5700, loss[loss=0.06342, simple_loss=0.08629, pruned_loss=0.01196, audio_tagging_loss=0.008313, over 14475.00 frames. ], tot_loss[loss=0.06788, simple_loss=0.09242, pruned_loss=0.01287, audio_tagging_loss=0.0088, over 3052934.66 frames. ], batch size: 54, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:54:05,830 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3244426.6666666665, ans=0.0 2023-11-27 21:54:27,231 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3244493.3333333335, ans=0.0 2023-11-27 21:54:28,985 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3244493.3333333335, ans=0.125 2023-11-27 21:54:45,575 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3244560.0, ans=0.2 2023-11-27 21:54:54,085 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3244560.0, ans=0.0 2023-11-27 21:55:11,719 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.76 vs. limit=6.0 2023-11-27 21:55:16,946 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 486700 2023-11-27 21:55:29,018 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 5750, loss[loss=0.07587, simple_loss=0.1041, pruned_loss=0.01738, audio_tagging_loss=0.006443, over 15239.00 frames. ], tot_loss[loss=0.06712, simple_loss=0.09128, pruned_loss=0.01271, audio_tagging_loss=0.00877, over 3051001.43 frames. 
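The train_asr.py:1481 warning above drops a one-second AudioSet clip because its 100 input frames shrink to 23 encoder frames after 4x subsampling, fewer than the 24 BPE tokens of its placeholder transcript; transducer training needs at least one encoder frame per output token. A sketch of that admissibility check, where the 8-frame edge correction is a guess that reproduces 100 -> 23 and the exact value depends on the convolutional front end:

```python
def keep_for_transducer(num_input_frames: int, num_tokens: int,
                        subsampling_factor: int = 4, edge: int = 8) -> bool:
    # Rough frame count after subsampling; here 100 input frames -> 23.
    t = (num_input_frames - edge) // subsampling_factor
    # A cut is only usable if the encoder emits at least as many frames
    # as there are tokens to align.
    return t >= num_tokens

print(keep_for_transducer(100, 24))  # False -> the cut is excluded
```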
], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:55:35,295 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3244693.3333333335, ans=0.0 2023-11-27 21:55:37,363 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3244693.3333333335, ans=0.125 2023-11-27 21:56:01,604 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3244760.0, ans=0.0 2023-11-27 21:56:03,278 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3244760.0, ans=0.125 2023-11-27 21:56:49,026 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3244960.0, ans=0.0 2023-11-27 21:56:55,305 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.899e+01 8.667e+01 9.281e+01 1.002e+02 1.374e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-27 21:56:55,476 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 486750 2023-11-27 21:57:08,586 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 5800, loss[loss=0.06635, simple_loss=0.08644, pruned_loss=0.0153, audio_tagging_loss=0.007834, over 13623.00 frames. ], tot_loss[loss=0.06737, simple_loss=0.09202, pruned_loss=0.01275, audio_tagging_loss=0.0086, over 3054420.22 frames. ], batch size: 50, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:57:26,191 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3245026.6666666665, ans=0.125 2023-11-27 21:57:28,290 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3245093.3333333335, ans=0.1 2023-11-27 21:58:31,571 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 486800 2023-11-27 21:58:38,470 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.35 vs. limit=15.0 2023-11-27 21:58:42,543 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 5850, loss[loss=0.06648, simple_loss=0.08768, pruned_loss=0.0128, audio_tagging_loss=0.009841, over 14349.00 frames. ], tot_loss[loss=0.06677, simple_loss=0.09101, pruned_loss=0.01271, audio_tagging_loss=0.008552, over 3048771.47 frames. 
], batch size: 54, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:58:51,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3245360.0, ans=0.0 2023-11-27 21:59:01,145 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3245426.6666666665, ans=0.2 2023-11-27 21:59:04,401 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3245426.6666666665, ans=0.125 2023-11-27 21:59:40,707 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3245560.0, ans=0.125 2023-11-27 22:00:04,024 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.637e+01 8.902e+01 9.558e+01 1.050e+02 1.471e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-27 22:00:04,205 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 486850 2023-11-27 22:00:09,829 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3245626.6666666665, ans=0.2 2023-11-27 22:00:14,024 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 5900, loss[loss=0.07382, simple_loss=0.109, pruned_loss=0.0115, audio_tagging_loss=0.007833, over 15563.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.09061, pruned_loss=0.01259, audio_tagging_loss=0.00852, over 3045190.27 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:00:22,929 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3245693.3333333335, ans=0.2 2023-11-27 22:00:25,243 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3245693.3333333335, ans=0.0 2023-11-27 22:00:27,170 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.26 vs. limit=15.0 2023-11-27 22:00:37,518 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3245760.0, ans=0.125 2023-11-27 22:01:03,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3245893.3333333335, ans=0.2 2023-11-27 22:01:20,366 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3245960.0, ans=0.125 2023-11-27 22:01:27,287 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 486900 2023-11-27 22:01:36,052 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 5950, loss[loss=0.07636, simple_loss=0.1068, pruned_loss=0.01604, audio_tagging_loss=0.006914, over 15823.00 frames. ], tot_loss[loss=0.06679, simple_loss=0.09083, pruned_loss=0.01278, audio_tagging_loss=0.008594, over 3046527.40 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:01:37,976 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.21 vs. 
limit=10.0 2023-11-27 22:02:03,595 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3246093.3333333335, ans=0.2 2023-11-27 22:02:11,670 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.23 vs. limit=22.5 2023-11-27 22:02:22,688 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3246226.6666666665, ans=0.125 2023-11-27 22:02:43,391 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.049e+01 8.680e+01 9.187e+01 9.808e+01 1.354e+02, threshold=1.837e+02, percent-clipped=0.0 2023-11-27 22:02:43,748 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 486950 2023-11-27 22:02:45,456 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3246293.3333333335, ans=0.07 2023-11-27 22:02:53,439 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 6000, loss[loss=0.06614, simple_loss=0.08889, pruned_loss=0.01122, audio_tagging_loss=0.01047, over 15746.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.09052, pruned_loss=0.01269, audio_tagging_loss=0.008499, over 3046929.71 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 22:02:53,441 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-27 22:03:24,601 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.4839, 3.3779, 3.6014, 3.5786], device='cuda:0') 2023-11-27 22:03:31,646 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.3469, 5.0377, 4.6697, 5.1954], device='cuda:0') 2023-11-27 22:03:35,236 INFO [train_asr.py:1267] (0/4) Epoch 41, validation: loss=0.05724, simple_loss=0.05055, pruned_loss=0.005142, audio_tagging_loss=0.02682, over 4681554.00 frames. 2023-11-27 22:03:35,237 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-27 22:04:09,286 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3246493.3333333335, ans=0.125 2023-11-27 22:04:10,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3246493.3333333335, ans=0.2 2023-11-27 22:04:13,922 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3246493.3333333335, ans=0.125 2023-11-27 22:04:17,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3246493.3333333335, ans=0.125 2023-11-27 22:04:18,815 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3246560.0, ans=0.0 2023-11-27 22:04:30,198 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 22:04:38,911 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 487000 2023-11-27 22:04:41,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3246626.6666666665, ans=0.0 2023-11-27 22:04:47,175 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 6050, loss[loss=0.05913, simple_loss=0.06692, pruned_loss=0.01192, audio_tagging_loss=0.01375, over 13862.00 frames. ], tot_loss[loss=0.06646, simple_loss=0.09042, pruned_loss=0.01264, audio_tagging_loss=0.008608, over 3041828.78 frames. ], batch size: 52, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:04:52,463 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3246693.3333333335, ans=0.1 2023-11-27 22:04:54,224 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3246693.3333333335, ans=0.125 2023-11-27 22:05:01,691 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3246760.0, ans=0.2 2023-11-27 22:05:05,387 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3246760.0, ans=0.125 2023-11-27 22:05:11,488 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.96 vs. limit=15.0 2023-11-27 22:05:21,923 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3246826.6666666665, ans=0.0 2023-11-27 22:05:36,879 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3246893.3333333335, ans=0.0 2023-11-27 22:05:38,199 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3246893.3333333335, ans=0.125 2023-11-27 22:05:38,313 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3246893.3333333335, ans=0.0 2023-11-27 22:05:47,407 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 487050 2023-11-27 22:05:47,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3246960.0, ans=0.125 2023-11-27 22:05:48,477 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.460e+01 8.706e+01 9.274e+01 9.905e+01 1.388e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-27 22:05:48,889 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3246960.0, ans=0.125 2023-11-27 22:05:51,380 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 22:05:56,283 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 6100, loss[loss=0.04982, simple_loss=0.06495, pruned_loss=0.007599, audio_tagging_loss=0.009745, over 15396.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.09061, pruned_loss=0.01259, audio_tagging_loss=0.008678, over 3042371.63 frames. 
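The validation block a few records up (train_asr.py:1258-1268) runs the dev set without gradient updates, reports frame-normalized losses, and prints the peak GPU memory allocated so far. A generic sketch of such a pass, with an assumed model(batch) -> (loss, num_frames) interface; the real loop lives in train_asr.py:

```python
import torch

# Illustrative validation pass: accumulate losses over the whole dev set,
# normalize by the number of frames, and report peak GPU memory, as in the
# "Computing validation loss" / "Maximum memory allocated" records above.
def compute_validation_loss(model, valid_dl):
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    with torch.no_grad():
        for batch in valid_dl:
            loss, num_frames = model(batch)    # assumed interface
            tot_loss += loss.item() * num_frames
            tot_frames += num_frames
    model.train()
    max_mem_mb = torch.cuda.max_memory_allocated() // (1024 * 1024)
    return tot_loss / tot_frames, max_mem_mb
```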
], batch size: 59, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:06:12,927 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3247093.3333333335, ans=0.1 2023-11-27 22:06:39,975 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.36 vs. limit=15.0 2023-11-27 22:06:46,517 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3247226.6666666665, ans=0.0 2023-11-27 22:06:49,095 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 22:06:50,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3247293.3333333335, ans=0.0 2023-11-27 22:06:56,266 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 487100 2023-11-27 22:07:00,240 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3247293.3333333335, ans=0.0 2023-11-27 22:07:04,141 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 6150, loss[loss=0.06225, simple_loss=0.08684, pruned_loss=0.01129, audio_tagging_loss=0.007542, over 15491.00 frames. ], tot_loss[loss=0.0669, simple_loss=0.09119, pruned_loss=0.01262, audio_tagging_loss=0.008683, over 3043045.25 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:07:24,918 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3247426.6666666665, ans=0.125 2023-11-27 22:07:30,158 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.45 vs. limit=22.5 2023-11-27 22:07:31,457 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 22:07:32,668 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3247493.3333333335, ans=0.125 2023-11-27 22:07:49,755 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3247560.0, ans=0.125 2023-11-27 22:07:51,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3247560.0, ans=0.2 2023-11-27 22:08:04,397 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 487150 2023-11-27 22:08:05,503 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.463e+01 8.962e+01 9.637e+01 1.023e+02 1.658e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-27 22:08:11,078 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3247693.3333333335, ans=0.0 2023-11-27 22:08:11,802 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 6200, loss[loss=0.08149, simple_loss=0.1057, pruned_loss=0.02035, audio_tagging_loss=0.008301, over 15381.00 frames. ], tot_loss[loss=0.06649, simple_loss=0.09061, pruned_loss=0.01244, audio_tagging_loss=0.008751, over 3041883.48 frames. 
], batch size: 58, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:08:56,559 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3247893.3333333335, ans=0.05 2023-11-27 22:08:59,008 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3247893.3333333335, ans=0.0 2023-11-27 22:09:09,965 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 487200 2023-11-27 22:09:17,731 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 6250, loss[loss=0.07435, simple_loss=0.1047, pruned_loss=0.01578, audio_tagging_loss=0.006218, over 14249.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.08949, pruned_loss=0.01221, audio_tagging_loss=0.008764, over 3035774.00 frames. ], batch size: 54, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:09:44,087 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3248160.0, ans=0.0 2023-11-27 22:09:47,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3248160.0, ans=0.125 2023-11-27 22:09:47,913 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3248160.0, ans=0.025 2023-11-27 22:09:48,357 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.05 vs. limit=15.0 2023-11-27 22:09:57,266 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3248226.6666666665, ans=0.0 2023-11-27 22:10:02,096 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3248226.6666666665, ans=0.07 2023-11-27 22:10:10,966 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3248293.3333333335, ans=0.125 2023-11-27 22:10:15,333 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 487250 2023-11-27 22:10:17,265 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.157e+01 8.680e+01 9.045e+01 9.912e+01 1.334e+02, threshold=1.809e+02, percent-clipped=0.0 2023-11-27 22:10:21,297 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3248293.3333333335, ans=0.0 2023-11-27 22:10:23,268 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 6300, loss[loss=0.04774, simple_loss=0.06515, pruned_loss=0.004583, audio_tagging_loss=0.01058, over 15404.00 frames. ], tot_loss[loss=0.066, simple_loss=0.08983, pruned_loss=0.01225, audio_tagging_loss=0.008833, over 3030857.20 frames. ], batch size: 63, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:10:29,802 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.40 vs. limit=15.0 2023-11-27 22:10:30,899 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.87 vs. 
limit=15.0 2023-11-27 22:10:32,945 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3248360.0, ans=0.1 2023-11-27 22:11:18,295 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3248626.6666666665, ans=0.1 2023-11-27 22:11:20,390 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 487300 2023-11-27 22:11:22,956 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3248626.6666666665, ans=0.0 2023-11-27 22:11:27,389 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 6350, loss[loss=0.06178, simple_loss=0.07791, pruned_loss=0.01152, audio_tagging_loss=0.0113, over 16337.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.09023, pruned_loss=0.01238, audio_tagging_loss=0.008892, over 3037287.54 frames. ], batch size: 60, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:11:31,512 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.74 vs. limit=15.0 2023-11-27 22:11:32,793 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.42 vs. limit=10.0 2023-11-27 22:11:54,478 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.54 vs. limit=15.0 2023-11-27 22:12:00,030 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3248826.6666666665, ans=0.1 2023-11-27 22:12:03,601 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3248826.6666666665, ans=0.125 2023-11-27 22:12:11,781 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3248893.3333333335, ans=0.125 2023-11-27 22:12:14,051 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3248893.3333333335, ans=0.125 2023-11-27 22:12:16,925 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.29 vs. limit=10.0 2023-11-27 22:12:23,567 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 487350 2023-11-27 22:12:24,704 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.467e+01 8.655e+01 9.162e+01 9.797e+01 1.327e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-27 22:12:28,711 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3248960.0, ans=0.125 2023-11-27 22:12:31,044 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 6400, loss[loss=0.0745, simple_loss=0.09691, pruned_loss=0.01843, audio_tagging_loss=0.00762, over 15420.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.0899, pruned_loss=0.01254, audio_tagging_loss=0.008965, over 3031058.75 frames. 
], batch size: 58, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 22:12:40,252 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3249026.6666666665, ans=0.125 2023-11-27 22:12:41,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3249026.6666666665, ans=0.125 2023-11-27 22:12:58,949 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 22:13:15,384 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.80 vs. limit=22.5 2023-11-27 22:13:18,766 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3249226.6666666665, ans=0.125 2023-11-27 22:13:28,974 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 487400 2023-11-27 22:13:36,649 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 6450, loss[loss=0.05731, simple_loss=0.07648, pruned_loss=0.009217, audio_tagging_loss=0.009857, over 15134.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.09044, pruned_loss=0.01248, audio_tagging_loss=0.008966, over 3045423.21 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 22:13:42,470 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3249360.0, ans=0.125 2023-11-27 22:14:34,876 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 487450 2023-11-27 22:14:34,995 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3249626.6666666665, ans=0.125 2023-11-27 22:14:37,160 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.482e+01 8.688e+01 9.330e+01 9.887e+01 1.158e+02, threshold=1.866e+02, percent-clipped=0.0 2023-11-27 22:14:42,180 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 6500, loss[loss=0.06532, simple_loss=0.0883, pruned_loss=0.01107, audio_tagging_loss=0.0101, over 14583.00 frames. ], tot_loss[loss=0.06678, simple_loss=0.09065, pruned_loss=0.01244, audio_tagging_loss=0.00902, over 3045524.19 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:15:29,963 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3249893.3333333335, ans=10.0 2023-11-27 22:15:38,354 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 487500 2023-11-27 22:15:45,981 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 6550, loss[loss=0.06664, simple_loss=0.09224, pruned_loss=0.01375, audio_tagging_loss=0.006776, over 15239.00 frames. ], tot_loss[loss=0.06723, simple_loss=0.09137, pruned_loss=0.01265, audio_tagging_loss=0.008888, over 3051161.07 frames. 
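The grad_scale field in these records moves among 8.0, 16.0 and 32.0, which is the signature of dynamic fp16 loss scaling: the scale is halved when gradients overflow and doubled again after a long overflow-free run. Standard PyTorch usage of this mechanism is shown below; icefall wires it up similarly, though not necessarily with these exact constructor arguments:

```python
import torch
from torch.cuda.amp import GradScaler, autocast

# Dynamic loss scaling: grows the scale after growth_interval clean steps,
# backs off by backoff_factor on overflow, matching the 8 <-> 16 <-> 32
# movement of grad_scale in the log.
scaler = GradScaler(init_scale=16.0, growth_factor=2.0,
                    backoff_factor=0.5, growth_interval=2000)

def training_step(model, optimizer, batch):
    optimizer.zero_grad()
    with autocast():
        loss = model(batch)
    scaler.scale(loss).backward()
    scaler.step(optimizer)   # skipped internally if gradients overflowed
    scaler.update()          # grows or backs off the scale
    return scaler.get_scale()
```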
], batch size: 58, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:16:30,005 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3250226.6666666665, ans=0.04949747468305833 2023-11-27 22:16:39,655 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3250293.3333333335, ans=0.0 2023-11-27 22:16:43,003 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 487550 2023-11-27 22:16:45,721 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.642e+01 8.718e+01 9.335e+01 9.836e+01 1.577e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-27 22:16:47,248 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 22:16:50,680 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.73 vs. limit=15.0 2023-11-27 22:16:51,216 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 6600, loss[loss=0.04507, simple_loss=0.05475, pruned_loss=0.005389, audio_tagging_loss=0.01231, over 15945.00 frames. ], tot_loss[loss=0.06708, simple_loss=0.09133, pruned_loss=0.01264, audio_tagging_loss=0.008769, over 3046059.64 frames. ], batch size: 63, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:17:21,798 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3250493.3333333335, ans=0.95 2023-11-27 22:17:24,202 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3250493.3333333335, ans=0.0 2023-11-27 22:17:42,148 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3250626.6666666665, ans=0.0 2023-11-27 22:17:43,426 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3250626.6666666665, ans=0.1 2023-11-27 22:17:44,495 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3250626.6666666665, ans=0.125 2023-11-27 22:17:44,534 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3250626.6666666665, ans=0.0 2023-11-27 22:17:47,871 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 487600 2023-11-27 22:17:56,528 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 6650, loss[loss=0.0661, simple_loss=0.08672, pruned_loss=0.01526, audio_tagging_loss=0.007472, over 15328.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.08972, pruned_loss=0.01251, audio_tagging_loss=0.008745, over 3045180.39 frames. 
], batch size: 59, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:18:05,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3250693.3333333335, ans=0.125 2023-11-27 22:18:19,806 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3250826.6666666665, ans=0.0 2023-11-27 22:18:38,062 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3250893.3333333335, ans=0.125 2023-11-27 22:18:50,685 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3250960.0, ans=0.0 2023-11-27 22:18:53,012 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 487650 2023-11-27 22:18:55,277 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.323e+01 8.676e+01 9.213e+01 9.807e+01 1.195e+02, threshold=1.843e+02, percent-clipped=0.0 2023-11-27 22:18:56,761 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3250960.0, ans=0.0 2023-11-27 22:18:56,783 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 22:19:00,119 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 6700, loss[loss=0.07605, simple_loss=0.101, pruned_loss=0.01573, audio_tagging_loss=0.009807, over 15007.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.09056, pruned_loss=0.01244, audio_tagging_loss=0.008693, over 3044438.47 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:19:07,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3251026.6666666665, ans=0.125 2023-11-27 22:19:13,357 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.68 vs. limit=12.0 2023-11-27 22:19:30,463 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3251160.0, ans=0.5 2023-11-27 22:19:40,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3251226.6666666665, ans=0.2 2023-11-27 22:19:52,950 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3251293.3333333335, ans=0.05 2023-11-27 22:19:56,265 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 487700 2023-11-27 22:19:59,849 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3251293.3333333335, ans=0.125 2023-11-27 22:20:04,249 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 6750, loss[loss=0.06287, simple_loss=0.08568, pruned_loss=0.01077, audio_tagging_loss=0.009261, over 14002.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.09049, pruned_loss=0.01255, audio_tagging_loss=0.008721, over 3037380.78 frames. ], batch size: 53, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:20:05,912 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3251360.0, ans=0.0 2023-11-27 22:20:40,494 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.73 vs. 
limit=15.0 2023-11-27 22:20:59,968 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 487750 2023-11-27 22:21:02,150 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.521e+01 8.663e+01 9.320e+01 9.783e+01 1.430e+02, threshold=1.864e+02, percent-clipped=0.0 2023-11-27 22:21:02,886 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.67 vs. limit=6.0 2023-11-27 22:21:07,719 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 6800, loss[loss=0.06414, simple_loss=0.08965, pruned_loss=0.01101, audio_tagging_loss=0.008301, over 15344.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.0904, pruned_loss=0.01246, audio_tagging_loss=0.0087, over 3035845.50 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 22:21:13,121 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.56 vs. limit=12.0 2023-11-27 22:21:57,808 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3251960.0, ans=0.0 2023-11-27 22:22:01,417 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3251960.0, ans=0.125 2023-11-27 22:22:03,489 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 487800 2023-11-27 22:22:11,465 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 6850, loss[loss=0.07345, simple_loss=0.1035, pruned_loss=0.01433, audio_tagging_loss=0.007394, over 15559.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08942, pruned_loss=0.01225, audio_tagging_loss=0.008621, over 3037004.81 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 22:22:39,384 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3252160.0, ans=0.2 2023-11-27 22:22:58,545 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3252226.6666666665, ans=0.0 2023-11-27 22:23:06,686 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 487850 2023-11-27 22:23:10,108 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.119e+01 8.804e+01 9.259e+01 1.010e+02 1.279e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-27 22:23:10,410 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3252293.3333333335, ans=0.0 2023-11-27 22:23:11,800 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.03 vs. limit=12.0 2023-11-27 22:23:14,175 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 6900, loss[loss=0.07141, simple_loss=0.09302, pruned_loss=0.01545, audio_tagging_loss=0.00945, over 15488.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.08963, pruned_loss=0.01231, audio_tagging_loss=0.00863, over 3043355.69 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:23:25,770 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.02 vs. 
limit=6.0 2023-11-27 22:23:31,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3252426.6666666665, ans=0.09899494936611666 2023-11-27 22:23:40,893 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3252493.3333333335, ans=0.0 2023-11-27 22:23:42,814 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3252493.3333333335, ans=0.125 2023-11-27 22:23:50,737 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3252560.0, ans=0.2 2023-11-27 22:23:54,194 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3252560.0, ans=0.0 2023-11-27 22:23:57,425 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3252560.0, ans=0.2 2023-11-27 22:24:00,347 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3252560.0, ans=0.2 2023-11-27 22:24:03,547 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 22:24:03,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3252626.6666666665, ans=0.125 2023-11-27 22:24:07,488 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 487900 2023-11-27 22:24:14,299 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 6950, loss[loss=0.06938, simple_loss=0.1024, pruned_loss=0.01038, audio_tagging_loss=0.007791, over 14883.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08984, pruned_loss=0.01239, audio_tagging_loss=0.008598, over 3044232.29 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:24:20,930 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3252693.3333333335, ans=0.125 2023-11-27 22:24:34,433 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.55 vs. limit=12.0 2023-11-27 22:24:57,658 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3252893.3333333335, ans=0.0 2023-11-27 22:24:58,994 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.93 vs. 
limit=15.0 2023-11-27 22:25:11,254 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 487950 2023-11-27 22:25:17,150 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.309e+01 8.695e+01 9.327e+01 1.020e+02 1.737e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-27 22:25:17,390 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3252960.0, ans=0.0 2023-11-27 22:25:22,606 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 7000, loss[loss=0.07933, simple_loss=0.1112, pruned_loss=0.01674, audio_tagging_loss=0.007001, over 16725.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.08983, pruned_loss=0.01243, audio_tagging_loss=0.008708, over 3046772.47 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:27:46,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3253226.6666666665, ans=0.0 2023-11-27 22:27:47,696 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3253226.6666666665, ans=0.125 2023-11-27 22:28:42,209 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 488000 2023-11-27 22:28:46,270 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-488000.pt 2023-11-27 22:29:17,672 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 7050, loss[loss=0.0829, simple_loss=0.1169, pruned_loss=0.01681, audio_tagging_loss=0.007628, over 16658.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08934, pruned_loss=0.01244, audio_tagging_loss=0.0088, over 3052753.97 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 8.0 2023-11-27 22:29:45,344 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3253360.0, ans=0.0 2023-11-27 22:30:39,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3253426.6666666665, ans=0.125 2023-11-27 22:31:10,119 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.94 vs. limit=22.5 2023-11-27 22:32:21,041 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3253626.6666666665, ans=0.0 2023-11-27 22:32:29,826 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3253626.6666666665, ans=0.125 2023-11-27 22:32:41,856 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 488050 2023-11-27 22:33:03,870 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.365e+01 8.458e+01 9.244e+01 1.037e+02 2.754e+02, threshold=1.849e+02, percent-clipped=1.0 2023-11-27 22:33:16,782 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 7100, loss[loss=0.08804, simple_loss=0.1274, pruned_loss=0.01604, audio_tagging_loss=0.008313, over 15528.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.09016, pruned_loss=0.01263, audio_tagging_loss=0.008852, over 3050341.12 frames. 
], batch size: 56, lr: 1.65e-03, grad_scale: 8.0 2023-11-27 22:34:25,937 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3253760.0, ans=0.2 2023-11-27 22:35:03,524 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3253826.6666666665, ans=0.025 2023-11-27 22:35:29,672 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3253826.6666666665, ans=0.1 2023-11-27 22:36:09,364 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3253893.3333333335, ans=0.125 2023-11-27 22:36:47,431 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 488100 2023-11-27 22:37:13,759 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 7150, loss[loss=0.09155, simple_loss=0.1193, pruned_loss=0.02307, audio_tagging_loss=0.008839, over 15044.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.08955, pruned_loss=0.01256, audio_tagging_loss=0.008843, over 3047182.44 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 8.0 2023-11-27 22:38:28,138 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3254093.3333333335, ans=0.05 2023-11-27 22:39:24,192 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3254226.6666666665, ans=0.1 2023-11-27 22:39:32,945 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3254226.6666666665, ans=0.125 2023-11-27 22:39:43,676 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3254226.6666666665, ans=0.125 2023-11-27 22:40:15,057 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3254293.3333333335, ans=0.125 2023-11-27 22:40:18,685 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3254293.3333333335, ans=0.125 2023-11-27 22:40:28,524 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 488150 2023-11-27 22:40:50,171 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.853e+01 8.864e+01 9.452e+01 1.007e+02 1.551e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-27 22:41:01,892 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 7200, loss[loss=0.06263, simple_loss=0.08427, pruned_loss=0.01254, audio_tagging_loss=0.007963, over 15468.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.08988, pruned_loss=0.01258, audio_tagging_loss=0.0089, over 3048307.94 frames. 
], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:41:47,920 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3254426.6666666665, ans=0.125 2023-11-27 22:42:42,894 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3254493.3333333335, ans=0.1 2023-11-27 22:43:13,358 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3254560.0, ans=0.0 2023-11-27 22:43:27,754 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3254626.6666666665, ans=0.0 2023-11-27 22:43:41,857 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.35 vs. limit=22.5 2023-11-27 22:43:47,025 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 488200 2023-11-27 22:44:01,351 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.54 vs. limit=12.0 2023-11-27 22:44:03,917 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3254626.6666666665, ans=0.1 2023-11-27 22:44:08,309 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3254693.3333333335, ans=0.125 2023-11-27 22:44:10,585 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 7250, loss[loss=0.08854, simple_loss=0.1245, pruned_loss=0.01917, audio_tagging_loss=0.007106, over 15041.00 frames. ], tot_loss[loss=0.06668, simple_loss=0.09025, pruned_loss=0.01262, audio_tagging_loss=0.008941, over 3048296.32 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:44:40,405 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3254760.0, ans=0.015 2023-11-27 22:44:56,993 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.70 vs. limit=15.0 2023-11-27 22:45:38,543 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3254826.6666666665, ans=0.0 2023-11-27 22:45:49,299 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.42 vs. 
limit=10.0 2023-11-27 22:46:29,943 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 488250 2023-11-27 22:46:33,988 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3254960.0, ans=0.125 2023-11-27 22:46:40,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3254960.0, ans=0.125 2023-11-27 22:46:41,983 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.327e+01 8.745e+01 9.249e+01 1.003e+02 1.162e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-27 22:46:44,847 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3255026.6666666665, ans=0.1 2023-11-27 22:46:46,605 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.17 vs. limit=15.0 2023-11-27 22:46:47,110 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 7300, loss[loss=0.06798, simple_loss=0.08296, pruned_loss=0.01682, audio_tagging_loss=0.009682, over 13654.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.09031, pruned_loss=0.01252, audio_tagging_loss=0.008833, over 3048923.85 frames. ], batch size: 52, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:46:52,700 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3255026.6666666665, ans=0.0 2023-11-27 22:47:00,905 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 22:47:17,396 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.46 vs. limit=6.0 2023-11-27 22:47:33,765 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.32 vs. limit=6.0 2023-11-27 22:48:37,702 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.16 vs. limit=22.5 2023-11-27 22:48:46,868 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.92 vs. limit=15.0 2023-11-27 22:49:01,143 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3255293.3333333335, ans=0.0 2023-11-27 22:49:05,324 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 488300 2023-11-27 22:49:24,621 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 7350, loss[loss=0.05406, simple_loss=0.07611, pruned_loss=0.008065, audio_tagging_loss=0.007943, over 15665.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.0907, pruned_loss=0.01256, audio_tagging_loss=0.008663, over 3050869.02 frames. 
], batch size: 58, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:49:25,256 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3255360.0, ans=0.0 2023-11-27 22:49:36,911 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3255360.0, ans=0.125 2023-11-27 22:49:40,750 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3255360.0, ans=0.0 2023-11-27 22:49:43,274 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.78 vs. limit=15.0 2023-11-27 22:49:49,108 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3255360.0, ans=0.125 2023-11-27 22:49:51,887 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3255360.0, ans=0.1 2023-11-27 22:51:43,338 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3255626.6666666665, ans=0.125 2023-11-27 22:52:03,491 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 488350 2023-11-27 22:52:15,285 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.244e+01 8.717e+01 9.286e+01 1.027e+02 1.219e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-27 22:52:20,924 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 7400, loss[loss=0.0501, simple_loss=0.07201, pruned_loss=0.006133, audio_tagging_loss=0.007964, over 14730.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.08943, pruned_loss=0.01224, audio_tagging_loss=0.008651, over 3046538.43 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:52:52,682 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3255760.0, ans=0.0 2023-11-27 22:53:54,169 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3255826.6666666665, ans=0.125 2023-11-27 22:54:09,026 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. limit=6.0 2023-11-27 22:54:28,653 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3255893.3333333335, ans=0.0 2023-11-27 22:55:00,485 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 488400 2023-11-27 22:55:21,743 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 7450, loss[loss=0.06287, simple_loss=0.09317, pruned_loss=0.009918, audio_tagging_loss=0.006369, over 16122.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.09021, pruned_loss=0.01237, audio_tagging_loss=0.0085, over 3041907.87 frames. ], batch size: 61, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:56:30,603 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.30 vs. limit=12.0 2023-11-27 22:57:18,207 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.99 vs. 
limit=15.0 2023-11-27 22:57:22,681 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3256226.6666666665, ans=0.025 2023-11-27 22:57:48,501 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 488450 2023-11-27 22:57:59,912 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.351e+01 8.653e+01 9.263e+01 9.964e+01 1.295e+02, threshold=1.853e+02, percent-clipped=0.0 2023-11-27 22:58:07,690 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 7500, loss[loss=0.06123, simple_loss=0.08221, pruned_loss=0.01135, audio_tagging_loss=0.008777, over 15027.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.0903, pruned_loss=0.01241, audio_tagging_loss=0.008489, over 3045915.58 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:58:22,866 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.26 vs. limit=10.0 2023-11-27 22:58:40,664 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.48 vs. limit=15.0 2023-11-27 23:00:10,662 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3256560.0, ans=0.09899494936611666 2023-11-27 23:00:23,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3256626.6666666665, ans=0.125 2023-11-27 23:00:28,769 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.96 vs. limit=15.0 2023-11-27 23:00:42,809 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 488500 2023-11-27 23:01:02,820 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 7550, loss[loss=0.07124, simple_loss=0.09548, pruned_loss=0.01425, audio_tagging_loss=0.009256, over 15552.00 frames. ], tot_loss[loss=0.06596, simple_loss=0.09026, pruned_loss=0.0124, audio_tagging_loss=0.008426, over 3047224.44 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 23:01:07,928 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3256693.3333333335, ans=0.125 2023-11-27 23:02:28,313 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3256826.6666666665, ans=0.125 2023-11-27 23:02:28,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3256826.6666666665, ans=10.0 2023-11-27 23:02:39,623 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.52 vs. 
limit=15.0 2023-11-27 23:03:18,016 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3256960.0, ans=0.0 2023-11-27 23:03:30,957 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 488550 2023-11-27 23:03:43,872 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.731e+01 8.710e+01 9.273e+01 1.023e+02 1.229e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-27 23:03:49,431 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 7600, loss[loss=0.08825, simple_loss=0.1128, pruned_loss=0.02332, audio_tagging_loss=0.008526, over 15193.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08968, pruned_loss=0.01237, audio_tagging_loss=0.008484, over 3049300.67 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 23:04:15,531 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3257026.6666666665, ans=0.125 2023-11-27 23:04:58,198 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3257160.0, ans=0.025 2023-11-27 23:05:56,471 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3257293.3333333335, ans=0.125 2023-11-27 23:06:07,468 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 488600 2023-11-27 23:06:18,396 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3257293.3333333335, ans=0.0 2023-11-27 23:06:26,641 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 7650, loss[loss=0.06245, simple_loss=0.08887, pruned_loss=0.007729, audio_tagging_loss=0.01029, over 16035.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.0898, pruned_loss=0.01235, audio_tagging_loss=0.008562, over 3044186.65 frames. ], batch size: 60, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 23:06:27,363 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3257360.0, ans=0.125 2023-11-27 23:06:47,277 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3257360.0, ans=0.2 2023-11-27 23:06:47,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3257360.0, ans=0.125 2023-11-27 23:07:24,825 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.23 vs. limit=22.5 2023-11-27 23:07:43,339 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3257493.3333333335, ans=0.125 2023-11-27 23:08:03,114 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.41 vs. limit=15.0 2023-11-27 23:08:37,267 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 488650 2023-11-27 23:08:50,845 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.558e+01 8.807e+01 9.447e+01 1.017e+02 1.729e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-27 23:08:53,530 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 7700, loss[loss=0.06731, simple_loss=0.1011, pruned_loss=0.009021, audio_tagging_loss=0.007745, over 15982.00 frames. 
], tot_loss[loss=0.0659, simple_loss=0.08993, pruned_loss=0.01239, audio_tagging_loss=0.008548, over 3045257.14 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 23:09:50,756 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3257826.6666666665, ans=0.1 2023-11-27 23:10:02,402 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 23:10:08,781 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.09 vs. limit=15.0 2023-11-27 23:10:20,363 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.64 vs. limit=15.0 2023-11-27 23:10:50,868 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3257960.0, ans=0.1 2023-11-27 23:10:55,084 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 488700 2023-11-27 23:11:17,042 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 7750, loss[loss=0.07425, simple_loss=0.0995, pruned_loss=0.01619, audio_tagging_loss=0.008303, over 15169.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08952, pruned_loss=0.01224, audio_tagging_loss=0.008677, over 3045128.05 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 23:11:35,819 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.95 vs. limit=22.5 2023-11-27 23:13:42,446 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 488750 2023-11-27 23:13:55,309 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.544e+01 8.828e+01 9.509e+01 1.004e+02 1.323e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-27 23:13:57,956 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 7800, loss[loss=0.05695, simple_loss=0.07864, pruned_loss=0.01005, audio_tagging_loss=0.007587, over 15928.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.08988, pruned_loss=0.01238, audio_tagging_loss=0.008674, over 3043969.77 frames. ], batch size: 62, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 23:14:28,366 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3258426.6666666665, ans=0.125 2023-11-27 23:14:32,069 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3258426.6666666665, ans=0.2 2023-11-27 23:15:16,479 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3258560.0, ans=0.0 2023-11-27 23:15:20,959 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.45 vs. limit=15.0 2023-11-27 23:15:50,795 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 488800 2023-11-27 23:16:08,545 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 7850, loss[loss=0.04355, simple_loss=0.06172, pruned_loss=0.00538, audio_tagging_loss=0.007309, over 14039.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.08943, pruned_loss=0.01227, audio_tagging_loss=0.008766, over 3045961.56 frames. 
], batch size: 54, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 23:16:23,599 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3258693.3333333335, ans=0.2 2023-11-27 23:16:34,603 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3258760.0, ans=0.125 2023-11-27 23:16:50,283 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3258760.0, ans=0.0 2023-11-27 23:16:57,185 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3258826.6666666665, ans=0.2 2023-11-27 23:17:14,927 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3258826.6666666665, ans=0.1 2023-11-27 23:17:23,875 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.98 vs. limit=15.0 2023-11-27 23:17:39,152 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3258893.3333333335, ans=0.2 2023-11-27 23:17:52,037 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 488850 2023-11-27 23:18:02,665 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.284e+01 8.704e+01 9.225e+01 1.001e+02 1.986e+02, threshold=1.845e+02, percent-clipped=1.0 2023-11-27 23:18:06,431 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 7900, loss[loss=0.07825, simple_loss=0.1149, pruned_loss=0.01311, audio_tagging_loss=0.00768, over 14544.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.09023, pruned_loss=0.01245, audio_tagging_loss=0.008871, over 3056965.76 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 23:19:02,521 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.52 vs. limit=15.0 2023-11-27 23:19:09,093 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3259160.0, ans=0.1 2023-11-27 23:19:30,626 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3259226.6666666665, ans=0.0 2023-11-27 23:19:48,568 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 488900 2023-11-27 23:19:55,798 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.81 vs. limit=12.0 2023-11-27 23:20:01,495 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 7950, loss[loss=0.07915, simple_loss=0.105, pruned_loss=0.01534, audio_tagging_loss=0.01132, over 15742.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.0896, pruned_loss=0.01235, audio_tagging_loss=0.00908, over 3050143.40 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 23:20:30,316 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 23:20:55,865 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3259493.3333333335, ans=0.95 2023-11-27 23:21:20,739 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3259626.6666666665, ans=0.0 2023-11-27 23:21:29,026 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 488950 2023-11-27 23:21:31,774 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.05 vs. limit=6.0 2023-11-27 23:21:37,677 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.047e+01 8.632e+01 9.434e+01 1.008e+02 1.251e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-27 23:21:39,771 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 8000, loss[loss=0.04703, simple_loss=0.05155, pruned_loss=0.009186, audio_tagging_loss=0.01207, over 17044.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.08822, pruned_loss=0.0121, audio_tagging_loss=0.009205, over 3043383.13 frames. ], batch size: 67, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 23:21:50,588 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.42 vs. limit=15.0 2023-11-27 23:22:41,583 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3259893.3333333335, ans=0.125 2023-11-27 23:23:00,975 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3259960.0, ans=0.2 2023-11-27 23:23:06,227 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 489000 2023-11-27 23:23:14,815 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3259960.0, ans=0.0 2023-11-27 23:23:17,795 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 8050, loss[loss=0.06703, simple_loss=0.09006, pruned_loss=0.01453, audio_tagging_loss=0.007462, over 14607.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.088, pruned_loss=0.01219, audio_tagging_loss=0.009186, over 3043567.79 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 23:23:47,069 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3260093.3333333335, ans=0.0 2023-11-27 23:24:22,536 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3260226.6666666665, ans=0.0 2023-11-27 23:24:34,344 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.19 vs. limit=15.0 2023-11-27 23:24:35,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3260293.3333333335, ans=0.1 2023-11-27 23:24:40,406 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 489050 2023-11-27 23:24:41,669 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.66 vs. 
limit=22.5 2023-11-27 23:24:48,195 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3260293.3333333335, ans=0.0 2023-11-27 23:24:49,240 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.416e+01 8.685e+01 9.405e+01 9.974e+01 1.162e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-27 23:24:50,890 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 8100, loss[loss=0.05475, simple_loss=0.07344, pruned_loss=0.009589, audio_tagging_loss=0.008445, over 14508.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08866, pruned_loss=0.01221, audio_tagging_loss=0.009039, over 3042891.05 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 23:25:35,574 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3260493.3333333335, ans=0.125 2023-11-27 23:25:43,647 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3260493.3333333335, ans=0.0 2023-11-27 23:26:13,622 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 489100 2023-11-27 23:26:24,339 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 8150, loss[loss=0.0753, simple_loss=0.09837, pruned_loss=0.01581, audio_tagging_loss=0.01031, over 14465.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.0899, pruned_loss=0.01239, audio_tagging_loss=0.008845, over 3044211.64 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 23:26:26,905 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3260693.3333333335, ans=0.2 2023-11-27 23:27:04,009 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3260826.6666666665, ans=0.125 2023-11-27 23:27:04,306 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.69 vs. limit=15.0 2023-11-27 23:27:43,024 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 489150 2023-11-27 23:27:50,597 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.384e+01 8.874e+01 9.379e+01 1.019e+02 1.298e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-27 23:27:52,132 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 8200, loss[loss=0.0645, simple_loss=0.09571, pruned_loss=0.009177, audio_tagging_loss=0.007467, over 15839.00 frames. ], tot_loss[loss=0.06646, simple_loss=0.09048, pruned_loss=0.01248, audio_tagging_loss=0.008744, over 3041413.51 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 23:27:56,815 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 23:28:12,194 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3261093.3333333335, ans=0.2 2023-11-27 23:28:31,282 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.48 vs. limit=22.5 2023-11-27 23:28:42,044 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3261226.6666666665, ans=0.1 2023-11-27 23:29:04,216 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 489200 2023-11-27 23:29:08,465 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0 2023-11-27 23:29:11,004 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3261293.3333333335, ans=0.0 2023-11-27 23:29:13,168 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 8250, loss[loss=0.0499, simple_loss=0.06711, pruned_loss=0.007653, audio_tagging_loss=0.00869, over 16304.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.08961, pruned_loss=0.01229, audio_tagging_loss=0.008731, over 3044796.35 frames. ], batch size: 62, lr: 1.64e-03, grad_scale: 32.0 2023-11-27 23:29:26,277 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3261426.6666666665, ans=0.1 2023-11-27 23:29:46,640 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3261493.3333333335, ans=0.0 2023-11-27 23:30:18,193 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 489250 2023-11-27 23:30:27,235 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.418e+01 8.905e+01 9.510e+01 1.029e+02 2.089e+02, threshold=1.902e+02, percent-clipped=1.0 2023-11-27 23:30:27,283 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 8300, loss[loss=0.08705, simple_loss=0.1317, pruned_loss=0.01547, audio_tagging_loss=0.005704, over 16021.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.09074, pruned_loss=0.01252, audio_tagging_loss=0.008691, over 3049808.02 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 16.0 2023-11-27 23:30:30,399 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3261693.3333333335, ans=0.125 2023-11-27 23:30:43,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3261760.0, ans=0.2 2023-11-27 23:30:53,429 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.47 vs. 
limit=12.0 2023-11-27 23:31:04,316 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3261826.6666666665, ans=0.125 2023-11-27 23:31:08,008 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3261893.3333333335, ans=0.09899494936611666 2023-11-27 23:31:09,243 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3261893.3333333335, ans=0.1 2023-11-27 23:31:22,384 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.95 vs. limit=15.0 2023-11-27 23:31:27,126 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 489300 2023-11-27 23:31:29,167 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.24 vs. limit=22.5 2023-11-27 23:31:35,077 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 8350, loss[loss=0.07139, simple_loss=0.09904, pruned_loss=0.01561, audio_tagging_loss=0.00627, over 14599.00 frames. ], tot_loss[loss=0.067, simple_loss=0.09138, pruned_loss=0.01274, audio_tagging_loss=0.00857, over 3050322.98 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 16.0 2023-11-27 23:32:33,316 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 489350 2023-11-27 23:32:46,838 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.332e+01 8.783e+01 9.379e+01 1.006e+02 1.235e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-27 23:32:46,924 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 8400, loss[loss=0.06085, simple_loss=0.07985, pruned_loss=0.009329, audio_tagging_loss=0.01159, over 15047.00 frames. ], tot_loss[loss=0.06718, simple_loss=0.09164, pruned_loss=0.01285, audio_tagging_loss=0.008513, over 3054528.40 frames. ], batch size: 59, lr: 1.64e-03, grad_scale: 32.0 2023-11-27 23:33:17,880 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3262426.6666666665, ans=0.0 2023-11-27 23:33:51,535 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.48 vs. limit=12.0 2023-11-27 23:34:06,428 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.08 vs. limit=6.0 2023-11-27 23:34:51,756 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3262493.3333333335, ans=0.0 2023-11-27 23:35:04,340 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3262493.3333333335, ans=0.0 2023-11-27 23:36:28,336 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 489400 2023-11-27 23:36:45,281 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3262626.6666666665, ans=0.125 2023-11-27 23:36:54,783 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3262693.3333333335, ans=0.0 2023-11-27 23:36:59,258 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 8450, loss[loss=0.05038, simple_loss=0.06571, pruned_loss=0.006813, audio_tagging_loss=0.01071, over 15082.00 frames. 
], tot_loss[loss=0.06719, simple_loss=0.0917, pruned_loss=0.01282, audio_tagging_loss=0.008523, over 3055605.95 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 32.0 2023-11-27 23:37:31,760 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 23:37:41,798 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3262760.0, ans=0.125 2023-11-27 23:37:46,889 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3262760.0, ans=0.125 2023-11-27 23:38:18,326 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3262760.0, ans=0.125 2023-11-27 23:38:26,621 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3262826.6666666665, ans=0.125 2023-11-27 23:38:52,878 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3262826.6666666665, ans=0.0 2023-11-27 23:39:33,828 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3262893.3333333335, ans=0.0 2023-11-27 23:40:20,274 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 489450 2023-11-27 23:40:35,252 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.99 vs. limit=15.0 2023-11-27 23:40:53,217 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.793e+01 8.777e+01 9.452e+01 1.015e+02 1.471e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-27 23:40:53,288 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 8500, loss[loss=0.0622, simple_loss=0.07968, pruned_loss=0.01447, audio_tagging_loss=0.007888, over 14568.00 frames. ], tot_loss[loss=0.06716, simple_loss=0.09165, pruned_loss=0.01281, audio_tagging_loss=0.008525, over 3050573.81 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 32.0 2023-11-27 23:41:52,645 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.25 vs. limit=10.0 2023-11-27 23:42:26,414 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3263160.0, ans=0.1 2023-11-27 23:43:03,220 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3263160.0, ans=0.125 2023-11-27 23:44:18,719 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 489500 2023-11-27 23:44:19,530 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3263293.3333333335, ans=0.1 2023-11-27 23:44:43,343 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 8550, loss[loss=0.06766, simple_loss=0.08636, pruned_loss=0.01621, audio_tagging_loss=0.008263, over 14574.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.09036, pruned_loss=0.01264, audio_tagging_loss=0.008588, over 3041509.18 frames. 
], batch size: 54, lr: 1.64e-03, grad_scale: 32.0 2023-11-27 23:44:56,088 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3263360.0, ans=0.0 2023-11-27 23:44:59,205 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.72 vs. limit=12.0 2023-11-27 23:45:06,446 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.35 vs. limit=15.0 2023-11-27 23:45:24,968 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.01 vs. limit=22.5 2023-11-27 23:46:25,681 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3263626.6666666665, ans=0.125 2023-11-27 23:46:33,004 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3263626.6666666665, ans=0.0 2023-11-27 23:46:35,725 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 489550 2023-11-27 23:46:50,779 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.083e+01 8.830e+01 9.577e+01 1.042e+02 1.217e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-27 23:46:50,837 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 8600, loss[loss=0.07696, simple_loss=0.1115, pruned_loss=0.01384, audio_tagging_loss=0.007392, over 15860.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.09066, pruned_loss=0.01259, audio_tagging_loss=0.008615, over 3048727.52 frames. ], batch size: 59, lr: 1.64e-03, grad_scale: 32.0 2023-11-27 23:46:51,353 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3263693.3333333335, ans=0.125 2023-11-27 23:46:53,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3263693.3333333335, ans=0.0 2023-11-27 23:47:25,800 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.78 vs. limit=15.0 2023-11-27 23:48:11,067 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3263893.3333333335, ans=0.125 2023-11-27 23:48:16,273 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.13 vs. limit=15.0 2023-11-27 23:48:39,256 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 489600 2023-11-27 23:48:54,567 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 8650, loss[loss=0.07247, simple_loss=0.1053, pruned_loss=0.01134, audio_tagging_loss=0.008505, over 15384.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.09008, pruned_loss=0.01246, audio_tagging_loss=0.008745, over 3044957.20 frames. 
], batch size: 55, lr: 1.64e-03, grad_scale: 32.0 2023-11-27 23:49:00,310 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3264026.6666666665, ans=0.125 2023-11-27 23:49:12,562 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3264026.6666666665, ans=0.2 2023-11-27 23:49:42,116 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3264093.3333333335, ans=0.125 2023-11-27 23:49:54,831 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=13.13 vs. limit=15.0 2023-11-27 23:50:10,671 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3264226.6666666665, ans=0.125 2023-11-27 23:50:44,176 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 489650 2023-11-27 23:50:44,708 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3264293.3333333335, ans=0.0 2023-11-27 23:50:58,366 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.714e+01 8.922e+01 9.759e+01 1.039e+02 1.261e+02, threshold=1.952e+02, percent-clipped=0.0 2023-11-27 23:50:58,441 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 8700, loss[loss=0.07441, simple_loss=0.1087, pruned_loss=0.01431, audio_tagging_loss=0.005738, over 15045.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.08945, pruned_loss=0.0125, audio_tagging_loss=0.008856, over 3044624.92 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 32.0 2023-11-27 23:51:00,607 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3264360.0, ans=0.0 2023-11-27 23:52:08,479 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3264493.3333333335, ans=0.125 2023-11-27 23:52:15,673 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3264560.0, ans=0.125 2023-11-27 23:52:30,480 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3264560.0, ans=0.0 2023-11-27 23:52:39,782 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3264626.6666666665, ans=0.0 2023-11-27 23:52:47,545 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 489700 2023-11-27 23:52:52,898 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3264626.6666666665, ans=0.125 2023-11-27 23:53:01,357 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 8750, loss[loss=0.07799, simple_loss=0.1068, pruned_loss=0.01499, audio_tagging_loss=0.00959, over 14715.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.09066, pruned_loss=0.01261, audio_tagging_loss=0.008926, over 3048491.37 frames. 
], batch size: 57, lr: 1.64e-03, grad_scale: 32.0 2023-11-27 23:53:19,236 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3264693.3333333335, ans=0.125 2023-11-27 23:54:23,033 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.605e-03 2023-11-27 23:54:51,533 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 489750 2023-11-27 23:55:06,017 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.575e+01 8.769e+01 9.393e+01 1.008e+02 1.168e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-27 23:55:06,078 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 8800, loss[loss=0.06322, simple_loss=0.08996, pruned_loss=0.01132, audio_tagging_loss=0.006931, over 15178.00 frames. ], tot_loss[loss=0.0675, simple_loss=0.09155, pruned_loss=0.01275, audio_tagging_loss=0.008975, over 3047677.10 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 32.0 2023-11-27 23:55:28,134 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3265093.3333333335, ans=0.125 2023-11-27 23:56:52,363 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 489800 2023-11-27 23:56:59,589 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3265293.3333333335, ans=0.125 2023-11-27 23:57:00,201 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.37 vs. limit=22.5 2023-11-27 23:57:02,286 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3265293.3333333335, ans=0.0 2023-11-27 23:57:07,583 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 8850, loss[loss=0.07639, simple_loss=0.1204, pruned_loss=0.01074, audio_tagging_loss=0.005453, over 16270.00 frames. ], tot_loss[loss=0.06717, simple_loss=0.09141, pruned_loss=0.01252, audio_tagging_loss=0.008947, over 3054888.64 frames. ], batch size: 58, lr: 1.64e-03, grad_scale: 16.0 2023-11-27 23:57:35,294 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 23:57:45,832 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3265426.6666666665, ans=0.125 2023-11-27 23:57:48,665 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.whiten.whitening_limit, batch_count=3265426.6666666665, ans=12.0 2023-11-27 23:58:51,162 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 489850 2023-11-27 23:59:03,441 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 8900, loss[loss=0.05625, simple_loss=0.06493, pruned_loss=0.01142, audio_tagging_loss=0.01237, over 15292.00 frames. ], tot_loss[loss=0.06667, simple_loss=0.09069, pruned_loss=0.01247, audio_tagging_loss=0.008858, over 3055889.01 frames. 
], batch size: 59, lr: 1.64e-03, grad_scale: 16.0 2023-11-27 23:59:05,824 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.165e+01 8.603e+01 9.158e+01 9.792e+01 1.158e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-27 23:59:06,695 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3265693.3333333335, ans=0.125 2023-11-27 23:59:09,218 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3265693.3333333335, ans=0.125 2023-11-27 23:59:29,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3265760.0, ans=0.125 2023-11-28 00:00:23,958 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3265893.3333333335, ans=0.125 2023-11-28 00:00:56,406 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 489900 2023-11-28 00:01:10,720 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 8950, loss[loss=0.07845, simple_loss=0.1079, pruned_loss=0.01567, audio_tagging_loss=0.008829, over 14973.00 frames. ], tot_loss[loss=0.06685, simple_loss=0.09115, pruned_loss=0.01261, audio_tagging_loss=0.008674, over 3054685.39 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:01:59,881 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3266160.0, ans=0.2 2023-11-28 00:02:32,394 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3266226.6666666665, ans=0.0 2023-11-28 00:02:48,617 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3266293.3333333335, ans=0.0 2023-11-28 00:02:57,017 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 489950 2023-11-28 00:03:10,782 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 9000, loss[loss=0.06673, simple_loss=0.09766, pruned_loss=0.01145, audio_tagging_loss=0.006458, over 15201.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.09023, pruned_loss=0.01234, audio_tagging_loss=0.008625, over 3053242.73 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:03:10,785 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-28 00:04:14,795 INFO [train_asr.py:1267] (0/4) Epoch 41, validation: loss=0.05835, simple_loss=0.05061, pruned_loss=0.005195, audio_tagging_loss=0.02785, over 4681554.00 frames. 
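The validation entry above (preceded by "Computing validation loss") reports a frame-weighted average over the whole dev loader. A sketch of such a pass, with loss_fn standing in for the recipe's actual loss computation rather than its real signature:

```python
import torch

# Periodic validation pass: eval mode, no gradients, frame-weighted average,
# then back to training. loss_fn is a placeholder for the recipe's loss.
def validate(model: torch.nn.Module, valid_dl, loss_fn) -> float:
    model.eval()
    loss_sum, frames = 0.0, 0.0
    with torch.no_grad():
        for batch in valid_dl:
            loss, num_frames = loss_fn(model, batch)
            loss_sum += loss.item() * num_frames
            frames += num_frames
    model.train()
    return loss_sum / frames  # frame-weighted, "over 4681554.00 frames"
```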
2023-11-28 00:04:14,808 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-28 00:04:16,749 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.302e+01 8.840e+01 9.454e+01 9.905e+01 1.337e+02, threshold=1.891e+02, percent-clipped=0.0 2023-11-28 00:04:17,254 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3266360.0, ans=0.125 2023-11-28 00:04:19,257 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3266360.0, ans=0.125 2023-11-28 00:04:21,671 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3266360.0, ans=0.05 2023-11-28 00:04:26,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3266360.0, ans=0.0 2023-11-28 00:05:54,211 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.55 vs. limit=15.0 2023-11-28 00:06:02,734 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 490000 2023-11-28 00:06:12,939 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3266626.6666666665, ans=0.1 2023-11-28 00:06:17,831 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 9050, loss[loss=0.07726, simple_loss=0.1091, pruned_loss=0.01643, audio_tagging_loss=0.00628, over 15193.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.09073, pruned_loss=0.01247, audio_tagging_loss=0.008675, over 3049903.90 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:06:35,876 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3266693.3333333335, ans=0.0 2023-11-28 00:06:57,922 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3266760.0, ans=0.0 2023-11-28 00:07:06,796 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3266826.6666666665, ans=0.125 2023-11-28 00:07:13,945 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.12 vs. limit=10.0 2023-11-28 00:07:21,180 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.01 vs. limit=15.0 2023-11-28 00:08:05,110 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 490050 2023-11-28 00:08:19,709 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 9100, loss[loss=0.05516, simple_loss=0.06999, pruned_loss=0.009044, audio_tagging_loss=0.01112, over 15334.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.08908, pruned_loss=0.01229, audio_tagging_loss=0.008733, over 3056511.19 frames. 
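The "Maximum memory allocated so far" figure at the start of this stretch is the CUDA allocator's high-water mark, which torch exposes directly. A short sketch (device index assumed):

```python
import torch

# Report the peak CUDA memory allocated on one device, as in the log line
# "Maximum memory allocated so far is 25978MB".
def log_peak_memory(device: int = 0) -> None:
    peak_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    print(f"Maximum memory allocated so far is {peak_mb}MB")
```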
], batch size: 58, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:08:22,051 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.608e+01 8.819e+01 9.395e+01 1.013e+02 1.222e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-28 00:08:52,361 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3267093.3333333335, ans=0.04949747468305833 2023-11-28 00:09:15,913 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3267160.0, ans=0.125 2023-11-28 00:10:04,192 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 490100 2023-11-28 00:10:17,948 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 9150, loss[loss=0.05764, simple_loss=0.08593, pruned_loss=0.006219, audio_tagging_loss=0.008456, over 15695.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.08929, pruned_loss=0.01235, audio_tagging_loss=0.008616, over 3052324.19 frames. ], batch size: 58, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:11:20,222 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.65 vs. limit=15.0 2023-11-28 00:11:21,208 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3267493.3333333335, ans=0.0 2023-11-28 00:11:27,489 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3267560.0, ans=0.0 2023-11-28 00:11:37,090 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3267560.0, ans=0.125 2023-11-28 00:11:39,985 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3267560.0, ans=0.125 2023-11-28 00:11:44,325 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.79 vs. limit=22.5 2023-11-28 00:11:54,350 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3267626.6666666665, ans=0.125 2023-11-28 00:11:57,665 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 490150 2023-11-28 00:12:08,892 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 9200, loss[loss=0.06548, simple_loss=0.09288, pruned_loss=0.01166, audio_tagging_loss=0.007378, over 14134.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.08985, pruned_loss=0.01244, audio_tagging_loss=0.008574, over 3047560.61 frames. ], batch size: 54, lr: 1.64e-03, grad_scale: 32.0 2023-11-28 00:12:10,165 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.34 vs. 
limit=15.0 2023-11-28 00:12:11,675 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.387e+01 8.944e+01 9.391e+01 1.026e+02 1.333e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-28 00:12:56,889 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3267826.6666666665, ans=0.0 2023-11-28 00:13:56,375 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 490200 2023-11-28 00:14:09,206 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3267960.0, ans=0.125 2023-11-28 00:14:12,362 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3268026.6666666665, ans=0.125 2023-11-28 00:14:13,495 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 9250, loss[loss=0.06905, simple_loss=0.0862, pruned_loss=0.01729, audio_tagging_loss=0.008654, over 15327.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08957, pruned_loss=0.01224, audio_tagging_loss=0.008553, over 3049614.54 frames. ], batch size: 58, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:16:09,144 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 490250 2023-11-28 00:16:23,716 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 9300, loss[loss=0.07103, simple_loss=0.09177, pruned_loss=0.01416, audio_tagging_loss=0.01098, over 14545.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08887, pruned_loss=0.01216, audio_tagging_loss=0.008571, over 3047959.51 frames. ], batch size: 54, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:16:27,350 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.380e+01 8.477e+01 9.136e+01 9.623e+01 1.227e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-28 00:16:33,819 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.39 vs. limit=8.0 2023-11-28 00:17:03,234 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3268426.6666666665, ans=0.125 2023-11-28 00:17:39,583 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.05 vs. limit=15.0 2023-11-28 00:18:10,908 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 490300 2023-11-28 00:18:23,616 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 9350, loss[loss=0.06027, simple_loss=0.07486, pruned_loss=0.01068, audio_tagging_loss=0.01216, over 14357.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.08854, pruned_loss=0.01205, audio_tagging_loss=0.008624, over 3036433.19 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:19:36,829 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3268893.3333333335, ans=0.125 2023-11-28 00:19:53,892 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3268960.0, ans=0.125 2023-11-28 00:19:57,187 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.96 vs. 
limit=10.0 2023-11-28 00:20:01,410 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 490350 2023-11-28 00:20:14,373 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 9400, loss[loss=0.06462, simple_loss=0.08056, pruned_loss=0.01473, audio_tagging_loss=0.009606, over 15654.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08942, pruned_loss=0.01216, audio_tagging_loss=0.008739, over 3043327.59 frames. ], batch size: 62, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:20:18,714 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.212e+01 8.645e+01 9.230e+01 9.959e+01 1.190e+02, threshold=1.846e+02, percent-clipped=0.0 2023-11-28 00:20:47,483 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3269093.3333333335, ans=0.125 2023-11-28 00:20:49,911 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3269093.3333333335, ans=0.0 2023-11-28 00:20:59,228 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.22 vs. limit=6.0 2023-11-28 00:21:10,614 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3269160.0, ans=0.2 2023-11-28 00:21:12,404 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3269160.0, ans=0.0 2023-11-28 00:21:53,820 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 490400 2023-11-28 00:22:05,799 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 9450, loss[loss=0.07851, simple_loss=0.108, pruned_loss=0.01655, audio_tagging_loss=0.007973, over 15763.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.08926, pruned_loss=0.01232, audio_tagging_loss=0.008901, over 3038758.66 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 8.0 2023-11-28 00:22:05,962 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 00:22:23,549 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3269360.0, ans=0.125 2023-11-28 00:22:40,687 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.36 vs. limit=22.5 2023-11-28 00:23:25,271 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.81 vs. limit=12.0 2023-11-28 00:23:46,137 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 490450 2023-11-28 00:23:58,244 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 9500, loss[loss=0.05838, simple_loss=0.07001, pruned_loss=0.01136, audio_tagging_loss=0.01202, over 14507.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.08965, pruned_loss=0.01235, audio_tagging_loss=0.008853, over 3045464.54 frames. 
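The WARNING above shows why short AudioSet cuts carrying the placeholder transcript are dropped: after subsampling, 100 input frames leave only 23, fewer than the 24 BPE tokens, so no monotonic transducer alignment exists. A sketch of the filter follows; the exact subsampling arithmetic is an assumption chosen to reproduce the logged numbers.

```python
# Drop cuts whose post-subsampling length cannot cover their token sequence,
# matching the "Exclude cut ..." warnings above.
def keep_cut(num_frames: int, num_tokens: int) -> bool:
    frames_after = ((num_frames - 7) // 2) // 2  # 100 -> 23 (assumed formula)
    return frames_after >= num_tokens

print(keep_cut(100, 24))  # False -> excluded, as in the warning
```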
], batch size: 55, lr: 1.64e-03, grad_scale: 8.0 2023-11-28 00:24:04,042 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.984e+01 8.586e+01 9.559e+01 1.044e+02 1.238e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-28 00:24:31,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3269760.0, ans=0.1 2023-11-28 00:24:56,778 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.22 vs. limit=15.0 2023-11-28 00:25:05,425 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3269893.3333333335, ans=10.0 2023-11-28 00:25:25,228 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 490500 2023-11-28 00:25:34,671 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3270026.6666666665, ans=0.07 2023-11-28 00:25:35,699 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 9550, loss[loss=0.05458, simple_loss=0.07269, pruned_loss=0.007502, audio_tagging_loss=0.01073, over 14738.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.08948, pruned_loss=0.01218, audio_tagging_loss=0.008895, over 3052089.85 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 8.0 2023-11-28 00:25:37,825 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3270026.6666666665, ans=0.0 2023-11-28 00:26:06,424 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3270093.3333333335, ans=0.0 2023-11-28 00:26:43,440 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3270293.3333333335, ans=0.125 2023-11-28 00:26:49,815 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 490550 2023-11-28 00:26:55,873 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3270293.3333333335, ans=0.0 2023-11-28 00:26:58,245 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 9600, loss[loss=0.06086, simple_loss=0.0784, pruned_loss=0.01215, audio_tagging_loss=0.009498, over 15017.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.08897, pruned_loss=0.01214, audio_tagging_loss=0.00893, over 3049067.22 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:27:02,613 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.380e+01 8.793e+01 9.266e+01 1.006e+02 1.228e+02, threshold=1.853e+02, percent-clipped=0.0 2023-11-28 00:27:59,700 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 490600 2023-11-28 00:28:08,100 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 9650, loss[loss=0.06498, simple_loss=0.09105, pruned_loss=0.01045, audio_tagging_loss=0.009003, over 16120.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.0891, pruned_loss=0.01224, audio_tagging_loss=0.00894, over 3049077.93 frames. ], batch size: 60, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:28:29,298 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3270760.0, ans=0.0 2023-11-28 00:28:54,887 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.22 vs. 
limit=15.0 2023-11-28 00:29:05,886 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 490650 2023-11-28 00:29:14,548 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 9700, loss[loss=0.06287, simple_loss=0.08915, pruned_loss=0.01186, audio_tagging_loss=0.006443, over 14697.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.08908, pruned_loss=0.01239, audio_tagging_loss=0.008877, over 3040964.85 frames. ], batch size: 54, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:29:18,292 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.303e+01 8.733e+01 9.513e+01 1.030e+02 1.343e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-28 00:29:45,504 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3271160.0, ans=0.05 2023-11-28 00:29:46,819 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.91 vs. limit=22.5 2023-11-28 00:30:01,763 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3271226.6666666665, ans=0.0 2023-11-28 00:30:10,962 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 490700 2023-11-28 00:30:15,678 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3271293.3333333335, ans=0.0 2023-11-28 00:30:18,807 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 9750, loss[loss=0.07265, simple_loss=0.09566, pruned_loss=0.01478, audio_tagging_loss=0.01004, over 15498.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.08906, pruned_loss=0.01231, audio_tagging_loss=0.008761, over 3051207.40 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:30:30,976 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3271426.6666666665, ans=0.0 2023-11-28 00:30:38,081 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3271426.6666666665, ans=0.0 2023-11-28 00:31:09,329 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3271626.6666666665, ans=0.125 2023-11-28 00:31:13,683 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 490750 2023-11-28 00:31:20,531 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 9800, loss[loss=0.06068, simple_loss=0.08968, pruned_loss=0.007718, audio_tagging_loss=0.008119, over 15051.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.08977, pruned_loss=0.01254, audio_tagging_loss=0.008711, over 3048194.98 frames. 
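The logged loss components are not independent: in each entry, loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss, with the 0.5 weight on the simple (linear-boundary) transducer loss inferred from the logged numbers themselves. Checking against the batch 9800 entry above:

```python
# Verify the observed decomposition of the reported loss for batch 9800.
simple_loss, pruned_loss, audio_tagging_loss = 0.08968, 0.007718, 0.008119
loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss
print(round(loss, 5))  # 0.06068, matching loss=0.06068 in the log
```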
], batch size: 57, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:31:23,928 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.369e+01 8.662e+01 9.364e+01 1.024e+02 1.595e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-28 00:31:27,746 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3271693.3333333335, ans=0.1 2023-11-28 00:31:31,428 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3271760.0, ans=0.1 2023-11-28 00:31:38,838 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3271760.0, ans=0.125 2023-11-28 00:31:46,188 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3271826.6666666665, ans=0.1 2023-11-28 00:31:58,792 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.08 vs. limit=15.0 2023-11-28 00:32:06,808 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.61 vs. limit=15.0 2023-11-28 00:32:07,603 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3271960.0, ans=0.07 2023-11-28 00:32:13,102 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 490800 2023-11-28 00:32:14,763 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3271960.0, ans=10.0 2023-11-28 00:32:15,814 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 00:32:20,845 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 9850, loss[loss=0.0779, simple_loss=0.1021, pruned_loss=0.0164, audio_tagging_loss=0.01046, over 14278.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.08985, pruned_loss=0.01262, audio_tagging_loss=0.008641, over 3048440.76 frames. ], batch size: 53, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:32:24,413 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_na.min_abs, batch_count=3272026.6666666665, ans=0.02 2023-11-28 00:32:25,691 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3272026.6666666665, ans=0.125 2023-11-28 00:32:47,550 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3272160.0, ans=0.0 2023-11-28 00:32:48,002 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.38 vs. 
limit=15.0 2023-11-28 00:32:57,464 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3272226.6666666665, ans=0.125 2023-11-28 00:33:01,303 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3272226.6666666665, ans=0.125 2023-11-28 00:33:12,913 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 490850 2023-11-28 00:33:13,197 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3272293.3333333335, ans=0.2 2023-11-28 00:33:20,822 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 9900, loss[loss=0.06778, simple_loss=0.09962, pruned_loss=0.00981, audio_tagging_loss=0.008156, over 15974.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.09035, pruned_loss=0.01262, audio_tagging_loss=0.008738, over 3039918.54 frames. ], batch size: 58, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:33:23,214 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3272360.0, ans=0.125 2023-11-28 00:33:24,131 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.601e+01 9.033e+01 9.485e+01 1.050e+02 1.243e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-28 00:33:38,800 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3272426.6666666665, ans=0.125 2023-11-28 00:33:39,905 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3272426.6666666665, ans=0.2 2023-11-28 00:33:43,459 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.93 vs. limit=12.0 2023-11-28 00:34:10,598 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3272626.6666666665, ans=0.125 2023-11-28 00:34:11,822 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 490900 2023-11-28 00:34:12,375 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.98 vs. limit=15.0 2023-11-28 00:34:18,416 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 9950, loss[loss=0.06627, simple_loss=0.09079, pruned_loss=0.01305, audio_tagging_loss=0.007827, over 15582.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.08934, pruned_loss=0.01245, audio_tagging_loss=0.008714, over 3046728.61 frames. 
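tot_loss is a running frame-weighted aggregate rather than an exact sum; the fractional frame counts above (e.g. "over 3046728.61 frames") suggest older batches are decayed as new ones arrive. A sketch under that assumption, with an illustrative decay constant:

```python
# Running, decayed, frame-weighted loss accumulator. The decay value is an
# assumption; it is what would produce fractional frame totals like those
# reported as tot_loss[... over 3046728.61 frames].
class RunningLoss:
    def __init__(self, decay: float = 0.999):
        self.decay = decay
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss: float, batch_frames: float) -> float:
        self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
        self.frames = self.decay * self.frames + batch_frames
        return self.loss_sum / self.frames  # the reported tot_loss
```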
], batch size: 59, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:34:53,751 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3272893.3333333335, ans=0.125 2023-11-28 00:34:54,962 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3272893.3333333335, ans=0.125 2023-11-28 00:34:55,044 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3272893.3333333335, ans=0.125 2023-11-28 00:35:06,007 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3272960.0, ans=0.125 2023-11-28 00:35:07,010 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3272960.0, ans=0.0 2023-11-28 00:35:09,085 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 490950 2023-11-28 00:35:13,012 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.91 vs. limit=6.0 2023-11-28 00:35:16,038 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 10000, loss[loss=0.07005, simple_loss=0.1014, pruned_loss=0.01264, audio_tagging_loss=0.006725, over 17186.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08903, pruned_loss=0.0123, audio_tagging_loss=0.008694, over 3044100.60 frames. ], batch size: 64, lr: 1.64e-03, grad_scale: 32.0 2023-11-28 00:35:19,719 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.198e+01 8.605e+01 9.101e+01 9.831e+01 1.246e+02, threshold=1.820e+02, percent-clipped=0.0 2023-11-28 00:35:39,335 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3273160.0, ans=0.125 2023-11-28 00:35:50,565 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3273226.6666666665, ans=0.125 2023-11-28 00:35:52,138 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.37 vs. limit=10.0 2023-11-28 00:35:57,820 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.72 vs. limit=15.0 2023-11-28 00:36:02,489 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3273293.3333333335, ans=0.125 2023-11-28 00:36:05,096 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.52 vs. limit=15.0 2023-11-28 00:36:06,565 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 491000 2023-11-28 00:36:13,247 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 10050, loss[loss=0.05252, simple_loss=0.0659, pruned_loss=0.009638, audio_tagging_loss=0.009932, over 15467.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08926, pruned_loss=0.01232, audio_tagging_loss=0.008697, over 3042939.04 frames. 
], batch size: 61, lr: 1.64e-03, grad_scale: 8.0 2023-11-28 00:36:32,563 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3273426.6666666665, ans=0.125 2023-11-28 00:36:32,913 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.95 vs. limit=12.0 2023-11-28 00:36:36,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3273493.3333333335, ans=0.0 2023-11-28 00:36:43,385 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.42 vs. limit=12.0 2023-11-28 00:37:03,493 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3273626.6666666665, ans=0.125 2023-11-28 00:37:05,435 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 491050 2023-11-28 00:37:05,674 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3273626.6666666665, ans=0.1 2023-11-28 00:37:08,753 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3273626.6666666665, ans=0.1 2023-11-28 00:37:11,869 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 10100, loss[loss=0.07903, simple_loss=0.1022, pruned_loss=0.01778, audio_tagging_loss=0.01017, over 15053.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08892, pruned_loss=0.01226, audio_tagging_loss=0.008735, over 3046534.46 frames. ], batch size: 59, lr: 1.64e-03, grad_scale: 8.0 2023-11-28 00:37:17,288 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.464e+01 8.687e+01 9.300e+01 1.008e+02 1.276e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-28 00:37:18,591 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3273693.3333333335, ans=0.125 2023-11-28 00:37:51,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3273893.3333333335, ans=0.125 2023-11-28 00:37:56,959 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3273960.0, ans=0.125 2023-11-28 00:38:01,041 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 00:38:02,203 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 491100 2023-11-28 00:38:07,916 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3274026.6666666665, ans=0.1 2023-11-28 00:38:09,111 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 10150, loss[loss=0.07125, simple_loss=0.1065, pruned_loss=0.01019, audio_tagging_loss=0.007791, over 15779.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08906, pruned_loss=0.01219, audio_tagging_loss=0.00871, over 3040908.21 frames. 
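The grad_scale value has stepped 32.0 -> 16.0 -> 8.0 across the entries above, the signature of dynamic fp16 loss scaling: the scale halves when a step overflows and grows back after a run of stable steps. A sketch of one such mixed-precision step using torch.cuda.amp, with model, optimizer and loss_fn assumed:

```python
import torch

# One AMP training step; GradScaler adjusts the loss scale that the log
# reports as grad_scale.
scaler = torch.cuda.amp.GradScaler(init_scale=32.0)

def amp_step(model, optimizer, batch, loss_fn):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = loss_fn(model, batch)
    scaler.scale(loss).backward()
    scaler.step(optimizer)  # skipped internally if grads hit inf/nan
    scaler.update()         # halves on overflow, grows after stable steps
```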
], batch size: 57, lr: 1.64e-03, grad_scale: 8.0 2023-11-28 00:38:11,565 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3274026.6666666665, ans=0.09899494936611666 2023-11-28 00:38:25,072 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3274093.3333333335, ans=0.125 2023-11-28 00:38:31,007 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3274160.0, ans=0.125 2023-11-28 00:38:31,276 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.90 vs. limit=6.0 2023-11-28 00:38:38,443 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.02 vs. limit=15.0 2023-11-28 00:38:39,106 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 00:38:44,064 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.71 vs. limit=15.0 2023-11-28 00:38:49,114 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3274226.6666666665, ans=0.125 2023-11-28 00:38:57,977 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3274293.3333333335, ans=0.125 2023-11-28 00:38:59,914 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 491150 2023-11-28 00:39:06,428 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 10200, loss[loss=0.06828, simple_loss=0.1001, pruned_loss=0.01001, audio_tagging_loss=0.00822, over 15072.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.08958, pruned_loss=0.01226, audio_tagging_loss=0.008778, over 3049357.58 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 8.0 2023-11-28 00:39:12,516 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.600e+01 8.857e+01 9.633e+01 1.053e+02 1.293e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-28 00:39:13,979 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3274360.0, ans=0.0 2023-11-28 00:39:22,659 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.96 vs. limit=15.0 2023-11-28 00:39:26,795 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3274426.6666666665, ans=0.125 2023-11-28 00:39:31,042 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 00:39:31,244 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3274493.3333333335, ans=0.125 2023-11-28 00:39:38,784 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.61 vs. limit=15.0 2023-11-28 00:39:52,985 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3274626.6666666665, ans=0.2 2023-11-28 00:39:57,862 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 491200 2023-11-28 00:40:05,365 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 10250, loss[loss=0.05843, simple_loss=0.06839, pruned_loss=0.01115, audio_tagging_loss=0.01308, over 15159.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.08943, pruned_loss=0.01236, audio_tagging_loss=0.008845, over 3051912.03 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 8.0 2023-11-28 00:40:12,280 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=3274693.3333333335, ans=10.0 2023-11-28 00:40:19,889 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3274760.0, ans=0.0 2023-11-28 00:40:28,671 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3274826.6666666665, ans=0.125 2023-11-28 00:40:37,330 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3274826.6666666665, ans=0.0 2023-11-28 00:40:42,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3274893.3333333335, ans=0.125 2023-11-28 00:40:55,957 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 491250 2023-11-28 00:40:58,334 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3274960.0, ans=0.125 2023-11-28 00:41:00,435 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3274960.0, ans=0.09899494936611666 2023-11-28 00:41:02,330 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 10300, loss[loss=0.07111, simple_loss=0.1042, pruned_loss=0.01058, audio_tagging_loss=0.008458, over 16163.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.08953, pruned_loss=0.01246, audio_tagging_loss=0.008907, over 3055797.79 frames. 
], batch size: 61, lr: 1.64e-03, grad_scale: 8.0 2023-11-28 00:41:08,325 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.570e+01 8.818e+01 9.627e+01 1.031e+02 1.268e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-28 00:41:22,543 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3275093.3333333335, ans=0.125 2023-11-28 00:41:22,705 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3275093.3333333335, ans=0.2 2023-11-28 00:41:30,473 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3275160.0, ans=0.125 2023-11-28 00:41:31,484 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3275160.0, ans=0.1 2023-11-28 00:41:53,417 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 491300 2023-11-28 00:41:55,681 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3275293.3333333335, ans=0.2 2023-11-28 00:41:59,906 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 10350, loss[loss=0.08497, simple_loss=0.1125, pruned_loss=0.0191, audio_tagging_loss=0.009623, over 14751.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.08966, pruned_loss=0.01255, audio_tagging_loss=0.008942, over 3048145.83 frames. ], batch size: 54, lr: 1.64e-03, grad_scale: 8.0 2023-11-28 00:42:14,068 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.98 vs. limit=6.0 2023-11-28 00:42:23,419 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.54 vs. limit=15.0 2023-11-28 00:42:31,638 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.86 vs. limit=15.0 2023-11-28 00:42:32,532 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.01 vs. limit=15.0 2023-11-28 00:42:34,706 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.45 vs. limit=15.0 2023-11-28 00:42:38,972 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3275560.0, ans=0.2 2023-11-28 00:42:40,774 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.20 vs. limit=15.0 2023-11-28 00:42:50,294 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 491350 2023-11-28 00:42:56,802 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 10400, loss[loss=0.06481, simple_loss=0.0905, pruned_loss=0.01116, audio_tagging_loss=0.008403, over 15737.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.08961, pruned_loss=0.01252, audio_tagging_loss=0.009086, over 3053216.47 frames. 
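The Whitening lines compare a per-module statistic against a limit (e.g. "metric=13.86 vs. limit=15.0"). One standard anisotropy measure with the right behaviour, equal to 1.0 when the channel covariance is a scaled identity and growing as channels become correlated or unevenly scaled, is sketched below; it is an illustration, not necessarily the scaling.py formula.

```python
import torch

# Anisotropy of the channel covariance: d * tr(C @ C) / tr(C)^2 is 1.0 for a
# scaled identity (perfectly "white") and >1 otherwise.
def whitening_metric(x: torch.Tensor) -> float:
    x = x - x.mean(dim=0, keepdim=True)   # x: (num_frames, num_channels)
    cov = (x.T @ x) / x.shape[0]
    d = cov.shape[0]
    return (d * torch.trace(cov @ cov) / torch.trace(cov) ** 2).item()

print(whitening_metric(torch.randn(1000, 192)))  # near 1.0 for white noise
```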
], batch size: 59, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:42:58,205 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3275693.3333333335, ans=0.0 2023-11-28 00:43:02,213 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.297e+01 8.640e+01 9.257e+01 1.001e+02 1.271e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-28 00:43:13,502 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.42 vs. limit=5.0 2023-11-28 00:43:17,092 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.83 vs. limit=15.0 2023-11-28 00:43:17,662 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3275760.0, ans=0.125 2023-11-28 00:43:18,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3275826.6666666665, ans=0.0 2023-11-28 00:43:20,334 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3275826.6666666665, ans=0.125 2023-11-28 00:43:21,556 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff2.min_abs, batch_count=3275826.6666666665, ans=0.1 2023-11-28 00:43:27,895 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3275826.6666666665, ans=0.125 2023-11-28 00:43:33,944 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3275893.3333333335, ans=0.0 2023-11-28 00:43:36,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3275893.3333333335, ans=0.035 2023-11-28 00:43:37,157 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3275893.3333333335, ans=0.0 2023-11-28 00:43:46,941 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 491400 2023-11-28 00:43:54,180 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 10450, loss[loss=0.05555, simple_loss=0.07433, pruned_loss=0.009423, audio_tagging_loss=0.008966, over 15648.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.0897, pruned_loss=0.01246, audio_tagging_loss=0.009068, over 3043644.72 frames. ], batch size: 62, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:44:01,984 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.83 vs. limit=15.0 2023-11-28 00:44:08,495 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3276093.3333333335, ans=0.0 2023-11-28 00:44:19,854 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3276160.0, ans=0.125 2023-11-28 00:44:20,024 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 00:44:26,191 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3276160.0, ans=0.0 2023-11-28 00:44:40,093 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.98 vs. 
limit=15.0 2023-11-28 00:44:41,585 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3276293.3333333335, ans=0.0 2023-11-28 00:44:44,534 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 491450 2023-11-28 00:44:51,524 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 10500, loss[loss=0.07065, simple_loss=0.09812, pruned_loss=0.01359, audio_tagging_loss=0.007996, over 14475.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.08998, pruned_loss=0.01257, audio_tagging_loss=0.008942, over 3042507.72 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:44:55,702 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.56 vs. limit=10.0 2023-11-28 00:44:57,002 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.171e+01 8.695e+01 9.363e+01 1.021e+02 1.243e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-28 00:45:03,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3276426.6666666665, ans=0.09899494936611666 2023-11-28 00:45:23,217 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3276493.3333333335, ans=0.125 2023-11-28 00:45:37,499 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.34 vs. limit=22.5 2023-11-28 00:45:41,427 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 491500 2023-11-28 00:45:48,542 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 10550, loss[loss=0.07649, simple_loss=0.1125, pruned_loss=0.01295, audio_tagging_loss=0.007282, over 15818.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.08975, pruned_loss=0.01242, audio_tagging_loss=0.008843, over 3052272.69 frames. ], batch size: 58, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:45:50,286 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.55 vs. limit=12.0 2023-11-28 00:46:01,195 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.77 vs. limit=15.0 2023-11-28 00:46:05,882 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3276760.0, ans=0.0 2023-11-28 00:46:13,217 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.90 vs. limit=10.0 2023-11-28 00:46:18,856 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3276826.6666666665, ans=0.1 2023-11-28 00:46:32,908 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3276893.3333333335, ans=0.0 2023-11-28 00:46:39,189 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 491550 2023-11-28 00:46:40,673 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.18 vs. limit=22.5 2023-11-28 00:46:45,611 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 10600, loss[loss=0.03844, simple_loss=0.0459, pruned_loss=0.005925, audio_tagging_loss=0.009569, over 14314.00 frames. 
], tot_loss[loss=0.06629, simple_loss=0.09017, pruned_loss=0.01249, audio_tagging_loss=0.008707, over 3050031.99 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:46:51,908 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.296e+01 8.682e+01 9.138e+01 9.881e+01 1.216e+02, threshold=1.828e+02, percent-clipped=0.0 2023-11-28 00:47:26,987 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3277226.6666666665, ans=0.125 2023-11-28 00:47:36,670 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 491600 2023-11-28 00:47:38,295 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3277293.3333333335, ans=0.0 2023-11-28 00:47:40,651 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3277293.3333333335, ans=0.125 2023-11-28 00:47:44,082 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 10650, loss[loss=0.05998, simple_loss=0.08423, pruned_loss=0.01107, audio_tagging_loss=0.00679, over 14664.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.08978, pruned_loss=0.01243, audio_tagging_loss=0.008692, over 3046942.74 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:47:44,798 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.71 vs. limit=15.0 2023-11-28 00:47:46,577 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3277360.0, ans=0.125 2023-11-28 00:47:51,957 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3277360.0, ans=0.125 2023-11-28 00:47:51,962 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3277360.0, ans=0.1 2023-11-28 00:47:56,833 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3277426.6666666665, ans=0.125 2023-11-28 00:48:09,171 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.45 vs. limit=15.0 2023-11-28 00:48:22,386 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.99 vs. 
limit=15.0 2023-11-28 00:48:26,885 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3277560.0, ans=0.0 2023-11-28 00:48:30,190 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3277626.6666666665, ans=0.0 2023-11-28 00:48:34,212 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 491650 2023-11-28 00:48:36,543 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3277626.6666666665, ans=0.125 2023-11-28 00:48:38,246 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3277626.6666666665, ans=0.0 2023-11-28 00:48:39,356 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3277626.6666666665, ans=0.125 2023-11-28 00:48:41,247 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 10700, loss[loss=0.07716, simple_loss=0.1128, pruned_loss=0.009311, audio_tagging_loss=0.01145, over 16341.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08924, pruned_loss=0.0123, audio_tagging_loss=0.008724, over 3040835.28 frames. ], batch size: 60, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:48:41,591 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 00:48:46,553 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.912e+01 8.856e+01 9.300e+01 9.841e+01 1.574e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-28 00:48:51,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3277760.0, ans=0.125 2023-11-28 00:49:03,942 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3277826.6666666665, ans=0.125 2023-11-28 00:49:05,916 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3277826.6666666665, ans=0.0 2023-11-28 00:49:14,045 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3277893.3333333335, ans=0.0 2023-11-28 00:49:17,166 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3277893.3333333335, ans=0.125 2023-11-28 00:49:27,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3277960.0, ans=0.1 2023-11-28 00:49:30,812 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 491700 2023-11-28 00:49:33,135 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3277960.0, ans=0.125 2023-11-28 00:49:37,223 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 10750, loss[loss=0.05123, simple_loss=0.06075, pruned_loss=0.01145, audio_tagging_loss=0.009407, over 13896.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.0896, pruned_loss=0.01241, audio_tagging_loss=0.008648, over 3046084.65 frames. 
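The balancer entries throughout the log (.prob, .min_positive, .max_abs, .min_abs) name constraints on per-channel activation statistics, with prob the scheduled probability that the constraint is enforced on a given batch. A simplified sketch that only computes the monitored statistics; the real module enforces them through gradient modifications:

```python
import torch

# Toy stand-in for a balancer: forward is the identity; with probability
# `prob` it computes the per-channel statistics the real module constrains.
class ToyBalancer(torch.nn.Module):
    def __init__(self, min_positive=0.05, max_abs=10.0, prob=0.125):
        super().__init__()
        self.min_positive, self.max_abs, self.prob = min_positive, max_abs, prob

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training and torch.rand(()).item() < self.prob:
            frac_positive = (x > 0).float().mean(dim=0)  # vs. min_positive
            mean_abs = x.abs().mean(dim=0)               # vs. max_abs
            # a real balancer would nudge gradients of violating channels
        return x
```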
], batch size: 55, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:49:41,856 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3278026.6666666665, ans=0.125 2023-11-28 00:50:01,429 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten.whitening_limit, batch_count=3278160.0, ans=15.0 2023-11-28 00:50:07,365 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3278160.0, ans=0.05 2023-11-28 00:50:08,329 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3278160.0, ans=0.125 2023-11-28 00:50:25,206 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.99 vs. limit=6.0 2023-11-28 00:50:25,879 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3278293.3333333335, ans=0.09899494936611666 2023-11-28 00:50:27,872 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 491750 2023-11-28 00:50:35,824 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 10800, loss[loss=0.0755, simple_loss=0.1043, pruned_loss=0.01122, audio_tagging_loss=0.01214, over 15613.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.08993, pruned_loss=0.01239, audio_tagging_loss=0.008651, over 3051506.52 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 32.0 2023-11-28 00:50:41,283 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.816e+01 8.647e+01 9.300e+01 1.005e+02 1.391e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-28 00:50:58,815 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.74 vs. limit=22.5 2023-11-28 00:51:02,988 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3278493.3333333335, ans=0.1 2023-11-28 00:51:03,149 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.14 vs. limit=22.5 2023-11-28 00:51:19,824 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3278560.0, ans=0.05 2023-11-28 00:51:21,999 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3278626.6666666665, ans=0.07 2023-11-28 00:51:26,180 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 491800 2023-11-28 00:51:33,556 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 10850, loss[loss=0.05743, simple_loss=0.08054, pruned_loss=0.009518, audio_tagging_loss=0.00764, over 15782.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.09052, pruned_loss=0.01236, audio_tagging_loss=0.008662, over 3049264.69 frames. ], batch size: 60, lr: 1.64e-03, grad_scale: 32.0 2023-11-28 00:51:42,497 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3278693.3333333335, ans=0.125 2023-11-28 00:51:42,945 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.48 vs. 
limit=22.5 2023-11-28 00:51:43,913 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.67 vs. limit=15.0 2023-11-28 00:51:45,808 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3278760.0, ans=0.0 2023-11-28 00:52:05,500 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3278826.6666666665, ans=0.0 2023-11-28 00:52:16,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3278893.3333333335, ans=0.0 2023-11-28 00:52:20,080 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3278960.0, ans=0.125 2023-11-28 00:52:22,984 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 491850 2023-11-28 00:52:28,270 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 00:52:28,473 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3279026.6666666665, ans=0.0 2023-11-28 00:52:29,391 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 10900, loss[loss=0.06889, simple_loss=0.09227, pruned_loss=0.01366, audio_tagging_loss=0.009095, over 15153.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.0896, pruned_loss=0.01237, audio_tagging_loss=0.008772, over 3052050.68 frames. ], batch size: 58, lr: 1.64e-03, grad_scale: 32.0 2023-11-28 00:52:34,720 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.617e+01 8.904e+01 9.696e+01 1.053e+02 1.235e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-28 00:52:36,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3279026.6666666665, ans=0.5 2023-11-28 00:52:50,687 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3279093.3333333335, ans=0.0 2023-11-28 00:53:03,339 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3279226.6666666665, ans=0.125 2023-11-28 00:53:15,265 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3279293.3333333335, ans=0.0 2023-11-28 00:53:19,357 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 491900 2023-11-28 00:53:22,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3279293.3333333335, ans=0.015 2023-11-28 00:53:26,251 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 10950, loss[loss=0.0588, simple_loss=0.07123, pruned_loss=0.0111, audio_tagging_loss=0.01209, over 14592.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.09004, pruned_loss=0.01233, audio_tagging_loss=0.008743, over 3054969.78 frames. 
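The WARNING above shows the guard that keeps unalignable utterances out of the transducer loss: this 1-second AudioSet clip has 100 feature frames, only 23 frames survive the roughly 4x convolutional subsampling, and its dummy transcript tokenizes to 24 BPE tokens, so there are fewer encoder frames than output tokens and the cut is dropped. A sketch of such a filter follows; the subsampling arithmetic ((T - 7) // 2 + 1) // 2 is an assumption chosen to reproduce 100 -> 23, and keep_cut is a hypothetical helper name.

    def keep_cut(num_frames: int, tokens: list) -> bool:
        """Drop cuts whose encoder output is shorter than the token sequence.

        The subsampling formula is an assumption (two stride-2 convs,
        i.e. ((T - 7) // 2 + 1) // 2), chosen to match 100 frames -> 23.
        """
        t_sub = ((num_frames - 7) // 2 + 1) // 2
        if t_sub < len(tokens):
            print(
                f"Exclude cut from training. "
                f"Number of frames (before subsampling): {num_frames}. "
                f"Number of frames (after subsampling): {t_sub}. "
                f"Number of tokens: {len(tokens)}"
            )
            return False
        return True

    assert not keep_cut(100, ["tok"] * 24)  # 23 < 24 -> excluded, as logged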
], batch size: 55, lr: 1.64e-03, grad_scale: 32.0 2023-11-28 00:53:34,081 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3279360.0, ans=0.125 2023-11-28 00:53:36,246 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3279360.0, ans=0.125 2023-11-28 00:53:36,276 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3279360.0, ans=0.125 2023-11-28 00:53:53,425 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3279493.3333333335, ans=0.2 2023-11-28 00:54:02,197 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3279560.0, ans=0.125 2023-11-28 00:54:04,616 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.11 vs. limit=10.0 2023-11-28 00:54:05,498 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3279560.0, ans=0.1 2023-11-28 00:54:08,039 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3279560.0, ans=0.125 2023-11-28 00:54:11,889 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-28 00:54:16,940 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3279626.6666666665, ans=0.04949747468305833 2023-11-28 00:54:17,739 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 491950 2023-11-28 00:54:24,174 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 11000, loss[loss=0.05269, simple_loss=0.06874, pruned_loss=0.008318, audio_tagging_loss=0.01, over 14465.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08961, pruned_loss=0.01233, audio_tagging_loss=0.008777, over 3047716.18 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 32.0 2023-11-28 00:54:30,057 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.813e+01 8.785e+01 9.323e+01 1.002e+02 1.243e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-28 00:54:35,506 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 00:54:45,453 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3279826.6666666665, ans=0.0 2023-11-28 00:54:49,927 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3279826.6666666665, ans=0.125 2023-11-28 00:54:50,286 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.28 vs. 
limit=12.0 2023-11-28 00:54:54,951 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3279826.6666666665, ans=0.0 2023-11-28 00:54:57,204 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3279893.3333333335, ans=0.1 2023-11-28 00:55:11,179 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3279960.0, ans=0.125 2023-11-28 00:55:14,300 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 492000 2023-11-28 00:55:15,608 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-492000.pt 2023-11-28 00:55:22,891 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 11050, loss[loss=0.07367, simple_loss=0.09632, pruned_loss=0.01378, audio_tagging_loss=0.01174, over 15742.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.08957, pruned_loss=0.01239, audio_tagging_loss=0.008866, over 3044866.14 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 32.0 2023-11-28 00:56:01,021 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3280226.6666666665, ans=0.0 2023-11-28 00:56:03,502 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.84 vs. limit=15.0 2023-11-28 00:56:12,655 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 492050 2023-11-28 00:56:17,439 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.24 vs. limit=15.0 2023-11-28 00:56:19,117 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 11100, loss[loss=0.07116, simple_loss=0.09566, pruned_loss=0.01539, audio_tagging_loss=0.007943, over 15047.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.08979, pruned_loss=0.01236, audio_tagging_loss=0.008961, over 3049226.81 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 32.0 2023-11-28 00:56:21,882 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3280360.0, ans=0.125 2023-11-28 00:56:26,427 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.422e+01 8.769e+01 9.313e+01 9.922e+01 1.261e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-28 00:56:44,260 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.05 vs. limit=15.0 2023-11-28 00:56:46,091 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3280493.3333333335, ans=0.2 2023-11-28 00:56:47,349 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3280493.3333333335, ans=0.125 2023-11-28 00:57:09,874 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 492100 2023-11-28 00:57:15,380 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.89 vs. limit=15.0 2023-11-28 00:57:16,939 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 11150, loss[loss=0.0755, simple_loss=0.1125, pruned_loss=0.01293, audio_tagging_loss=0.006294, over 15547.00 frames. 
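The "Saving checkpoint to .../checkpoint-492000.pt" entry above fires right after the batch counter reaches 492000, a multiple of the recipe's save_every_n interval (4000 in this run, so rolling checkpoints land at every 4000th global batch). A sketch of that periodic save; the exact contents of the checkpoint dict are an assumption, chosen to cover the state the log shows being tracked (model, optimizer, scheduler, grad scaler).

    import torch

    def maybe_save_checkpoint(params, model, optimizer, scheduler, scaler):
        """Save a rolling checkpoint every params.save_every_n batches.

        Sketch: field names mirror the log (batch_idx_train, exp_dir),
        but the checkpoint dict layout is an assumption.
        """
        if params.batch_idx_train % params.save_every_n != 0:
            return
        ckpt = {
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "scheduler": scheduler.state_dict(),
            "grad_scaler": scaler.state_dict(),
            "batch_idx_train": params.batch_idx_train,
        }
        path = params.exp_dir / f"checkpoint-{params.batch_idx_train}.pt"
        print(f"Saving checkpoint to {path}")
        torch.save(ckpt, path)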
], tot_loss[loss=0.0663, simple_loss=0.08984, pruned_loss=0.01234, audio_tagging_loss=0.009037, over 3052346.04 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:57:22,105 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3280693.3333333335, ans=0.125 2023-11-28 00:57:37,552 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.16 vs. limit=15.0 2023-11-28 00:57:48,450 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3280826.6666666665, ans=0.1 2023-11-28 00:57:48,516 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3280826.6666666665, ans=0.125 2023-11-28 00:57:57,065 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.28 vs. limit=22.5 2023-11-28 00:58:00,493 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3280893.3333333335, ans=0.125 2023-11-28 00:58:04,342 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3280960.0, ans=0.125 2023-11-28 00:58:07,341 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 492150 2023-11-28 00:58:13,847 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 11200, loss[loss=0.06514, simple_loss=0.08902, pruned_loss=0.01244, audio_tagging_loss=0.008181, over 16945.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.08959, pruned_loss=0.0123, audio_tagging_loss=0.009075, over 3054266.86 frames. 
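The learning rate in these entries holds at lr: 1.64e-03 through epoch 41 and steps down to 1.62e-03 once epoch 42 begins further below, with no visible per-batch movement at this batch count. That matches icefall's Eden schedule, assuming its standard form, in which the lr decays smoothly in both the batch and the epoch counter; a sketch:

    def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
        """Eden learning-rate schedule (assuming the standard formula
        from icefall's optim.py): smooth decay in batch and epoch."""
        batch_factor = ((batch**2 + lr_batches**2) / lr_batches**2) ** -0.25
        epoch_factor = ((epoch**2 + lr_epochs**2) / lr_epochs**2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    # With base_lr=0.045 and the epoch argument counting *completed*
    # epochs, this reproduces the logged values at batch idx ~492000:
    print(f"{eden_lr(0.045, 492000, 40):.2e}")  # 1.64e-03 (during epoch 41)
    print(f"{eden_lr(0.045, 492000, 41):.2e}")  # 1.62e-03 (during epoch 42)

At ~492k batches the batch factor is essentially flat, which is why the printed lr only moves at epoch boundaries here.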
], batch size: 63, lr: 1.64e-03, grad_scale: 32.0 2023-11-28 00:58:17,416 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3281026.6666666665, ans=0.0 2023-11-28 00:58:20,456 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.565e+01 8.951e+01 9.684e+01 1.065e+02 1.269e+02, threshold=1.937e+02, percent-clipped=0.0 2023-11-28 00:58:30,002 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3281093.3333333335, ans=0.1 2023-11-28 00:58:31,168 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3281093.3333333335, ans=0.1 2023-11-28 00:58:39,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3281160.0, ans=0.1 2023-11-28 00:58:42,337 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3281160.0, ans=0.125 2023-11-28 00:58:44,423 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3281160.0, ans=0.125 2023-11-28 00:58:50,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3281226.6666666665, ans=0.125 2023-11-28 00:58:52,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3281226.6666666665, ans=0.125 2023-11-28 00:59:04,334 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 492200 2023-11-28 00:59:11,118 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 11250, loss[loss=0.05652, simple_loss=0.08097, pruned_loss=0.007933, audio_tagging_loss=0.0081, over 16393.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.08912, pruned_loss=0.0123, audio_tagging_loss=0.009089, over 3051675.01 frames. ], batch size: 62, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:59:17,909 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3281360.0, ans=0.0 2023-11-28 00:59:26,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3281426.6666666665, ans=0.1 2023-11-28 00:59:45,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3281560.0, ans=0.125 2023-11-28 00:59:57,257 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3281626.6666666665, ans=0.125 2023-11-28 01:00:01,800 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 492250 2023-11-28 01:00:08,267 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 11300, loss[loss=0.0771, simple_loss=0.09533, pruned_loss=0.01597, audio_tagging_loss=0.01346, over 15251.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.08985, pruned_loss=0.01242, audio_tagging_loss=0.008868, over 3051185.00 frames. ], batch size: 61, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 01:00:11,054 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.72 vs. 
limit=15.0 2023-11-28 01:00:13,331 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3281693.3333333335, ans=0.125 2023-11-28 01:00:14,476 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.90 vs. limit=22.5 2023-11-28 01:00:17,051 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.226e+01 8.951e+01 9.399e+01 1.008e+02 1.489e+02, threshold=1.880e+02, percent-clipped=0.0 2023-11-28 01:00:20,735 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3281760.0, ans=0.1 2023-11-28 01:00:27,184 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3281760.0, ans=0.125 2023-11-28 01:00:30,532 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3281826.6666666665, ans=0.0 2023-11-28 01:00:53,425 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3281960.0, ans=0.125 2023-11-28 01:00:57,362 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3281960.0, ans=0.125 2023-11-28 01:00:59,896 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 492300 2023-11-28 01:01:04,326 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3281960.0, ans=0.2 2023-11-28 01:01:06,442 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 11350, loss[loss=0.04911, simple_loss=0.06536, pruned_loss=0.005764, audio_tagging_loss=0.01066, over 14479.00 frames. ], tot_loss[loss=0.06646, simple_loss=0.09015, pruned_loss=0.01258, audio_tagging_loss=0.008808, over 3049476.31 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 01:01:06,799 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3282026.6666666665, ans=0.07 2023-11-28 01:01:07,808 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3282026.6666666665, ans=0.1 2023-11-28 01:01:30,602 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3282160.0, ans=0.125 2023-11-28 01:01:49,121 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 01:01:56,598 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 492350 2023-11-28 01:02:02,004 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3282360.0, ans=0.125 2023-11-28 01:02:02,958 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 11400, loss[loss=0.09147, simple_loss=0.1162, pruned_loss=0.02384, audio_tagging_loss=0.009524, over 15802.00 frames. ], tot_loss[loss=0.06696, simple_loss=0.09086, pruned_loss=0.01277, audio_tagging_loss=0.008761, over 3047728.20 frames. 
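The bulk of the scaling.py:213 lines are ScheduledFloat reports: each named hyperparameter (skip rates, balancer probabilities, out_proj dropout, bypass scale_min, ...) is a value scheduled against the global batch_count rather than a fixed constant, which is why every report carries the current batch_count alongside the resolved value (ans=...). A sketch of a piecewise-linear schedule that behaves this way; the breakpoints below are invented for illustration, not the real schedules:

    import bisect

    class ScheduledFloat:
        """A float following a piecewise-linear schedule over batch_count.

        Sketch with invented breakpoints: e.g. (0.0, 0.3), (20000.0, 0.0)
        decays linearly over the first 20k batches, then stays flat.
        """

        def __init__(self, name, *points):
            self.name = name
            self.points = sorted(points)  # (batch_count, value) pairs
            self.batch_count = 0

        def value(self):
            xs = [x for x, _ in self.points]
            i = bisect.bisect_right(xs, self.batch_count)
            if i == 0:
                return self.points[0][1]
            if i == len(self.points):
                return self.points[-1][1]
            (x0, y0), (x1, y1) = self.points[i - 1], self.points[i]
            t = (self.batch_count - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)

    skip = ScheduledFloat("ff3_skip_rate", (0.0, 0.3), (20000.0, 0.0))
    skip.batch_count = 3277560
    print(f"ScheduledFloat: name={skip.name}, "
          f"batch_count={skip.batch_count}, ans={skip.value()}")

A schedule whose final breakpoint value is 0.0 would explain why every *_skip_rate report prints ans=0.0 this deep into training (batch_count ~3.28M), while balancer probabilities sit at their floor of 0.125.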
], batch size: 58, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 01:02:10,356 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3282360.0, ans=0.0 2023-11-28 01:02:11,603 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.540e+01 8.755e+01 9.421e+01 1.020e+02 1.331e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-28 01:02:19,407 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3282426.6666666665, ans=0.1 2023-11-28 01:02:21,613 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3282426.6666666665, ans=0.2 2023-11-28 01:02:31,406 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3282493.3333333335, ans=0.04949747468305833 2023-11-28 01:02:34,066 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.48 vs. limit=15.0 2023-11-28 01:02:49,031 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3282626.6666666665, ans=0.2 2023-11-28 01:02:53,675 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 492400 2023-11-28 01:02:54,330 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.62 vs. limit=22.5 2023-11-28 01:03:00,927 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 11450, loss[loss=0.07524, simple_loss=0.1085, pruned_loss=0.01555, audio_tagging_loss=0.005464, over 14687.00 frames. ], tot_loss[loss=0.06705, simple_loss=0.09139, pruned_loss=0.0127, audio_tagging_loss=0.00866, over 3046819.97 frames. ], batch size: 53, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 01:03:31,246 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.19 vs. limit=22.5 2023-11-28 01:03:47,086 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3282960.0, ans=0.125 2023-11-28 01:03:50,417 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3282960.0, ans=0.1 2023-11-28 01:03:52,019 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 492450 2023-11-28 01:03:58,521 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 11500, loss[loss=0.05703, simple_loss=0.07769, pruned_loss=0.007028, audio_tagging_loss=0.01116, over 15498.00 frames. ], tot_loss[loss=0.06676, simple_loss=0.09068, pruned_loss=0.01269, audio_tagging_loss=0.008732, over 3042554.54 frames. 
], batch size: 58, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 01:04:00,497 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3283026.6666666665, ans=0.2 2023-11-28 01:04:01,423 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3283026.6666666665, ans=0.2 2023-11-28 01:04:06,710 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.648e+01 8.678e+01 9.475e+01 1.027e+02 1.615e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-28 01:04:06,938 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3283026.6666666665, ans=0.0 2023-11-28 01:04:20,526 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3283160.0, ans=0.0 2023-11-28 01:04:30,638 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3283160.0, ans=0.0 2023-11-28 01:04:47,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3283293.3333333335, ans=0.1 2023-11-28 01:04:49,245 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 492500 2023-11-28 01:04:55,708 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 11550, loss[loss=0.04967, simple_loss=0.06231, pruned_loss=0.008106, audio_tagging_loss=0.01041, over 14642.00 frames. ], tot_loss[loss=0.06691, simple_loss=0.09107, pruned_loss=0.01271, audio_tagging_loss=0.008668, over 3045120.57 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 01:05:12,104 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3283426.6666666665, ans=0.0 2023-11-28 01:05:12,134 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3283426.6666666665, ans=0.0 2023-11-28 01:05:14,437 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3283426.6666666665, ans=0.125 2023-11-28 01:05:32,981 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 01:05:44,010 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3283626.6666666665, ans=0.125 2023-11-28 01:05:46,489 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 492550 2023-11-28 01:05:48,900 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3283626.6666666665, ans=0.0 2023-11-28 01:05:53,387 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 11600, loss[loss=0.05516, simple_loss=0.07554, pruned_loss=0.007955, audio_tagging_loss=0.009439, over 15772.00 frames. ], tot_loss[loss=0.06738, simple_loss=0.09205, pruned_loss=0.01275, audio_tagging_loss=0.008609, over 3047539.78 frames. 
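grad_scale in the per-batch summaries moves among 8.0, 16.0, and 32.0 (it drops from 16.0 to 8.0 at batch 11600 above) because use_fp16 training runs under PyTorch's dynamic loss scaler: the scale is halved whenever scaled gradients overflow and doubled again after a long enough run of clean steps. This is stock torch.cuda.amp usage rather than anything icefall-specific; a sketch:

    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=16.0,      # matches the grad_scale seen in the log
        growth_factor=2.0,    # doubled after `growth_interval` clean steps
        backoff_factor=0.5,   # halved on overflow, e.g. 16.0 -> 8.0
        growth_interval=2000,
    )

    def train_step(model, batch, optimizer, criterion):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = criterion(model(batch["inputs"]), batch["targets"])
        scaler.scale(loss).backward()   # backward on the scaled loss
        scaler.step(optimizer)          # unscales; skips step on inf/nan
        scaler.update()                 # adjusts the scale for next batch
        return loss.detach(), scaler.get_scale()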
], batch size: 59, lr: 1.64e-03, grad_scale: 8.0 2023-11-28 01:06:02,923 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3283693.3333333335, ans=0.125 2023-11-28 01:06:03,696 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.581e+01 9.077e+01 9.650e+01 1.023e+02 1.320e+02, threshold=1.930e+02, percent-clipped=0.0 2023-11-28 01:06:23,419 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3283826.6666666665, ans=0.0 2023-11-28 01:06:32,846 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.86 vs. limit=15.0 2023-11-28 01:06:37,335 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3283893.3333333335, ans=0.2 2023-11-28 01:06:43,878 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 492600 2023-11-28 01:06:51,011 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 11650, loss[loss=0.07005, simple_loss=0.08697, pruned_loss=0.0147, audio_tagging_loss=0.01187, over 15134.00 frames. ], tot_loss[loss=0.06762, simple_loss=0.09206, pruned_loss=0.01295, audio_tagging_loss=0.008636, over 3043237.73 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 8.0 2023-11-28 01:07:02,673 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3284093.3333333335, ans=0.1 2023-11-28 01:07:22,199 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3284160.0, ans=0.0 2023-11-28 01:07:41,364 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 492650 2023-11-28 01:07:41,488 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3284293.3333333335, ans=0.0 2023-11-28 01:07:48,464 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 11700, loss[loss=0.05857, simple_loss=0.08044, pruned_loss=0.008026, audio_tagging_loss=0.01033, over 14007.00 frames. ], tot_loss[loss=0.06678, simple_loss=0.09073, pruned_loss=0.0127, audio_tagging_loss=0.008724, over 3044266.08 frames. ], batch size: 53, lr: 1.64e-03, grad_scale: 8.0 2023-11-28 01:07:58,792 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.595e+01 8.938e+01 9.502e+01 1.017e+02 1.872e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-28 01:07:58,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3284426.6666666665, ans=0.2 2023-11-28 01:08:00,285 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 01:08:02,778 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3284426.6666666665, ans=0.125 2023-11-28 01:08:08,755 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.88 vs. 
limit=10.0 2023-11-28 01:08:18,713 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3284493.3333333335, ans=0.0 2023-11-28 01:08:39,184 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 492700 2023-11-28 01:08:46,022 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 11750, loss[loss=0.04844, simple_loss=0.05942, pruned_loss=0.008178, audio_tagging_loss=0.01055, over 14156.00 frames. ], tot_loss[loss=0.06698, simple_loss=0.09091, pruned_loss=0.01282, audio_tagging_loss=0.008702, over 3047409.29 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 8.0 2023-11-28 01:09:36,247 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 492750 2023-11-28 01:09:43,368 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 11800, loss[loss=0.04996, simple_loss=0.0598, pruned_loss=0.009168, audio_tagging_loss=0.0109, over 15144.00 frames. ], tot_loss[loss=0.06704, simple_loss=0.09089, pruned_loss=0.01283, audio_tagging_loss=0.008765, over 3043288.14 frames. ], batch size: 58, lr: 1.64e-03, grad_scale: 8.0 2023-11-28 01:09:53,164 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.217e+01 8.795e+01 9.542e+01 1.022e+02 1.386e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-28 01:10:02,226 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.72 vs. limit=6.0 2023-11-28 01:10:06,641 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3285160.0, ans=0.125 2023-11-28 01:10:12,500 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.02 vs. limit=15.0 2023-11-28 01:10:15,856 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3285160.0, ans=0.2 2023-11-28 01:10:21,752 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.48 vs. limit=12.0 2023-11-28 01:10:22,350 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3285226.6666666665, ans=0.125 2023-11-28 01:10:33,733 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 492800 2023-11-28 01:10:36,672 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.85 vs. limit=22.5 2023-11-28 01:10:40,539 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 11850, loss[loss=0.06558, simple_loss=0.09461, pruned_loss=0.01078, audio_tagging_loss=0.007488, over 15076.00 frames. ], tot_loss[loss=0.067, simple_loss=0.09064, pruned_loss=0.01282, audio_tagging_loss=0.008857, over 3048915.27 frames. 
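The "Whitening: name=..., metric=X vs. limit=Y" lines report how far a layer's activations are from white, i.e. from having a covariance proportional to the identity; each Whiten module carries its own limit (6.0 for attention keys, 10.0/15.0/22.5 elsewhere above) and intervenes when the metric drifts too high. The exact metric used in scaling.py is not reproduced here; below is one illustrative measure with the right properties (equals 1.0 for perfectly white features and grows with eigenvalue spread), so treat the formula as an assumption:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        """Illustrative whiteness measure, per group:
        d * trace(C @ C) / trace(C)**2, averaged over groups.
        Equals 1.0 iff the covariance C is a multiple of the identity
        and grows as the eigenvalue spectrum becomes less uniform.
        Assumption: scaling.py's real metric may be defined differently."""
        n, c = x.reshape(-1, x.shape[-1]).shape
        x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)
        x = x - x.mean(dim=1, keepdim=True)
        cov = x.transpose(1, 2) @ x / n          # per-group covariance
        d = c // num_groups
        num = d * (cov * cov).sum(dim=(1, 2))    # d * trace(C @ C)
        den = torch.diagonal(cov, dim1=1, dim2=2).sum(dim=1) ** 2
        return (num / den).mean().item()

    white = torch.randn(1000, 128)                 # ~white: metric near 1
    skewed = white * torch.linspace(0.1, 3.0, 128) # spread spectrum: > 1
    print(whitening_metric(white), whitening_metric(skewed))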
], batch size: 58, lr: 1.64e-03, grad_scale: 8.0 2023-11-28 01:10:43,050 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3285360.0, ans=0.2 2023-11-28 01:11:24,297 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3285560.0, ans=0.5 2023-11-28 01:11:25,498 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3285626.6666666665, ans=0.0 2023-11-28 01:11:27,579 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3285626.6666666665, ans=0.0 2023-11-28 01:11:31,349 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 492850 2023-11-28 01:11:38,311 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 11900, loss[loss=0.05756, simple_loss=0.0746, pruned_loss=0.009926, audio_tagging_loss=0.01034, over 15192.00 frames. ], tot_loss[loss=0.06696, simple_loss=0.09063, pruned_loss=0.01278, audio_tagging_loss=0.008858, over 3042791.10 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 8.0 2023-11-28 01:11:46,697 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3285693.3333333335, ans=10.0 2023-11-28 01:11:48,648 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.534e+01 8.587e+01 9.286e+01 9.981e+01 1.301e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-28 01:11:54,988 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3285760.0, ans=0.0 2023-11-28 01:11:57,448 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.94 vs. limit=6.0 2023-11-28 01:12:06,189 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.83 vs. limit=10.0 2023-11-28 01:12:08,029 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3285826.6666666665, ans=0.2 2023-11-28 01:12:08,181 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3285826.6666666665, ans=0.125 2023-11-28 01:12:15,382 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3285893.3333333335, ans=0.125 2023-11-28 01:12:20,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3285893.3333333335, ans=0.2 2023-11-28 01:12:25,129 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.61 vs. limit=15.0 2023-11-28 01:12:29,097 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 492900 2023-11-28 01:12:36,108 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 11950, loss[loss=0.06428, simple_loss=0.09161, pruned_loss=0.01223, audio_tagging_loss=0.006241, over 14363.00 frames. ], tot_loss[loss=0.06677, simple_loss=0.09018, pruned_loss=0.01271, audio_tagging_loss=0.008964, over 3035148.58 frames. 
], batch size: 56, lr: 1.64e-03, grad_scale: 8.0 2023-11-28 01:12:46,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3286093.3333333335, ans=0.1 2023-11-28 01:12:53,150 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.19 vs. limit=22.5 2023-11-28 01:12:53,852 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3286093.3333333335, ans=0.2 2023-11-28 01:12:57,253 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3286160.0, ans=0.125 2023-11-28 01:13:24,820 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 492950 2023-11-28 01:13:30,985 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 12000, loss[loss=0.09731, simple_loss=0.141, pruned_loss=0.0219, audio_tagging_loss=0.004937, over 15888.00 frames. ], tot_loss[loss=0.06671, simple_loss=0.09006, pruned_loss=0.01271, audio_tagging_loss=0.008972, over 3036009.66 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 01:13:30,987 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-28 01:14:03,696 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.1244, 2.4064, 4.9977, 2.9495], device='cuda:0') 2023-11-28 01:14:05,678 INFO [train_asr.py:1267] (0/4) Epoch 41, validation: loss=0.05796, simple_loss=0.05063, pruned_loss=0.005209, audio_tagging_loss=0.02743, over 4681554.00 frames. 2023-11-28 01:14:05,678 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-28 01:14:07,149 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.65 vs. limit=15.0 2023-11-28 01:14:08,808 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3286360.0, ans=0.0 2023-11-28 01:14:15,051 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.468e+01 8.802e+01 9.296e+01 1.010e+02 1.466e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-28 01:14:19,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff2.min_abs, batch_count=3286426.6666666665, ans=0.1 2023-11-28 01:14:21,010 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3286426.6666666665, ans=0.2 2023-11-28 01:14:31,442 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/epoch-41.pt 2023-11-28 01:14:48,437 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 0, loss[loss=0.07342, simple_loss=0.08773, pruned_loss=0.01254, audio_tagging_loss=0.01701, over 15611.00 frames. ], tot_loss[loss=0.07342, simple_loss=0.08773, pruned_loss=0.01254, audio_tagging_loss=0.01701, over 15611.00 frames. 
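Around the epoch-41/42 boundary the loop twice pauses for "Computing validation loss": it accumulates the same loss components over the full AudioSet eval set (4681554 frames), prints the per-frame averages, and reports peak CUDA memory. A sketch of that routine; compute_loss is a hypothetical helper assumed to return frame-weighted loss sums per batch, and the MetricsTracker-style accumulation is simplified to plain dicts:

    import torch

    def compute_validation_loss(model, valid_dl, compute_loss, device):
        """Average loss components over the validation set,
        weighted by the number of frames."""
        model.eval()
        totals, frames = {}, 0.0
        with torch.no_grad():
            for batch in valid_dl:
                info = compute_loss(model, batch)  # sums + {"frames": ...}
                frames += info.pop("frames")
                for k, v in info.items():
                    totals[k] = totals.get(k, 0.0) + v
        model.train()
        avg = {k: v / frames for k, v in totals.items()}
        mem_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
        print("validation: "
              + ", ".join(f"{k}={v:.4g}" for k, v in avg.items())
              + f", over {frames:.2f} frames.")
        print(f"Maximum memory allocated so far is {mem_mb}MB")
        return avg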
], batch size: 58, lr: 1.62e-03, grad_scale: 32.0 2023-11-28 01:14:48,440 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-28 01:15:03,420 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.5742, 2.5957, 3.4261, 2.8006], device='cuda:0') 2023-11-28 01:15:22,245 INFO [train_asr.py:1267] (0/4) Epoch 42, validation: loss=0.05771, simple_loss=0.05063, pruned_loss=0.005208, audio_tagging_loss=0.02719, over 4681554.00 frames. 2023-11-28 01:15:22,246 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-28 01:15:23,601 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3286513.3333333335, ans=0.125 2023-11-28 01:15:24,637 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3286513.3333333335, ans=10.0 2023-11-28 01:15:31,621 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3286513.3333333335, ans=0.125 2023-11-28 01:15:40,464 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.95 vs. limit=15.0 2023-11-28 01:15:45,210 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.82 vs. limit=15.0 2023-11-28 01:15:45,702 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 493000 2023-11-28 01:16:14,540 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.01 vs. limit=15.0 2023-11-28 01:16:17,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3286780.0, ans=0.125 2023-11-28 01:16:17,562 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3286780.0, ans=0.2 2023-11-28 01:16:19,456 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 50, loss[loss=0.08275, simple_loss=0.109, pruned_loss=0.01534, audio_tagging_loss=0.01292, over 16420.00 frames. ], tot_loss[loss=0.07583, simple_loss=0.09315, pruned_loss=0.01277, audio_tagging_loss=0.01649, over 696830.14 frames. ], batch size: 57, lr: 1.62e-03, grad_scale: 32.0 2023-11-28 01:16:25,397 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.08 vs. limit=15.0 2023-11-28 01:16:43,996 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 493050 2023-11-28 01:17:01,282 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.729e+01 9.634e+01 1.031e+02 1.127e+02 1.457e+02, threshold=2.062e+02, percent-clipped=0.0 2023-11-28 01:17:16,855 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 100, loss[loss=0.04579, simple_loss=0.04557, pruned_loss=0.005817, audio_tagging_loss=0.01719, over 14593.00 frames. ], tot_loss[loss=0.07493, simple_loss=0.09265, pruned_loss=0.01287, audio_tagging_loss=0.01573, over 1213454.56 frames. 
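Each progress line pairs the current batch's loss with tot_loss, a running frame-weighted average, and its trajectory across the epoch-42 restart pins down the bookkeeping: at batch 0 tot_loss equals the batch loss (over 15611 frames), by batch 50 it covers 696830 frames, by batch 100 about 1.21M, and mid-epoch it plateaus near 3.0M frames. That is exactly what an exponentially decayed accumulator predicts with decay 1 - 1/200 (an effective window of ~200 batches) and ~15k frames per batch, rather than a hard periodic reset. A sketch of that accumulator; the MetricsTracker shown is a simplified stand-in for the real one:

    class MetricsTracker(dict):
        """Frame-weighted loss sums; dividing by self['frames']
        gives the averages printed in the log."""

        def __add__(self, other):
            out = MetricsTracker(self)
            for k, v in other.items():
                out[k] = out.get(k, 0.0) + v
            return out

        def __mul__(self, alpha):
            return MetricsTracker({k: v * alpha for k, v in self.items()})

    reset_interval = 200  # effective averaging window, in batches

    def update_tot_loss(tot_loss, loss_info):
        # Decay old totals so tot_loss covers ~reset_interval batches;
        # steady-state frames ~= frames_per_batch * reset_interval
        # (~15k * 200 = 3.0M, matching the logged counts).
        return tot_loss * (1 - 1 / reset_interval) + loss_info

    tot_loss = MetricsTracker()  # fresh at each epoch start
    loss_info = MetricsTracker(loss=1146.2, frames=15611.0)  # batch-0 sums
    tot_loss = update_tot_loss(tot_loss, loss_info)
    print(tot_loss["loss"] / tot_loss["frames"])  # ~0.0734, as at batch 0

At an epoch's first batch the decayed term is empty, which is why the Epoch 42, batch 0 entry above shows tot_loss identical to the batch loss.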
], batch size: 58, lr: 1.62e-03, grad_scale: 32.0 2023-11-28 01:17:42,085 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 493100 2023-11-28 01:17:47,764 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3287313.3333333335, ans=0.2 2023-11-28 01:17:58,420 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3287380.0, ans=0.125 2023-11-28 01:17:59,610 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3287380.0, ans=0.125 2023-11-28 01:18:15,154 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 150, loss[loss=0.06263, simple_loss=0.0745, pruned_loss=0.01522, audio_tagging_loss=0.01016, over 15367.00 frames. ], tot_loss[loss=0.07317, simple_loss=0.09252, pruned_loss=0.01271, audio_tagging_loss=0.0142, over 1621747.64 frames. ], batch size: 59, lr: 1.62e-03, grad_scale: 32.0 2023-11-28 01:18:17,662 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3287513.3333333335, ans=0.0 2023-11-28 01:18:19,653 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3287513.3333333335, ans=0.125 2023-11-28 01:18:26,915 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3287580.0, ans=0.1 2023-11-28 01:18:38,979 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 493150 2023-11-28 01:18:39,253 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3287646.6666666665, ans=0.0 2023-11-28 01:18:56,584 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.64 vs. limit=15.0 2023-11-28 01:18:57,029 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.207e+01 8.959e+01 9.616e+01 1.058e+02 1.322e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-28 01:19:10,861 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3287780.0, ans=0.0 2023-11-28 01:19:12,894 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 200, loss[loss=0.05044, simple_loss=0.06615, pruned_loss=0.009231, audio_tagging_loss=0.008137, over 14945.00 frames. ], tot_loss[loss=0.07188, simple_loss=0.09279, pruned_loss=0.01291, audio_tagging_loss=0.01258, over 1936377.80 frames. ], batch size: 56, lr: 1.62e-03, grad_scale: 32.0 2023-11-28 01:19:23,544 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.31 vs. limit=15.0 2023-11-28 01:19:26,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3287913.3333333335, ans=0.0 2023-11-28 01:19:37,398 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 493200 2023-11-28 01:19:45,332 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3287980.0, ans=0.0 2023-11-28 01:20:11,198 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 250, loss[loss=0.05585, simple_loss=0.06427, pruned_loss=0.01053, audio_tagging_loss=0.01318, over 13765.00 frames. 
], tot_loss[loss=0.07009, simple_loss=0.09165, pruned_loss=0.01279, audio_tagging_loss=0.01148, over 2182897.30 frames. ], batch size: 53, lr: 1.62e-03, grad_scale: 32.0 2023-11-28 01:20:13,095 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.21 vs. limit=12.0 2023-11-28 01:20:13,639 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3288180.0, ans=0.0 2023-11-28 01:20:19,561 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3288180.0, ans=0.0 2023-11-28 01:20:24,397 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3288246.6666666665, ans=0.1 2023-11-28 01:20:33,264 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.08 vs. limit=15.0 2023-11-28 01:20:36,501 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 493250 2023-11-28 01:20:43,959 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.38 vs. limit=15.0 2023-11-28 01:20:50,903 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3288380.0, ans=0.125 2023-11-28 01:20:52,864 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.282e+01 9.203e+01 9.691e+01 1.057e+02 1.267e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-28 01:20:53,384 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.52 vs. limit=22.5 2023-11-28 01:21:04,885 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3288446.6666666665, ans=0.125 2023-11-28 01:21:09,029 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 300, loss[loss=0.04248, simple_loss=0.05639, pruned_loss=0.006127, audio_tagging_loss=0.008156, over 15007.00 frames. ], tot_loss[loss=0.06996, simple_loss=0.09272, pruned_loss=0.01309, audio_tagging_loss=0.01051, over 2387353.92 frames. ], batch size: 57, lr: 1.62e-03, grad_scale: 32.0 2023-11-28 01:21:33,301 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 493300 2023-11-28 01:21:41,534 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.18 vs. limit=15.0 2023-11-28 01:21:44,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3288713.3333333335, ans=0.1 2023-11-28 01:22:05,969 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3288846.6666666665, ans=0.035 2023-11-28 01:22:06,953 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 350, loss[loss=0.06407, simple_loss=0.08358, pruned_loss=0.01131, audio_tagging_loss=0.01096, over 14841.00 frames. ], tot_loss[loss=0.06888, simple_loss=0.09217, pruned_loss=0.01274, audio_tagging_loss=0.01006, over 2538892.90 frames. 
], batch size: 54, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:22:30,714 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 493350 2023-11-28 01:22:41,272 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3289046.6666666665, ans=0.1 2023-11-28 01:22:49,682 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.180e+01 8.671e+01 9.361e+01 1.014e+02 1.227e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-28 01:22:53,329 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3289113.3333333335, ans=0.125 2023-11-28 01:22:58,979 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=3289113.3333333335, ans=15.0 2023-11-28 01:23:03,885 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 400, loss[loss=0.07517, simple_loss=0.1039, pruned_loss=0.01277, audio_tagging_loss=0.01046, over 16629.00 frames. ], tot_loss[loss=0.06804, simple_loss=0.09144, pruned_loss=0.01256, audio_tagging_loss=0.009759, over 2647970.76 frames. ], batch size: 61, lr: 1.62e-03, grad_scale: 32.0 2023-11-28 01:23:07,889 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.81 vs. limit=10.0 2023-11-28 01:23:27,965 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 493400 2023-11-28 01:23:30,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3289313.3333333335, ans=0.0 2023-11-28 01:23:38,392 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.97 vs. limit=6.0 2023-11-28 01:23:44,857 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3289380.0, ans=0.125 2023-11-28 01:23:45,274 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.20 vs. limit=15.0 2023-11-28 01:23:50,469 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3289446.6666666665, ans=0.2 2023-11-28 01:23:52,507 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3289446.6666666665, ans=0.125 2023-11-28 01:23:58,401 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3289446.6666666665, ans=0.125 2023-11-28 01:24:02,000 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 450, loss[loss=0.04563, simple_loss=0.05247, pruned_loss=0.007234, audio_tagging_loss=0.01216, over 14711.00 frames. ], tot_loss[loss=0.06776, simple_loss=0.09142, pruned_loss=0.01261, audio_tagging_loss=0.009436, over 2735290.55 frames. 
], batch size: 59, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:24:26,417 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 493450 2023-11-28 01:24:33,204 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3289646.6666666665, ans=0.125 2023-11-28 01:24:41,276 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3289713.3333333335, ans=0.2 2023-11-28 01:24:45,838 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.317e+01 8.828e+01 9.254e+01 1.009e+02 1.850e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-28 01:24:59,873 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 500, loss[loss=0.08544, simple_loss=0.12, pruned_loss=0.01858, audio_tagging_loss=0.006855, over 16343.00 frames. ], tot_loss[loss=0.06704, simple_loss=0.09065, pruned_loss=0.0125, audio_tagging_loss=0.009218, over 2800646.35 frames. ], batch size: 59, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:25:11,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3289913.3333333335, ans=0.1 2023-11-28 01:25:13,719 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3289913.3333333335, ans=0.125 2023-11-28 01:25:17,157 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3289913.3333333335, ans=0.125 2023-11-28 01:25:23,588 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 493500 2023-11-28 01:25:54,488 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3290113.3333333335, ans=0.125 2023-11-28 01:25:57,459 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 550, loss[loss=0.05797, simple_loss=0.08219, pruned_loss=0.009178, audio_tagging_loss=0.007698, over 15464.00 frames. ], tot_loss[loss=0.06709, simple_loss=0.09086, pruned_loss=0.01264, audio_tagging_loss=0.009027, over 2857416.45 frames. ], batch size: 57, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:26:16,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3290246.6666666665, ans=0.1 2023-11-28 01:26:21,509 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 493550 2023-11-28 01:26:37,121 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3290380.0, ans=0.125 2023-11-28 01:26:41,289 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.539e+01 8.718e+01 9.476e+01 1.036e+02 1.288e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-28 01:26:55,498 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 600, loss[loss=0.08411, simple_loss=0.1173, pruned_loss=0.01936, audio_tagging_loss=0.006078, over 15153.00 frames. ], tot_loss[loss=0.06747, simple_loss=0.09148, pruned_loss=0.01276, audio_tagging_loss=0.008972, over 2900830.57 frames. 
], batch size: 55, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:27:03,511 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3290513.3333333335, ans=0.125 2023-11-28 01:27:03,633 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3290513.3333333335, ans=0.2 2023-11-28 01:27:04,805 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3290513.3333333335, ans=0.125 2023-11-28 01:27:20,233 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 493600 2023-11-28 01:27:23,028 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3290646.6666666665, ans=0.1 2023-11-28 01:27:37,710 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3290713.3333333335, ans=0.1 2023-11-28 01:27:50,410 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 01:27:53,959 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 650, loss[loss=0.06534, simple_loss=0.07913, pruned_loss=0.01441, audio_tagging_loss=0.01136, over 15208.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.09022, pruned_loss=0.01248, audio_tagging_loss=0.009064, over 2928627.50 frames. ], batch size: 58, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:28:03,749 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3290846.6666666665, ans=0.125 2023-11-28 01:28:07,046 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3290913.3333333335, ans=0.0 2023-11-28 01:28:14,142 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.57 vs. limit=15.0 2023-11-28 01:28:17,842 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 493650 2023-11-28 01:28:24,482 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3290980.0, ans=0.0 2023-11-28 01:28:27,969 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.92 vs. limit=10.0 2023-11-28 01:28:30,919 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3291046.6666666665, ans=0.125 2023-11-28 01:28:33,380 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.05 vs. limit=15.0 2023-11-28 01:28:38,170 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 01:28:38,896 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.095e+01 8.722e+01 9.347e+01 9.970e+01 1.370e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-28 01:28:40,297 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3291113.3333333335, ans=0.2 2023-11-28 01:28:51,645 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 700, loss[loss=0.04517, simple_loss=0.04988, pruned_loss=0.005726, audio_tagging_loss=0.0145, over 14203.00 frames. 
], tot_loss[loss=0.06607, simple_loss=0.08934, pruned_loss=0.01229, audio_tagging_loss=0.009105, over 2955954.45 frames. ], batch size: 57, lr: 1.62e-03, grad_scale: 8.0 2023-11-28 01:28:51,892 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3291180.0, ans=0.0 2023-11-28 01:28:55,476 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.18 vs. limit=22.5 2023-11-28 01:29:15,880 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 493700 2023-11-28 01:29:22,156 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3291313.3333333335, ans=0.1 2023-11-28 01:29:40,662 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3291446.6666666665, ans=0.07 2023-11-28 01:29:43,100 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.09 vs. limit=15.0 2023-11-28 01:29:46,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3291446.6666666665, ans=0.2 2023-11-28 01:29:49,683 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 750, loss[loss=0.08091, simple_loss=0.1165, pruned_loss=0.01405, audio_tagging_loss=0.008626, over 15753.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.08978, pruned_loss=0.01234, audio_tagging_loss=0.009003, over 2980736.01 frames. ], batch size: 58, lr: 1.62e-03, grad_scale: 4.0 2023-11-28 01:29:56,963 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3291513.3333333335, ans=0.125 2023-11-28 01:30:08,582 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3291580.0, ans=0.125 2023-11-28 01:30:13,891 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 493750 2023-11-28 01:30:22,495 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.86 vs. limit=15.0 2023-11-28 01:30:33,600 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3291713.3333333335, ans=0.1 2023-11-28 01:30:36,020 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.425e+01 8.598e+01 9.375e+01 9.953e+01 1.444e+02, threshold=1.875e+02, percent-clipped=0.0 2023-11-28 01:30:47,081 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 800, loss[loss=0.07217, simple_loss=0.103, pruned_loss=0.01326, audio_tagging_loss=0.007392, over 15511.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.09001, pruned_loss=0.01238, audio_tagging_loss=0.009008, over 2999552.73 frames. 
], batch size: 59, lr: 1.62e-03, grad_scale: 8.0 2023-11-28 01:30:47,397 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3291846.6666666665, ans=0.2 2023-11-28 01:30:50,086 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3291846.6666666665, ans=0.0 2023-11-28 01:30:56,594 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3291846.6666666665, ans=0.2 2023-11-28 01:30:59,192 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.85 vs. limit=6.0 2023-11-28 01:31:11,499 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 493800 2023-11-28 01:31:35,420 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3292113.3333333335, ans=0.125 2023-11-28 01:31:45,319 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 850, loss[loss=0.07075, simple_loss=0.1086, pruned_loss=0.009774, audio_tagging_loss=0.006683, over 15351.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.0899, pruned_loss=0.01227, audio_tagging_loss=0.009119, over 3008443.68 frames. ], batch size: 57, lr: 1.62e-03, grad_scale: 8.0 2023-11-28 01:32:01,960 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3292246.6666666665, ans=0.125 2023-11-28 01:32:06,929 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3292246.6666666665, ans=0.125 2023-11-28 01:32:09,996 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 493850 2023-11-28 01:32:26,298 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3292380.0, ans=0.0 2023-11-28 01:32:31,569 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.288e+01 8.796e+01 9.707e+01 1.029e+02 1.774e+02, threshold=1.941e+02, percent-clipped=0.0 2023-11-28 01:32:34,949 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.78 vs. limit=10.0 2023-11-28 01:32:43,118 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 900, loss[loss=0.05329, simple_loss=0.0756, pruned_loss=0.007301, audio_tagging_loss=0.008188, over 14848.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.08923, pruned_loss=0.0123, audio_tagging_loss=0.009096, over 3015470.58 frames. ], batch size: 55, lr: 1.62e-03, grad_scale: 8.0 2023-11-28 01:32:49,779 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3292513.3333333335, ans=0.1 2023-11-28 01:33:07,803 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 493900 2023-11-28 01:33:16,566 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3292713.3333333335, ans=0.125 2023-11-28 01:33:41,069 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 950, loss[loss=0.06012, simple_loss=0.0831, pruned_loss=0.008727, audio_tagging_loss=0.009846, over 14033.00 frames. ], tot_loss[loss=0.06662, simple_loss=0.09021, pruned_loss=0.01263, audio_tagging_loss=0.008882, over 3022495.97 frames. 
], batch size: 53, lr: 1.62e-03, grad_scale: 8.0 2023-11-28 01:33:47,458 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3292846.6666666665, ans=0.1 2023-11-28 01:34:05,467 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 493950 2023-11-28 01:34:07,754 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3292980.0, ans=0.1 2023-11-28 01:34:07,799 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3292980.0, ans=0.125 2023-11-28 01:34:07,800 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3292980.0, ans=0.2 2023-11-28 01:34:11,178 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3292980.0, ans=0.125 2023-11-28 01:34:27,166 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.099e+01 8.712e+01 9.268e+01 9.903e+01 1.259e+02, threshold=1.854e+02, percent-clipped=0.0 2023-11-28 01:34:36,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3293113.3333333335, ans=0.125 2023-11-28 01:34:38,911 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 1000, loss[loss=0.06737, simple_loss=0.09151, pruned_loss=0.01346, audio_tagging_loss=0.008152, over 14897.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.09029, pruned_loss=0.01275, audio_tagging_loss=0.008754, over 3024683.39 frames. ], batch size: 57, lr: 1.62e-03, grad_scale: 8.0 2023-11-28 01:34:41,882 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.83 vs. limit=6.0 2023-11-28 01:34:44,603 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3293180.0, ans=0.1 2023-11-28 01:35:03,356 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 494000 2023-11-28 01:35:03,570 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3293313.3333333335, ans=0.1 2023-11-28 01:35:05,800 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 01:35:23,210 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3293380.0, ans=0.125 2023-11-28 01:35:35,529 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3293513.3333333335, ans=0.125 2023-11-28 01:35:37,063 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 1050, loss[loss=0.0685, simple_loss=0.101, pruned_loss=0.009579, audio_tagging_loss=0.00842, over 15729.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.08994, pruned_loss=0.01252, audio_tagging_loss=0.008721, over 3028919.66 frames. 
], batch size: 56, lr: 1.62e-03, grad_scale: 8.0 2023-11-28 01:35:38,435 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3293513.3333333335, ans=0.0 2023-11-28 01:35:41,679 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.48 vs. limit=22.5 2023-11-28 01:35:45,823 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3293513.3333333335, ans=0.0 2023-11-28 01:35:50,340 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3293580.0, ans=0.0 2023-11-28 01:35:54,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3293580.0, ans=0.0 2023-11-28 01:35:59,809 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3293646.6666666665, ans=0.125 2023-11-28 01:36:01,759 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 494050 2023-11-28 01:36:11,914 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3293713.3333333335, ans=0.07 2023-11-28 01:36:23,365 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.763e+01 8.510e+01 9.155e+01 1.003e+02 1.223e+02, threshold=1.831e+02, percent-clipped=0.0 2023-11-28 01:36:25,115 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3293780.0, ans=0.125 2023-11-28 01:36:35,322 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 1100, loss[loss=0.05643, simple_loss=0.07889, pruned_loss=0.009034, audio_tagging_loss=0.007954, over 15113.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.089, pruned_loss=0.01241, audio_tagging_loss=0.008786, over 3024242.63 frames. ], batch size: 59, lr: 1.62e-03, grad_scale: 8.0 2023-11-28 01:36:36,809 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3293846.6666666665, ans=0.09899494936611666 2023-11-28 01:36:39,881 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 01:36:55,971 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3293913.3333333335, ans=0.125 2023-11-28 01:36:58,976 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 494100 2023-11-28 01:37:22,027 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.69 vs. limit=22.5 2023-11-28 01:37:32,950 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 1150, loss[loss=0.05022, simple_loss=0.06399, pruned_loss=0.008295, audio_tagging_loss=0.009933, over 16225.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.08922, pruned_loss=0.01258, audio_tagging_loss=0.008866, over 3033556.76 frames. 
], batch size: 62, lr: 1.62e-03, grad_scale: 8.0 2023-11-28 01:37:44,267 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3294246.6666666665, ans=0.125 2023-11-28 01:37:53,147 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.04 vs. limit=15.0 2023-11-28 01:37:57,727 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 494150 2023-11-28 01:37:57,799 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3294313.3333333335, ans=0.125 2023-11-28 01:38:04,351 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3294313.3333333335, ans=0.125 2023-11-28 01:38:06,542 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3294380.0, ans=0.125 2023-11-28 01:38:18,797 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.809e+01 8.708e+01 9.340e+01 1.012e+02 1.442e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-28 01:38:29,979 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 1200, loss[loss=0.07302, simple_loss=0.1016, pruned_loss=0.01381, audio_tagging_loss=0.008412, over 14226.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.08873, pruned_loss=0.01256, audio_tagging_loss=0.008857, over 3030934.17 frames. ], batch size: 54, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:38:31,392 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3294513.3333333335, ans=0.125 2023-11-28 01:38:54,978 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 494200 2023-11-28 01:39:16,782 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3294780.0, ans=0.1 2023-11-28 01:39:29,036 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 1250, loss[loss=0.05297, simple_loss=0.06924, pruned_loss=0.008279, audio_tagging_loss=0.01007, over 15562.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.08903, pruned_loss=0.01251, audio_tagging_loss=0.008877, over 3034052.39 frames. ], batch size: 60, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:39:40,914 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3294913.3333333335, ans=0.1 2023-11-28 01:39:52,931 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 494250 2023-11-28 01:40:15,303 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.229e+01 8.552e+01 9.215e+01 9.963e+01 1.305e+02, threshold=1.843e+02, percent-clipped=0.0 2023-11-28 01:40:19,972 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3295113.3333333335, ans=0.05 2023-11-28 01:40:26,661 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.81 vs. limit=15.0 2023-11-28 01:40:26,869 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 1300, loss[loss=0.0503, simple_loss=0.07179, pruned_loss=0.007626, audio_tagging_loss=0.006776, over 15860.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.08961, pruned_loss=0.01249, audio_tagging_loss=0.008779, over 3036187.98 frames. 
], batch size: 59, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:40:38,031 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3295246.6666666665, ans=0.125 2023-11-28 01:40:50,502 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 494300 2023-11-28 01:40:53,440 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3295313.3333333335, ans=0.0 2023-11-28 01:41:19,718 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3295446.6666666665, ans=0.125 2023-11-28 01:41:20,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3295446.6666666665, ans=0.125 2023-11-28 01:41:24,003 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 1350, loss[loss=0.07593, simple_loss=0.09604, pruned_loss=0.01745, audio_tagging_loss=0.01046, over 14918.00 frames. ], tot_loss[loss=0.0663, simple_loss=0.09014, pruned_loss=0.01241, audio_tagging_loss=0.008826, over 3042968.91 frames. ], batch size: 56, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:41:48,858 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 494350 2023-11-28 01:41:51,798 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3295646.6666666665, ans=0.2 2023-11-28 01:41:52,856 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3295646.6666666665, ans=0.0 2023-11-28 01:42:06,268 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=3295713.3333333335, ans=0.2 2023-11-28 01:42:08,161 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 01:42:10,302 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.051e+01 8.515e+01 9.138e+01 9.769e+01 1.555e+02, threshold=1.828e+02, percent-clipped=0.0 2023-11-28 01:42:12,781 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3295780.0, ans=0.125 2023-11-28 01:42:16,381 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3295780.0, ans=0.1 2023-11-28 01:42:22,292 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 1400, loss[loss=0.06824, simple_loss=0.08905, pruned_loss=0.01354, audio_tagging_loss=0.01017, over 13887.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.08958, pruned_loss=0.01228, audio_tagging_loss=0.008875, over 3040249.01 frames. 
], batch size: 53, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:42:42,690 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3295913.3333333335, ans=0.125 2023-11-28 01:42:46,790 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 494400 2023-11-28 01:43:20,765 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 1450, loss[loss=0.07134, simple_loss=0.09474, pruned_loss=0.01454, audio_tagging_loss=0.009438, over 14949.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.0905, pruned_loss=0.01245, audio_tagging_loss=0.00888, over 3039631.99 frames. ], batch size: 57, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:43:40,034 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.55 vs. limit=10.0 2023-11-28 01:43:40,859 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.06 vs. limit=15.0 2023-11-28 01:43:44,177 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 494450 2023-11-28 01:44:06,550 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.683e+01 8.990e+01 9.437e+01 1.012e+02 1.630e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-28 01:44:16,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3296513.3333333335, ans=0.0 2023-11-28 01:44:17,587 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 1500, loss[loss=0.06334, simple_loss=0.08935, pruned_loss=0.01075, audio_tagging_loss=0.007915, over 15248.00 frames. ], tot_loss[loss=0.06728, simple_loss=0.0915, pruned_loss=0.01264, audio_tagging_loss=0.008884, over 3050339.83 frames. ], batch size: 58, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:44:22,225 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3296513.3333333335, ans=0.125 2023-11-28 01:44:34,145 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3296580.0, ans=0.0 2023-11-28 01:44:42,291 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 494500 2023-11-28 01:44:44,005 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.53 vs. limit=15.0 2023-11-28 01:45:04,004 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3296780.0, ans=0.125 2023-11-28 01:45:15,095 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3296846.6666666665, ans=0.0 2023-11-28 01:45:15,920 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 1550, loss[loss=0.06777, simple_loss=0.08727, pruned_loss=0.01649, audio_tagging_loss=0.007648, over 14807.00 frames. ], tot_loss[loss=0.0669, simple_loss=0.09103, pruned_loss=0.01248, audio_tagging_loss=0.008903, over 3042913.58 frames. 
], batch size: 55, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:45:24,938 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3296846.6666666665, ans=0.125 2023-11-28 01:45:38,231 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3296980.0, ans=0.125 2023-11-28 01:45:40,199 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 494550 2023-11-28 01:45:51,513 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3297046.6666666665, ans=0.1 2023-11-28 01:46:01,874 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.818e+01 8.675e+01 9.125e+01 9.756e+01 1.252e+02, threshold=1.825e+02, percent-clipped=0.0 2023-11-28 01:46:03,135 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3297113.3333333335, ans=0.0 2023-11-28 01:46:03,185 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3297113.3333333335, ans=0.0 2023-11-28 01:46:09,756 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3297113.3333333335, ans=0.125 2023-11-28 01:46:13,997 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 1600, loss[loss=0.05394, simple_loss=0.07733, pruned_loss=0.007132, audio_tagging_loss=0.008141, over 15495.00 frames. ], tot_loss[loss=0.06671, simple_loss=0.09076, pruned_loss=0.01241, audio_tagging_loss=0.008923, over 3045230.12 frames. ], batch size: 60, lr: 1.62e-03, grad_scale: 32.0 2023-11-28 01:46:35,004 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3297313.3333333335, ans=0.125 2023-11-28 01:46:37,077 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 494600 2023-11-28 01:47:05,140 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3297446.6666666665, ans=0.125 2023-11-28 01:47:08,814 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.97 vs. limit=22.5 2023-11-28 01:47:10,408 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 1650, loss[loss=0.07342, simple_loss=0.1002, pruned_loss=0.01444, audio_tagging_loss=0.008888, over 15757.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.09037, pruned_loss=0.01243, audio_tagging_loss=0.008954, over 3043906.83 frames. ], batch size: 58, lr: 1.62e-03, grad_scale: 32.0 2023-11-28 01:47:24,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3297580.0, ans=0.125 2023-11-28 01:47:32,443 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3297646.6666666665, ans=0.0 2023-11-28 01:47:34,430 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 494650 2023-11-28 01:47:57,551 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.038e+01 8.801e+01 9.426e+01 1.009e+02 1.226e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-28 01:48:08,366 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 1700, loss[loss=0.07164, simple_loss=0.09493, pruned_loss=0.01296, audio_tagging_loss=0.01122, over 14542.00 frames. 
], tot_loss[loss=0.06683, simple_loss=0.09057, pruned_loss=0.01253, audio_tagging_loss=0.009016, over 3052004.43 frames. ], batch size: 54, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:48:32,335 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 494700 2023-11-28 01:48:45,555 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3298046.6666666665, ans=0.125 2023-11-28 01:48:54,417 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.81 vs. limit=15.0 2023-11-28 01:48:57,418 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3298113.3333333335, ans=0.125 2023-11-28 01:49:04,880 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 1750, loss[loss=0.07832, simple_loss=0.1196, pruned_loss=0.01284, audio_tagging_loss=0.005663, over 15995.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.08981, pruned_loss=0.01239, audio_tagging_loss=0.009019, over 3061268.24 frames. ], batch size: 59, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:49:05,036 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3298180.0, ans=0.0 2023-11-28 01:49:12,762 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 01:49:16,098 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3298246.6666666665, ans=0.05 2023-11-28 01:49:19,194 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3298246.6666666665, ans=0.125 2023-11-28 01:49:29,070 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 494750 2023-11-28 01:49:30,675 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.23 vs. limit=6.0 2023-11-28 01:49:33,157 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.82 vs. limit=15.0 2023-11-28 01:49:36,847 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3298313.3333333335, ans=0.125 2023-11-28 01:49:43,468 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3298380.0, ans=0.125 2023-11-28 01:49:47,769 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3298380.0, ans=0.125 2023-11-28 01:49:52,447 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.412e+01 8.816e+01 9.529e+01 1.029e+02 1.383e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-28 01:50:02,961 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 1800, loss[loss=0.04989, simple_loss=0.06221, pruned_loss=0.009212, audio_tagging_loss=0.009568, over 14772.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.08959, pruned_loss=0.01234, audio_tagging_loss=0.008992, over 3055653.40 frames. 
], batch size: 56, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:50:08,790 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3298513.3333333335, ans=0.1 2023-11-28 01:50:14,106 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3298580.0, ans=0.0 2023-11-28 01:50:26,980 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 494800 2023-11-28 01:50:27,154 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3298646.6666666665, ans=0.2 2023-11-28 01:50:29,921 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.19 vs. limit=15.0 2023-11-28 01:50:41,208 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3298713.3333333335, ans=0.0 2023-11-28 01:50:59,559 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3298846.6666666665, ans=0.125 2023-11-28 01:51:00,591 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 1850, loss[loss=0.06553, simple_loss=0.09126, pruned_loss=0.01088, audio_tagging_loss=0.009012, over 15375.00 frames. ], tot_loss[loss=0.06646, simple_loss=0.09012, pruned_loss=0.0125, audio_tagging_loss=0.008898, over 3059149.37 frames. ], batch size: 57, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:51:02,318 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3298846.6666666665, ans=0.0 2023-11-28 01:51:08,372 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3298846.6666666665, ans=0.0 2023-11-28 01:51:11,409 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3298913.3333333335, ans=0.2 2023-11-28 01:51:25,223 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 494850 2023-11-28 01:51:25,374 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3298980.0, ans=0.2 2023-11-28 01:51:43,527 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3299046.6666666665, ans=0.125 2023-11-28 01:51:44,510 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3299046.6666666665, ans=0.125 2023-11-28 01:51:47,593 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3299113.3333333335, ans=0.125 2023-11-28 01:51:48,562 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.331e+01 8.704e+01 9.342e+01 1.015e+02 1.516e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-28 01:51:58,705 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 1900, loss[loss=0.07333, simple_loss=0.1084, pruned_loss=0.0106, audio_tagging_loss=0.008505, over 15462.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.08984, pruned_loss=0.01234, audio_tagging_loss=0.008795, over 3061579.64 frames. 
], batch size: 58, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:52:04,074 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3299180.0, ans=0.0 2023-11-28 01:52:07,344 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3299180.0, ans=0.125 2023-11-28 01:52:12,294 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3299246.6666666665, ans=0.125 2023-11-28 01:52:15,396 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 01:52:19,924 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 01:52:22,953 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 494900 2023-11-28 01:52:25,620 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.29 vs. limit=15.0 2023-11-28 01:52:56,210 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 1950, loss[loss=0.06553, simple_loss=0.09023, pruned_loss=0.0129, audio_tagging_loss=0.00751, over 15524.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.08971, pruned_loss=0.01229, audio_tagging_loss=0.008781, over 3058168.77 frames. ], batch size: 59, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:53:18,063 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.20 vs. limit=10.0 2023-11-28 01:53:20,621 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 494950 2023-11-28 01:53:39,470 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.86 vs. limit=15.0 2023-11-28 01:53:40,129 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3299713.3333333335, ans=0.125 2023-11-28 01:53:43,772 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.336e+01 8.727e+01 9.410e+01 1.013e+02 1.318e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-28 01:53:53,607 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 2000, loss[loss=0.07379, simple_loss=0.0995, pruned_loss=0.01397, audio_tagging_loss=0.01008, over 14897.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.08938, pruned_loss=0.01226, audio_tagging_loss=0.008789, over 3055521.39 frames. ], batch size: 54, lr: 1.62e-03, grad_scale: 32.0 2023-11-28 01:54:08,650 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3299913.3333333335, ans=0.125 2023-11-28 01:54:17,875 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 495000 2023-11-28 01:54:26,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3299980.0, ans=0.0 2023-11-28 01:54:29,214 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.73 vs. 
limit=12.0 2023-11-28 01:54:39,725 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3300113.3333333335, ans=0.1 2023-11-28 01:54:43,084 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3300113.3333333335, ans=0.125 2023-11-28 01:54:51,448 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 2050, loss[loss=0.05264, simple_loss=0.0706, pruned_loss=0.007433, audio_tagging_loss=0.009902, over 15675.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08887, pruned_loss=0.01229, audio_tagging_loss=0.008805, over 3046864.16 frames. ], batch size: 60, lr: 1.62e-03, grad_scale: 32.0 2023-11-28 01:54:56,329 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.43 vs. limit=22.5 2023-11-28 01:55:15,648 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 495050 2023-11-28 01:55:29,958 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3300380.0, ans=0.125 2023-11-28 01:55:37,781 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3300446.6666666665, ans=0.125 2023-11-28 01:55:38,626 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.597e+01 8.786e+01 9.334e+01 1.004e+02 1.293e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-28 01:55:41,035 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3300446.6666666665, ans=0.07 2023-11-28 01:55:46,398 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.19 vs. limit=12.0 2023-11-28 01:55:49,313 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 2100, loss[loss=0.06558, simple_loss=0.09709, pruned_loss=0.008847, audio_tagging_loss=0.00819, over 14363.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08872, pruned_loss=0.01237, audio_tagging_loss=0.008793, over 3045201.74 frames. ], batch size: 55, lr: 1.62e-03, grad_scale: 32.0 2023-11-28 01:55:52,275 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.39 vs. limit=12.0 2023-11-28 01:55:53,911 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3300513.3333333335, ans=0.125 2023-11-28 01:56:14,173 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 495100 2023-11-28 01:56:15,483 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3300646.6666666665, ans=0.125 2023-11-28 01:56:35,793 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3300780.0, ans=0.0 2023-11-28 01:56:39,595 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.40 vs. 
limit=22.5 2023-11-28 01:56:44,215 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3300780.0, ans=0.0 2023-11-28 01:56:47,191 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 2150, loss[loss=0.05636, simple_loss=0.07312, pruned_loss=0.01037, audio_tagging_loss=0.009432, over 15065.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.08889, pruned_loss=0.01229, audio_tagging_loss=0.008825, over 3054032.19 frames. ], batch size: 56, lr: 1.62e-03, grad_scale: 32.0 2023-11-28 01:56:47,409 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3300846.6666666665, ans=0.1 2023-11-28 01:56:51,751 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3300846.6666666665, ans=0.125 2023-11-28 01:57:11,879 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 495150 2023-11-28 01:57:20,748 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3301046.6666666665, ans=0.125 2023-11-28 01:57:22,691 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 01:57:26,522 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.79 vs. limit=15.0 2023-11-28 01:57:31,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3301046.6666666665, ans=0.0 2023-11-28 01:57:34,208 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.121e+01 8.726e+01 9.403e+01 1.016e+02 1.279e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-28 01:57:42,047 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3301113.3333333335, ans=0.0 2023-11-28 01:57:45,123 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 2200, loss[loss=0.05615, simple_loss=0.07536, pruned_loss=0.01039, audio_tagging_loss=0.008078, over 14713.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.09032, pruned_loss=0.01249, audio_tagging_loss=0.008758, over 3049715.87 frames. ], batch size: 57, lr: 1.62e-03, grad_scale: 32.0 2023-11-28 01:57:47,661 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3301180.0, ans=0.0 2023-11-28 01:57:53,057 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3301180.0, ans=0.2 2023-11-28 01:57:54,762 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3301180.0, ans=0.1 2023-11-28 01:57:57,306 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.05 vs. limit=12.0 2023-11-28 01:57:59,379 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.57 vs. 
limit=22.5 2023-11-28 01:58:04,695 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3301246.6666666665, ans=0.125 2023-11-28 01:58:08,881 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 495200 2023-11-28 01:58:16,317 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3301313.3333333335, ans=0.125 2023-11-28 01:58:22,987 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.85 vs. limit=22.5 2023-11-28 01:58:25,326 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.46 vs. limit=22.5 2023-11-28 01:58:35,086 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.76 vs. limit=10.0 2023-11-28 01:58:43,006 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 2250, loss[loss=0.07023, simple_loss=0.101, pruned_loss=0.01299, audio_tagging_loss=0.006733, over 16040.00 frames. ], tot_loss[loss=0.06674, simple_loss=0.09092, pruned_loss=0.01255, audio_tagging_loss=0.008738, over 3049423.30 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 01:59:02,482 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3301580.0, ans=0.1 2023-11-28 01:59:07,473 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 495250 2023-11-28 01:59:29,983 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.518e+01 8.696e+01 9.309e+01 9.993e+01 1.259e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-28 01:59:39,879 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 2300, loss[loss=0.04558, simple_loss=0.06103, pruned_loss=0.006005, audio_tagging_loss=0.009055, over 14518.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.09002, pruned_loss=0.01238, audio_tagging_loss=0.00888, over 3046850.48 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 01:59:41,153 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3301846.6666666665, ans=0.125 2023-11-28 01:59:51,419 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3301913.3333333335, ans=0.2 2023-11-28 02:00:04,567 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 495300 2023-11-28 02:00:17,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3302046.6666666665, ans=0.125 2023-11-28 02:00:20,662 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3302046.6666666665, ans=0.0 2023-11-28 02:00:25,240 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3302113.3333333335, ans=0.04949747468305833 2023-11-28 02:00:32,690 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 02:00:38,607 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 2350, loss[loss=0.07713, simple_loss=0.1064, pruned_loss=0.01646, audio_tagging_loss=0.007468, over 14848.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.09059, pruned_loss=0.01252, audio_tagging_loss=0.008833, over 3041772.18 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 02:00:41,059 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3302180.0, ans=0.125 2023-11-28 02:00:41,074 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3302180.0, ans=0.125 2023-11-28 02:00:41,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3302180.0, ans=0.015 2023-11-28 02:00:45,358 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3302180.0, ans=0.1 2023-11-28 02:00:51,698 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3302246.6666666665, ans=0.1 2023-11-28 02:00:52,071 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.43 vs. limit=6.0 2023-11-28 02:00:56,668 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.39 vs. limit=15.0 2023-11-28 02:01:02,404 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 495350 2023-11-28 02:01:13,056 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.46 vs. limit=12.0 2023-11-28 02:01:21,288 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 02:01:25,421 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.786e+01 8.943e+01 9.346e+01 1.021e+02 1.230e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-28 02:01:29,515 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.93 vs. limit=10.0 2023-11-28 02:01:36,076 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 2400, loss[loss=0.05867, simple_loss=0.08639, pruned_loss=0.00627, audio_tagging_loss=0.009208, over 14711.00 frames. ], tot_loss[loss=0.06716, simple_loss=0.09121, pruned_loss=0.01263, audio_tagging_loss=0.008918, over 3038848.32 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 02:01:38,729 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.36 vs. 
limit=15.0 2023-11-28 02:01:51,747 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3302580.0, ans=0.2 2023-11-28 02:01:59,837 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 495400 2023-11-28 02:02:00,575 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3302646.6666666665, ans=0.0 2023-11-28 02:02:15,403 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.97 vs. limit=12.0 2023-11-28 02:02:23,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3302780.0, ans=0.125 2023-11-28 02:02:29,936 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3302780.0, ans=0.0 2023-11-28 02:02:32,977 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 2450, loss[loss=0.06829, simple_loss=0.09518, pruned_loss=0.009276, audio_tagging_loss=0.01142, over 16625.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.09054, pruned_loss=0.01241, audio_tagging_loss=0.008966, over 3044646.53 frames. ], batch size: 60, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 02:02:38,884 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.24 vs. limit=15.0 2023-11-28 02:02:46,779 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3302913.3333333335, ans=0.125 2023-11-28 02:02:57,465 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 495450 2023-11-28 02:02:59,693 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3302980.0, ans=0.0 2023-11-28 02:03:21,012 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.220e+01 8.760e+01 9.295e+01 9.959e+01 1.249e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-28 02:03:31,360 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 2500, loss[loss=0.04979, simple_loss=0.06811, pruned_loss=0.005365, audio_tagging_loss=0.01037, over 15459.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.08997, pruned_loss=0.01244, audio_tagging_loss=0.009023, over 3043914.56 frames. ], batch size: 60, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:03:33,807 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3303180.0, ans=0.05 2023-11-28 02:03:39,282 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3303180.0, ans=0.2 2023-11-28 02:03:47,596 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3303246.6666666665, ans=0.0 2023-11-28 02:03:48,926 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.56 vs. 
limit=15.0 2023-11-28 02:03:51,784 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3303246.6666666665, ans=0.2 2023-11-28 02:03:54,998 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 495500 2023-11-28 02:04:15,601 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3303380.0, ans=0.125 2023-11-28 02:04:23,383 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.30 vs. limit=22.5 2023-11-28 02:04:28,461 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 2550, loss[loss=0.06624, simple_loss=0.0931, pruned_loss=0.01029, audio_tagging_loss=0.009401, over 15647.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08847, pruned_loss=0.01227, audio_tagging_loss=0.008983, over 3043312.16 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:04:31,640 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3303513.3333333335, ans=0.125 2023-11-28 02:04:36,920 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3303513.3333333335, ans=0.1 2023-11-28 02:04:52,421 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 495550 2023-11-28 02:05:17,230 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.254e+01 8.589e+01 9.196e+01 9.860e+01 1.420e+02, threshold=1.839e+02, percent-clipped=0.0 2023-11-28 02:05:20,010 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.88 vs. limit=22.5 2023-11-28 02:05:26,166 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 2600, loss[loss=0.0686, simple_loss=0.0986, pruned_loss=0.01154, audio_tagging_loss=0.007756, over 15708.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08834, pruned_loss=0.01229, audio_tagging_loss=0.008836, over 3044757.26 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:05:42,203 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.62 vs. limit=22.5 2023-11-28 02:05:50,415 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.67 vs. limit=10.0 2023-11-28 02:05:50,788 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 495600 2023-11-28 02:06:01,172 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=3304046.6666666665, ans=15.0 2023-11-28 02:06:07,109 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3304046.6666666665, ans=0.035 2023-11-28 02:06:11,452 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3304113.3333333335, ans=0.1 2023-11-28 02:06:24,278 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 2650, loss[loss=0.06351, simple_loss=0.08935, pruned_loss=0.01055, audio_tagging_loss=0.008288, over 15534.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.08911, pruned_loss=0.01238, audio_tagging_loss=0.008763, over 3047835.52 frames. 
], batch size: 59, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:06:24,441 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3304180.0, ans=0.0 2023-11-28 02:06:28,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3304180.0, ans=0.125 2023-11-28 02:06:30,649 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3304180.0, ans=0.0 2023-11-28 02:06:42,854 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.08 vs. limit=15.0 2023-11-28 02:06:48,585 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 495650 2023-11-28 02:06:48,689 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3304313.3333333335, ans=0.0 2023-11-28 02:07:00,834 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3304380.0, ans=0.07 2023-11-28 02:07:12,860 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3304446.6666666665, ans=0.125 2023-11-28 02:07:13,637 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.405e+01 8.568e+01 9.215e+01 1.005e+02 1.316e+02, threshold=1.843e+02, percent-clipped=0.0 2023-11-28 02:07:18,778 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3304446.6666666665, ans=0.1 2023-11-28 02:07:21,990 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 2700, loss[loss=0.06539, simple_loss=0.09723, pruned_loss=0.007308, audio_tagging_loss=0.009466, over 15006.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.08982, pruned_loss=0.01243, audio_tagging_loss=0.0086, over 3050769.42 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 8.0 2023-11-28 02:07:32,867 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3304580.0, ans=0.125 2023-11-28 02:07:45,792 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 495700 2023-11-28 02:07:45,880 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3304646.6666666665, ans=0.1 2023-11-28 02:07:53,492 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3304646.6666666665, ans=0.0 2023-11-28 02:07:54,974 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.93 vs. limit=22.5 2023-11-28 02:08:19,875 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 2750, loss[loss=0.06575, simple_loss=0.09815, pruned_loss=0.00981, audio_tagging_loss=0.00686, over 15636.00 frames. ], tot_loss[loss=0.06507, simple_loss=0.08854, pruned_loss=0.01217, audio_tagging_loss=0.008635, over 3052495.62 frames. 
], batch size: 54, lr: 1.61e-03, grad_scale: 8.0 2023-11-28 02:08:21,197 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3304846.6666666665, ans=0.125 2023-11-28 02:08:27,732 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3304846.6666666665, ans=0.0 2023-11-28 02:08:29,853 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3304913.3333333335, ans=0.2 2023-11-28 02:08:39,594 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3304913.3333333335, ans=0.125 2023-11-28 02:08:43,923 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 495750 2023-11-28 02:08:46,830 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 02:09:08,424 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3305113.3333333335, ans=0.125 2023-11-28 02:09:09,421 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.187e+01 8.832e+01 9.441e+01 1.011e+02 1.289e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-28 02:09:10,583 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 02:09:17,085 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 2800, loss[loss=0.06598, simple_loss=0.08972, pruned_loss=0.01008, audio_tagging_loss=0.01104, over 14956.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08894, pruned_loss=0.01224, audio_tagging_loss=0.008685, over 3054320.07 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:09:22,733 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3305180.0, ans=0.0 2023-11-28 02:09:24,021 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.69 vs. limit=15.0 2023-11-28 02:09:29,173 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3305246.6666666665, ans=0.0 2023-11-28 02:09:42,263 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 495800 2023-11-28 02:09:48,125 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=3305313.3333333335, ans=0.05 2023-11-28 02:09:48,693 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.49 vs. limit=15.0 2023-11-28 02:09:51,862 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.97 vs. 
limit=15.0 2023-11-28 02:09:56,932 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3305380.0, ans=0.1 2023-11-28 02:10:14,362 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3305513.3333333335, ans=0.125 2023-11-28 02:10:15,078 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 2850, loss[loss=0.06573, simple_loss=0.08444, pruned_loss=0.01428, audio_tagging_loss=0.009229, over 16031.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.09004, pruned_loss=0.01253, audio_tagging_loss=0.00854, over 3052353.50 frames. ], batch size: 61, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:10:16,474 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3305513.3333333335, ans=0.0 2023-11-28 02:10:23,199 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3305513.3333333335, ans=0.0 2023-11-28 02:10:24,275 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3305513.3333333335, ans=0.2 2023-11-28 02:10:37,426 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3305646.6666666665, ans=0.1 2023-11-28 02:10:39,411 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 495850 2023-11-28 02:11:04,936 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.432e+01 8.806e+01 9.389e+01 1.005e+02 1.417e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-28 02:11:12,658 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 2900, loss[loss=0.0747, simple_loss=0.1043, pruned_loss=0.01418, audio_tagging_loss=0.008389, over 16044.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08926, pruned_loss=0.0124, audio_tagging_loss=0.008584, over 3050243.05 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:11:24,152 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3305913.3333333335, ans=0.2 2023-11-28 02:11:35,045 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.79 vs. limit=15.0 2023-11-28 02:11:36,739 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 495900 2023-11-28 02:11:40,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3305980.0, ans=0.1 2023-11-28 02:11:57,322 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.88 vs. limit=10.0 2023-11-28 02:11:57,484 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.20 vs. limit=22.5 2023-11-28 02:12:04,061 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=3306113.3333333335, ans=15.0 2023-11-28 02:12:09,959 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 2950, loss[loss=0.05673, simple_loss=0.07375, pruned_loss=0.01091, audio_tagging_loss=0.00894, over 14714.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.09007, pruned_loss=0.01248, audio_tagging_loss=0.008598, over 3051011.57 frames. 
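The scaling.py:213 records each dump a ScheduledFloat: a regularization hyperparameter (dropout_p, skip_rate, prob, scale_min, ...) whose value is a function of batch_count instead of a constant, which is why every record carries the current batch_count alongside the resolved value (ans=...). A piecewise-linear sketch of the idea, assuming interpolation between (batch_count, value) breakpoints with the endpoints held flat; the real class in scaling.py carries extra machinery:

class ScheduledFloatSketch:
    # A float that depends on the training batch count: piecewise-linear
    # between breakpoints, constant outside them. A sketch of the idea
    # behind scaling.py's ScheduledFloat, not the actual class.

    def __init__(self, *points):
        # points: (batch_count, value) pairs
        self.points = sorted(points)

    def value(self, batch_count):
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)
        return pts[-1][1]

# e.g. a dropout probability decaying from 0.3 to 0.1 over the first 20k
# batches, then flat (breakpoint numbers are illustrative only):
dropout_p = ScheduledFloatSketch((0, 0.3), (20000, 0.1))
print(dropout_p.value(3305980))  # 0.1, as in the dropout_p ans=0.1 lines above

This late in training (batch_count above 3.3 million) every schedule has long since reached its final segment, so the logged values are effectively constants.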
], batch size: 55, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:12:11,227 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3306180.0, ans=0.125 2023-11-28 02:12:20,985 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3306246.6666666665, ans=0.125 2023-11-28 02:12:34,938 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 495950 2023-11-28 02:12:37,172 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3306313.3333333335, ans=0.125 2023-11-28 02:12:49,361 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3306380.0, ans=0.0 2023-11-28 02:13:01,049 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.879e+01 8.902e+01 9.555e+01 1.025e+02 1.277e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-28 02:13:07,696 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 3000, loss[loss=0.07608, simple_loss=0.1004, pruned_loss=0.017, audio_tagging_loss=0.008876, over 15728.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.08997, pruned_loss=0.01249, audio_tagging_loss=0.008696, over 3051225.82 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 8.0 2023-11-28 02:13:07,698 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-28 02:13:42,054 INFO [train_asr.py:1267] (0/4) Epoch 42, validation: loss=0.05767, simple_loss=0.05061, pruned_loss=0.005183, audio_tagging_loss=0.02719, over 4681554.00 frames. 2023-11-28 02:13:42,055 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-28 02:14:02,900 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.35 vs. limit=12.0 2023-11-28 02:14:04,781 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3306646.6666666665, ans=0.2 2023-11-28 02:14:05,690 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 496000 2023-11-28 02:14:07,058 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-496000.pt 2023-11-28 02:14:20,835 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3306713.3333333335, ans=0.0 2023-11-28 02:14:42,114 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 3050, loss[loss=0.06403, simple_loss=0.09091, pruned_loss=0.009067, audio_tagging_loss=0.009506, over 15287.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.09059, pruned_loss=0.01251, audio_tagging_loss=0.008713, over 3052135.58 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 8.0 2023-11-28 02:14:42,333 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3306846.6666666665, ans=0.0 2023-11-28 02:14:52,267 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3306913.3333333335, ans=0.125 2023-11-28 02:15:01,677 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.94 vs. 
limit=15.0 2023-11-28 02:15:05,683 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 496050 2023-11-28 02:15:06,947 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3306980.0, ans=0.0 2023-11-28 02:15:16,136 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 02:15:22,901 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3307046.6666666665, ans=0.1 2023-11-28 02:15:24,349 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=3307046.6666666665, ans=15.0 2023-11-28 02:15:32,503 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.076e+01 8.885e+01 9.431e+01 1.019e+02 1.276e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-28 02:15:39,159 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 3100, loss[loss=0.05802, simple_loss=0.07099, pruned_loss=0.0111, audio_tagging_loss=0.01143, over 14081.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.09017, pruned_loss=0.01243, audio_tagging_loss=0.008745, over 3047125.51 frames. ], batch size: 54, lr: 1.61e-03, grad_scale: 8.0 2023-11-28 02:15:41,578 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3307180.0, ans=0.125 2023-11-28 02:16:02,594 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3307313.3333333335, ans=0.0 2023-11-28 02:16:03,465 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 496100 2023-11-28 02:16:06,911 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3307313.3333333335, ans=0.125 2023-11-28 02:16:28,353 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.whiten.whitening_limit, batch_count=3307446.6666666665, ans=12.0 2023-11-28 02:16:36,640 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 3150, loss[loss=0.05514, simple_loss=0.07094, pruned_loss=0.01212, audio_tagging_loss=0.007548, over 15402.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.09075, pruned_loss=0.01241, audio_tagging_loss=0.00874, over 3045564.83 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 8.0 2023-11-28 02:16:47,811 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.30 vs. 
limit=15.0 2023-11-28 02:16:50,653 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 02:17:01,143 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 496150 2023-11-28 02:17:26,462 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3307780.0, ans=0.0 2023-11-28 02:17:27,307 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.839e+01 8.803e+01 9.436e+01 1.005e+02 1.293e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-28 02:17:27,514 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3307780.0, ans=0.2 2023-11-28 02:17:33,937 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 3200, loss[loss=0.07328, simple_loss=0.1005, pruned_loss=0.01405, audio_tagging_loss=0.008969, over 15508.00 frames. ], tot_loss[loss=0.06697, simple_loss=0.09134, pruned_loss=0.01251, audio_tagging_loss=0.008791, over 3052821.32 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:17:40,585 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3307846.6666666665, ans=0.0 2023-11-28 02:17:52,867 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3307913.3333333335, ans=0.125 2023-11-28 02:17:55,774 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.08 vs. limit=15.0 2023-11-28 02:17:56,030 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.21 vs. limit=12.0 2023-11-28 02:17:58,654 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 496200 2023-11-28 02:18:04,906 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.10 vs. limit=15.0 2023-11-28 02:18:06,956 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3307980.0, ans=0.2 2023-11-28 02:18:15,160 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3308046.6666666665, ans=0.125 2023-11-28 02:18:17,287 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3308046.6666666665, ans=0.125 2023-11-28 02:18:32,210 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 3250, loss[loss=0.07968, simple_loss=0.115, pruned_loss=0.01525, audio_tagging_loss=0.006916, over 16009.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.09032, pruned_loss=0.01249, audio_tagging_loss=0.008897, over 3053819.42 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:18:56,699 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 496250 2023-11-28 02:19:15,619 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.32 vs. 
limit=15.0 2023-11-28 02:19:23,286 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.705e+01 8.753e+01 9.382e+01 9.909e+01 1.200e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-28 02:19:27,998 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3308446.6666666665, ans=0.125 2023-11-28 02:19:29,809 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 3300, loss[loss=0.06444, simple_loss=0.0921, pruned_loss=0.01044, audio_tagging_loss=0.007949, over 15638.00 frames. ], tot_loss[loss=0.0663, simple_loss=0.0897, pruned_loss=0.01243, audio_tagging_loss=0.009018, over 3052943.79 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:19:32,586 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.90 vs. limit=15.0 2023-11-28 02:19:40,813 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3308580.0, ans=0.1 2023-11-28 02:19:54,724 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 496300 2023-11-28 02:20:12,410 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3308713.3333333335, ans=0.0 2023-11-28 02:20:24,341 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3308780.0, ans=0.2 2023-11-28 02:20:26,130 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3308780.0, ans=0.0 2023-11-28 02:20:28,045 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 3350, loss[loss=0.07537, simple_loss=0.1056, pruned_loss=0.0139, audio_tagging_loss=0.008661, over 15084.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.08999, pruned_loss=0.01254, audio_tagging_loss=0.008908, over 3045654.60 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:20:37,375 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.51 vs. limit=6.0 2023-11-28 02:20:43,361 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3308913.3333333335, ans=0.0 2023-11-28 02:20:52,566 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 496350 2023-11-28 02:21:10,945 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3309046.6666666665, ans=0.07 2023-11-28 02:21:19,219 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.593e+01 8.880e+01 9.434e+01 1.005e+02 1.295e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-28 02:21:25,802 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 3400, loss[loss=0.09059, simple_loss=0.1276, pruned_loss=0.01674, audio_tagging_loss=0.01006, over 15193.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.08987, pruned_loss=0.01252, audio_tagging_loss=0.008821, over 3048989.58 frames. ], batch size: 54, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:21:26,355 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.52 vs. 
limit=15.0 2023-11-28 02:21:29,265 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3309180.0, ans=0.2 2023-11-28 02:21:32,397 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3309180.0, ans=0.125 2023-11-28 02:21:49,513 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 496400 2023-11-28 02:21:53,728 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.47 vs. limit=22.5 2023-11-28 02:22:01,005 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3309380.0, ans=0.125 2023-11-28 02:22:02,552 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=3309380.0, ans=0.05 2023-11-28 02:22:11,295 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3309446.6666666665, ans=0.125 2023-11-28 02:22:15,583 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3309446.6666666665, ans=0.125 2023-11-28 02:22:23,523 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 3450, loss[loss=0.07206, simple_loss=0.1032, pruned_loss=0.01466, audio_tagging_loss=0.005812, over 15575.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.08977, pruned_loss=0.01233, audio_tagging_loss=0.00865, over 3049506.98 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:22:33,577 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3309580.0, ans=0.0 2023-11-28 02:22:40,918 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3309580.0, ans=0.1 2023-11-28 02:22:48,158 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 496450 2023-11-28 02:23:01,701 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.73 vs. limit=6.0 2023-11-28 02:23:13,826 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.471e+01 8.800e+01 9.317e+01 1.017e+02 1.307e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-28 02:23:14,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3309780.0, ans=0.125 2023-11-28 02:23:15,646 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.15 vs. limit=15.0 2023-11-28 02:23:20,398 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 3500, loss[loss=0.0838, simple_loss=0.1211, pruned_loss=0.01636, audio_tagging_loss=0.006896, over 15324.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.08996, pruned_loss=0.01238, audio_tagging_loss=0.008643, over 3054216.07 frames. 
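At batch 3000 above, the loop pauses to compute a validation loss over the full dev set, reports peak GPU memory (25978MB), and at global batch index 496000 writes checkpoint-496000.pt. Both events look like simple modulo triggers on the two batch counters; a skeleton of that control flow, with the 3000-batch and 4000-step intervals inferred from the indices in this log:

import torch

VALID_INTERVAL = 3000   # validation fires at batch 3000 above
SAVE_EVERY_N = 4000     # checkpoint-496000.pt: 496000 % 4000 == 0

def train_one_epoch(model, optimizer, train_dl, valid_dl, compute_loss, state):
    for batch_idx, batch in enumerate(train_dl):
        state["batch_idx_train"] += 1
        # forward/backward/optimizer step elided

        if batch_idx > 0 and batch_idx % VALID_INTERVAL == 0:
            model.eval()
            with torch.no_grad():
                valid_loss = sum(compute_loss(model, b).item()
                                 for b in valid_dl)
            print(f"Epoch {state['epoch']}, validation: loss={valid_loss:.5f}")
            print(f"Maximum memory allocated so far is "
                  f"{torch.cuda.max_memory_allocated() // 2**20}MB")
            model.train()

        if state["batch_idx_train"] % SAVE_EVERY_N == 0:
            torch.save(model.state_dict(),
                       f"checkpoint-{state['batch_idx_train']}.pt")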
], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:23:23,898 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3309846.6666666665, ans=0.1 2023-11-28 02:23:25,691 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3309846.6666666665, ans=0.125 2023-11-28 02:23:44,983 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 496500 2023-11-28 02:23:52,040 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 02:24:18,478 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 3550, loss[loss=0.06455, simple_loss=0.08561, pruned_loss=0.01293, audio_tagging_loss=0.00881, over 16554.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08968, pruned_loss=0.01225, audio_tagging_loss=0.008643, over 3056209.29 frames. ], batch size: 64, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:24:19,904 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3310180.0, ans=0.1 2023-11-28 02:24:23,131 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3310180.0, ans=0.125 2023-11-28 02:24:37,966 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3310246.6666666665, ans=0.125 2023-11-28 02:24:40,730 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.75 vs. limit=15.0 2023-11-28 02:24:42,243 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 496550 2023-11-28 02:24:52,324 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3310380.0, ans=0.125 2023-11-28 02:25:08,570 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.495e+01 8.721e+01 9.297e+01 1.018e+02 1.196e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-28 02:25:10,110 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 02:25:10,986 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3310446.6666666665, ans=0.1 2023-11-28 02:25:14,837 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3310513.3333333335, ans=0.07 2023-11-28 02:25:15,742 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 3600, loss[loss=0.06193, simple_loss=0.07819, pruned_loss=0.01292, audio_tagging_loss=0.009919, over 14746.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08883, pruned_loss=0.01213, audio_tagging_loss=0.008597, over 3053644.08 frames. 
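The "Exclude cut" warnings document a validity filter: the AudioSet placeholder cuts carry dummy text that tokenizes to 24 BPE tokens, but a one-second cut yields only 100 feature frames, which the convolutional front end subsamples to 23; with fewer frames than tokens a transducer alignment cannot exist, so the cut is dropped. A sketch of such a filter, assuming the common two-stage stride-2 arithmetic T' = ((T - 7) // 2 + 1) // 2, which reproduces the logged 100 -> 23 (the exact formula lives in the recipe):

def frames_after_subsampling(num_frames: int) -> int:
    # Conv2d front end with two stride-2 stages (assumed formula);
    # ((100 - 7) // 2 + 1) // 2 == 23, matching "before subsampling: 100
    # ... after subsampling: 23" in the warnings above.
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # A transducer alignment needs at least one frame per output token.
    return frames_after_subsampling(num_frames) >= num_tokens

print(keep_cut(100, 24))  # False -> "Exclude cut ..." warning, as logged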
], batch size: 57, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 02:25:39,230 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 496600 2023-11-28 02:25:46,972 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3310646.6666666665, ans=0.125 2023-11-28 02:25:50,196 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3310713.3333333335, ans=0.2 2023-11-28 02:25:57,188 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3310713.3333333335, ans=0.0 2023-11-28 02:26:12,313 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 3650, loss[loss=0.07388, simple_loss=0.1021, pruned_loss=0.01624, audio_tagging_loss=0.006599, over 15240.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08964, pruned_loss=0.01226, audio_tagging_loss=0.008569, over 3048904.26 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:26:25,103 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.95 vs. limit=15.0 2023-11-28 02:26:36,648 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 496650 2023-11-28 02:26:45,284 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3310980.0, ans=0.0 2023-11-28 02:27:03,703 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.057e+01 8.594e+01 9.332e+01 1.003e+02 1.328e+02, threshold=1.866e+02, percent-clipped=0.0 2023-11-28 02:27:09,744 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 3700, loss[loss=0.04835, simple_loss=0.05761, pruned_loss=0.008655, audio_tagging_loss=0.01089, over 17112.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.08987, pruned_loss=0.01226, audio_tagging_loss=0.00863, over 3050247.14 frames. ], batch size: 65, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:27:18,128 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3311180.0, ans=0.125 2023-11-28 02:27:21,447 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 02:27:34,022 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 496700 2023-11-28 02:28:06,129 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3311513.3333333335, ans=0.0 2023-11-28 02:28:07,639 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 3750, loss[loss=0.0719, simple_loss=0.09416, pruned_loss=0.01337, audio_tagging_loss=0.01145, over 16362.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.09035, pruned_loss=0.01222, audio_tagging_loss=0.008646, over 3052957.93 frames. 
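grad_scale in the batch records drifts between 8 and 32 (16 at batch 2550, 8 at 2700, back to 16 at 2800, 8 again at 3000, 32 by 3600). That is the float16 loss scale of mixed-precision training: it is halved whenever a step produces non-finite gradients and doubled again after a run of clean steps. A standard torch.cuda.amp sketch of the mechanism; the growth/backoff factors shown are PyTorch defaults, and init_scale is set near the logged values for illustration:

import torch

# Scale the loss so small gradients survive in float16; halve the scale
# on overflow, double it after a run of clean steps. The 8 <-> 32 drift
# of "grad_scale" in the records above is this scale being adjusted.
scaler = torch.cuda.amp.GradScaler(init_scale=16.0, growth_factor=2.0,
                                   backoff_factor=0.5, growth_interval=2000)

def fp16_step(model, optimizer, batch, compute_loss):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = compute_loss(model, batch)
    scaler.scale(loss).backward()
    scaler.step(optimizer)   # skipped internally if gradients overflowed
    scaler.update()          # adjusts the scale (8 -> 16 -> 32 -> ...)
    return loss.detach(), scaler.get_scale()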
], batch size: 61, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:28:12,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=3311513.3333333335, ans=0.05 2023-11-28 02:28:21,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3311580.0, ans=0.125 2023-11-28 02:28:25,397 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3311580.0, ans=0.125 2023-11-28 02:28:30,791 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 496750 2023-11-28 02:28:47,248 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 02:28:58,671 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.525e+01 8.781e+01 9.240e+01 9.958e+01 1.596e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-28 02:29:00,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3311780.0, ans=0.125 2023-11-28 02:29:03,399 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3311846.6666666665, ans=0.125 2023-11-28 02:29:04,353 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 3800, loss[loss=0.05944, simple_loss=0.07297, pruned_loss=0.01015, audio_tagging_loss=0.01281, over 15013.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.09051, pruned_loss=0.01234, audio_tagging_loss=0.008771, over 3055487.69 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:29:12,143 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3311846.6666666665, ans=0.125 2023-11-28 02:29:27,200 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.22 vs. limit=22.5 2023-11-28 02:29:27,889 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3311980.0, ans=0.125 2023-11-28 02:29:28,746 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 496800 2023-11-28 02:29:33,661 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3311980.0, ans=0.125 2023-11-28 02:29:41,668 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3312046.6666666665, ans=0.125 2023-11-28 02:29:50,926 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.13 vs. limit=15.0 2023-11-28 02:30:01,647 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 3850, loss[loss=0.0714, simple_loss=0.09985, pruned_loss=0.01446, audio_tagging_loss=0.007013, over 14769.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.0906, pruned_loss=0.01224, audio_tagging_loss=0.008827, over 3052882.51 frames. 
], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:30:10,266 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3312180.0, ans=0.1 2023-11-28 02:30:26,019 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 496850 2023-11-28 02:30:29,389 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3312313.3333333335, ans=0.2 2023-11-28 02:30:35,821 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3312380.0, ans=0.125 2023-11-28 02:30:45,871 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3312380.0, ans=0.125 2023-11-28 02:30:53,142 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.776e+01 8.894e+01 9.500e+01 1.019e+02 1.780e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-28 02:30:57,401 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3312446.6666666665, ans=0.1 2023-11-28 02:30:59,261 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 3900, loss[loss=0.06918, simple_loss=0.09752, pruned_loss=0.01289, audio_tagging_loss=0.007527, over 15862.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.09015, pruned_loss=0.01228, audio_tagging_loss=0.008859, over 3047967.93 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:31:01,755 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3312513.3333333335, ans=0.125 2023-11-28 02:31:04,879 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.82 vs. limit=15.0 2023-11-28 02:31:22,287 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.28 vs. limit=15.0 2023-11-28 02:31:22,906 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 496900 2023-11-28 02:31:23,009 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3312646.6666666665, ans=0.1 2023-11-28 02:31:33,365 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3312713.3333333335, ans=0.125 2023-11-28 02:31:49,401 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.44 vs. limit=10.0 2023-11-28 02:31:56,454 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 3950, loss[loss=0.05035, simple_loss=0.06651, pruned_loss=0.007543, audio_tagging_loss=0.009548, over 15170.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.0903, pruned_loss=0.01227, audio_tagging_loss=0.008939, over 3049361.01 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:31:56,926 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.21 vs. 
limit=15.0 2023-11-28 02:32:02,273 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3312846.6666666665, ans=0.1 2023-11-28 02:32:04,305 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3312846.6666666665, ans=0.04949747468305833 2023-11-28 02:32:05,389 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3312846.6666666665, ans=0.0 2023-11-28 02:32:11,685 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.22 vs. limit=12.0 2023-11-28 02:32:17,707 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3312980.0, ans=0.125 2023-11-28 02:32:19,808 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 496950 2023-11-28 02:32:26,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3312980.0, ans=0.125 2023-11-28 02:32:37,704 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.65 vs. limit=15.0 2023-11-28 02:32:46,855 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.773e+01 8.841e+01 9.484e+01 1.039e+02 1.407e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-28 02:32:52,900 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 4000, loss[loss=0.06571, simple_loss=0.08696, pruned_loss=0.01351, audio_tagging_loss=0.008718, over 14551.00 frames. ], tot_loss[loss=0.06668, simple_loss=0.09078, pruned_loss=0.01232, audio_tagging_loss=0.008973, over 3045453.21 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 02:32:53,229 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3313180.0, ans=0.125 2023-11-28 02:33:06,190 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3313246.6666666665, ans=0.125 2023-11-28 02:33:06,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3313246.6666666665, ans=0.125 2023-11-28 02:33:17,212 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 497000 2023-11-28 02:33:21,049 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3313313.3333333335, ans=0.125 2023-11-28 02:33:36,011 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.31 vs. limit=15.0 2023-11-28 02:33:49,966 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 4050, loss[loss=0.06842, simple_loss=0.09367, pruned_loss=0.01287, audio_tagging_loss=0.008725, over 14635.00 frames. ], tot_loss[loss=0.06697, simple_loss=0.09112, pruned_loss=0.01246, audio_tagging_loss=0.00895, over 3049275.15 frames. ], batch size: 54, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:33:53,192 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 02:33:54,246 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3313513.3333333335, ans=0.05 2023-11-28 02:34:03,421 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3313580.0, ans=0.0 2023-11-28 02:34:09,249 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.67 vs. limit=12.0 2023-11-28 02:34:14,343 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 497050 2023-11-28 02:34:16,791 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3313646.6666666665, ans=0.0 2023-11-28 02:34:35,130 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.22 vs. limit=12.0 2023-11-28 02:34:42,831 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.897e+01 8.798e+01 9.309e+01 1.006e+02 1.878e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-28 02:34:47,163 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 4100, loss[loss=0.05717, simple_loss=0.07306, pruned_loss=0.01201, audio_tagging_loss=0.008632, over 14087.00 frames. ], tot_loss[loss=0.06714, simple_loss=0.09127, pruned_loss=0.01259, audio_tagging_loss=0.008924, over 3053985.52 frames. ], batch size: 54, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:34:47,981 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.05 vs. limit=6.0 2023-11-28 02:34:51,882 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3313846.6666666665, ans=0.0 2023-11-28 02:35:01,684 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3313913.3333333335, ans=0.0 2023-11-28 02:35:04,488 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3313913.3333333335, ans=0.1 2023-11-28 02:35:07,266 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.33 vs. 
limit=15.0 2023-11-28 02:35:10,921 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 497100 2023-11-28 02:35:17,763 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-28 02:35:33,318 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3314113.3333333335, ans=0.1 2023-11-28 02:35:38,735 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3314113.3333333335, ans=0.125 2023-11-28 02:35:39,595 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3314113.3333333335, ans=0.125 2023-11-28 02:35:44,010 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 4150, loss[loss=0.06563, simple_loss=0.08885, pruned_loss=0.01001, audio_tagging_loss=0.01119, over 15383.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.09064, pruned_loss=0.01237, audio_tagging_loss=0.008866, over 3057697.90 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:35:44,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3314180.0, ans=0.0 2023-11-28 02:35:57,486 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3314246.6666666665, ans=0.125 2023-11-28 02:36:08,941 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 497150 2023-11-28 02:36:20,566 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3314380.0, ans=0.04949747468305833 2023-11-28 02:36:21,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3314380.0, ans=0.2 2023-11-28 02:36:26,120 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3314380.0, ans=0.07 2023-11-28 02:36:27,072 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 02:36:27,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3314380.0, ans=0.2 2023-11-28 02:36:27,569 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.27 vs. 
limit=22.5 2023-11-28 02:36:35,349 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3314446.6666666665, ans=0.125 2023-11-28 02:36:35,370 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3314446.6666666665, ans=0.0 2023-11-28 02:36:37,310 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.681e+01 8.734e+01 9.392e+01 9.837e+01 1.224e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-28 02:36:41,685 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 4200, loss[loss=0.07282, simple_loss=0.105, pruned_loss=0.01376, audio_tagging_loss=0.006564, over 16504.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.0909, pruned_loss=0.01237, audio_tagging_loss=0.008684, over 3055779.73 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:36:53,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3314580.0, ans=0.125 2023-11-28 02:37:06,189 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 497200 2023-11-28 02:37:09,882 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=3314646.6666666665, ans=0.5 2023-11-28 02:37:15,342 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.89 vs. limit=15.0 2023-11-28 02:37:16,203 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3314713.3333333335, ans=0.125 2023-11-28 02:37:20,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3314713.3333333335, ans=0.2 2023-11-28 02:37:24,769 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3314713.3333333335, ans=0.125 2023-11-28 02:37:40,158 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 4250, loss[loss=0.06806, simple_loss=0.09371, pruned_loss=0.0105, audio_tagging_loss=0.01071, over 14423.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.0909, pruned_loss=0.01252, audio_tagging_loss=0.008643, over 3060802.52 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:37:45,835 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3314846.6666666665, ans=0.0 2023-11-28 02:37:48,031 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3314846.6666666665, ans=0.2 2023-11-28 02:37:59,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3314913.3333333335, ans=0.1 2023-11-28 02:38:04,136 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 497250 2023-11-28 02:38:32,542 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.487e+01 8.722e+01 9.477e+01 1.017e+02 1.335e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-28 02:38:36,938 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 4300, loss[loss=0.06312, simple_loss=0.08213, pruned_loss=0.01321, audio_tagging_loss=0.00885, over 15367.00 frames. ], tot_loss[loss=0.06677, simple_loss=0.09143, pruned_loss=0.01249, audio_tagging_loss=0.008563, over 3061020.82 frames. 
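lr holds at 1.61e-03 across all of these records. Under icefall's Eden schedule the rate decays with both the batch and epoch counts, but at roughly batch 496000 of epoch 42 the curve is flat enough that four significant digits do not move within an epoch. A sketch that reproduces the logged value, assuming the common Zipformer-recipe parameters (base_lr=0.045, lr_batches=7500, lr_epochs=3.5) and a scheduler epoch counter one behind the epoch label in the log; all of these are assumptions, not values read from this run:

def eden_lr(batch, epoch, base_lr=0.045, lr_batches=7500, lr_epochs=3.5):
    # Eden schedule as used by icefall's Zipformer recipes; the parameter
    # values here are assumed recipe defaults, not this run's config.
    return (base_lr
            * ((batch**2 + lr_batches**2) / lr_batches**2) ** -0.25
            * ((epoch**2 + lr_epochs**2) / lr_epochs**2) ** -0.25)

print(round(eden_lr(batch=496000, epoch=41), 5))  # 0.00161, i.e. "lr: 1.61e-03"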
], batch size: 63, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:38:44,151 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.45 vs. limit=6.0 2023-11-28 02:38:45,165 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.21 vs. limit=15.0 2023-11-28 02:38:55,751 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3315246.6666666665, ans=0.0 2023-11-28 02:39:01,081 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 497300 2023-11-28 02:39:05,013 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3315313.3333333335, ans=0.125 2023-11-28 02:39:06,375 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.41 vs. limit=15.0 2023-11-28 02:39:20,645 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3315380.0, ans=0.0 2023-11-28 02:39:26,431 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3315446.6666666665, ans=0.125 2023-11-28 02:39:33,981 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 4350, loss[loss=0.0462, simple_loss=0.05848, pruned_loss=0.008553, audio_tagging_loss=0.008413, over 15368.00 frames. ], tot_loss[loss=0.06731, simple_loss=0.09226, pruned_loss=0.01264, audio_tagging_loss=0.008546, over 3050010.56 frames. ], batch size: 61, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:39:56,450 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.91 vs. limit=12.0 2023-11-28 02:39:58,406 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 497350 2023-11-28 02:40:17,765 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 02:40:26,197 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.672e+01 8.957e+01 9.552e+01 1.043e+02 1.269e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-28 02:40:31,064 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 4400, loss[loss=0.07747, simple_loss=0.1091, pruned_loss=0.01727, audio_tagging_loss=0.005627, over 16105.00 frames. ], tot_loss[loss=0.06733, simple_loss=0.09229, pruned_loss=0.01276, audio_tagging_loss=0.008423, over 3052766.28 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 02:40:37,500 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3315846.6666666665, ans=0.0 2023-11-28 02:40:55,878 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 497400 2023-11-28 02:41:01,857 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3315980.0, ans=0.1 2023-11-28 02:41:16,098 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.75 vs. limit=10.0 2023-11-28 02:41:29,273 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 4450, loss[loss=0.0563, simple_loss=0.07141, pruned_loss=0.01017, audio_tagging_loss=0.01043, over 14041.00 frames. 
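The scaling.py:1022 Whitening records fire when a Whiten module measures how far a layer's output covariance is from white (all eigenvalues equal) and compares the result to its limit, as in "metric=19.47 vs. limit=22.5" for a self-attention output above. One natural metric of this kind is mean(eig^2) / mean(eig)^2 of the covariance, which is 1.0 for a white signal and grows as variance concentrates in a few directions; the sketch below assumes the scaling.py metric is of this form (the exact definition is in scaling.py):

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    # mean(eig^2) / mean(eig)^2 of the feature covariance, averaged over
    # channel groups; 1.0 for a perfectly white covariance, larger when
    # variance concentrates in a few directions. Assumed to be the flavor
    # of metric behind the Whiten log lines above.
    num_frames, num_channels = x.shape
    c = num_channels // num_groups
    x = x.reshape(num_frames, num_groups, c)
    x = x - x.mean(dim=0, keepdim=True)
    cov = torch.einsum("ngi,ngj->gij", x, x) / num_frames   # (groups, c, c)
    mean_eig = torch.einsum("gii->g", cov) / c              # trace / c
    mean_eig_sq = torch.einsum("gij,gji->g", cov, cov) / c  # trace(cov^2) / c
    return (mean_eig_sq / mean_eig**2).mean().item()

x = torch.randn(1000, 256)   # roughly white activations
print(whitening_metric(x))   # close to 1
x[:, 0] *= 20.0              # pile variance onto one channel
print(whitening_metric(x))   # far above 1 -> would trip a whitening limit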
], tot_loss[loss=0.067, simple_loss=0.09157, pruned_loss=0.01272, audio_tagging_loss=0.008497, over 3047605.49 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 02:41:51,546 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3316313.3333333335, ans=0.0 2023-11-28 02:41:53,488 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 497450 2023-11-28 02:42:16,585 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3316446.6666666665, ans=0.0 2023-11-28 02:42:22,842 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.499e+01 9.078e+01 9.731e+01 1.036e+02 1.394e+02, threshold=1.946e+02, percent-clipped=0.0 2023-11-28 02:42:27,230 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 4500, loss[loss=0.04869, simple_loss=0.06082, pruned_loss=0.009396, audio_tagging_loss=0.008889, over 15174.00 frames. ], tot_loss[loss=0.06699, simple_loss=0.09138, pruned_loss=0.01275, audio_tagging_loss=0.008557, over 3050364.23 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 02:42:50,805 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 497500 2023-11-28 02:43:07,001 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.47 vs. limit=22.5 2023-11-28 02:43:20,169 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.85 vs. limit=15.0 2023-11-28 02:43:24,687 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 4550, loss[loss=0.06496, simple_loss=0.08961, pruned_loss=0.01136, audio_tagging_loss=0.008798, over 16207.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.0902, pruned_loss=0.01254, audio_tagging_loss=0.008669, over 3046681.37 frames. ], batch size: 64, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:43:30,427 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3316846.6666666665, ans=0.1 2023-11-28 02:43:32,598 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3316846.6666666665, ans=0.125 2023-11-28 02:43:38,095 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3316913.3333333335, ans=0.0 2023-11-28 02:43:49,326 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 497550 2023-11-28 02:43:53,818 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3316980.0, ans=0.0 2023-11-28 02:44:04,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3317046.6666666665, ans=0.5 2023-11-28 02:44:09,507 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 02:44:12,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3317113.3333333335, ans=0.0 2023-11-28 02:44:16,561 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.99 vs. limit=15.0 2023-11-28 02:44:18,112 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.427e+01 8.674e+01 9.170e+01 9.991e+01 1.281e+02, threshold=1.834e+02, percent-clipped=0.0 2023-11-28 02:44:21,534 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 4600, loss[loss=0.06268, simple_loss=0.08797, pruned_loss=0.011, audio_tagging_loss=0.007699, over 15402.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.09045, pruned_loss=0.01252, audio_tagging_loss=0.008617, over 3053869.63 frames. ], batch size: 61, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:44:43,617 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.81 vs. limit=15.0 2023-11-28 02:44:46,333 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 497600 2023-11-28 02:45:03,910 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3317380.0, ans=0.04949747468305833 2023-11-28 02:45:16,815 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3317446.6666666665, ans=0.125 2023-11-28 02:45:20,474 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 4650, loss[loss=0.05928, simple_loss=0.08311, pruned_loss=0.009101, audio_tagging_loss=0.00862, over 15839.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.08977, pruned_loss=0.01252, audio_tagging_loss=0.008741, over 3048940.35 frames. ], batch size: 60, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:45:44,332 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 497650 2023-11-28 02:46:14,345 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.731e+01 8.773e+01 9.249e+01 1.003e+02 1.204e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-28 02:46:17,629 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 4700, loss[loss=0.07223, simple_loss=0.09886, pruned_loss=0.01467, audio_tagging_loss=0.008132, over 16006.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.09032, pruned_loss=0.01261, audio_tagging_loss=0.008745, over 3047750.58 frames. ], batch size: 62, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:46:18,116 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.22 vs. limit=10.0 2023-11-28 02:46:25,127 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3317846.6666666665, ans=0.125 2023-11-28 02:46:35,005 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3317913.3333333335, ans=0.125 2023-11-28 02:46:38,947 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3317913.3333333335, ans=0.025 2023-11-28 02:46:42,396 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 497700 2023-11-28 02:47:14,944 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 4750, loss[loss=0.06035, simple_loss=0.08106, pruned_loss=0.009295, audio_tagging_loss=0.01052, over 14806.00 frames. 
], tot_loss[loss=0.06583, simple_loss=0.08932, pruned_loss=0.01239, audio_tagging_loss=0.008782, over 3051854.89 frames. ], batch size: 54, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:47:15,184 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3318180.0, ans=0.0 2023-11-28 02:47:15,235 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3318180.0, ans=0.0 2023-11-28 02:47:20,118 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.91 vs. limit=15.0 2023-11-28 02:47:27,180 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3318246.6666666665, ans=0.1 2023-11-28 02:47:29,589 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3318246.6666666665, ans=0.125 2023-11-28 02:47:33,048 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3318246.6666666665, ans=0.09899494936611666 2023-11-28 02:47:39,388 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 497750 2023-11-28 02:47:42,179 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.81 vs. limit=15.0 2023-11-28 02:47:56,438 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3318380.0, ans=0.125 2023-11-28 02:48:01,863 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3318446.6666666665, ans=0.125 2023-11-28 02:48:08,872 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.623e+01 8.846e+01 9.343e+01 1.002e+02 1.233e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-28 02:48:13,299 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 4800, loss[loss=0.07546, simple_loss=0.1094, pruned_loss=0.01441, audio_tagging_loss=0.006352, over 14518.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.08906, pruned_loss=0.0123, audio_tagging_loss=0.008886, over 3047022.06 frames. ], batch size: 54, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 02:48:23,388 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3318580.0, ans=0.0 2023-11-28 02:48:24,522 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3318580.0, ans=0.125 2023-11-28 02:48:37,211 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 497800 2023-11-28 02:48:41,040 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3318646.6666666665, ans=0.1 2023-11-28 02:48:53,525 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3318713.3333333335, ans=0.0 2023-11-28 02:48:56,289 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3318713.3333333335, ans=0.0 2023-11-28 02:49:09,030 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.98 vs. 
limit=10.0 2023-11-28 02:49:10,419 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 4850, loss[loss=0.06333, simple_loss=0.08882, pruned_loss=0.01079, audio_tagging_loss=0.008135, over 14771.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.08939, pruned_loss=0.01238, audio_tagging_loss=0.008936, over 3037682.43 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 02:49:32,148 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3318980.0, ans=0.125 2023-11-28 02:49:34,193 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 497850 2023-11-28 02:49:41,787 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.70 vs. limit=6.0 2023-11-28 02:49:45,051 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.94 vs. limit=15.0 2023-11-28 02:49:54,577 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3319046.6666666665, ans=0.09899494936611666 2023-11-28 02:50:02,942 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3319113.3333333335, ans=0.125 2023-11-28 02:50:05,856 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.086e+01 8.681e+01 9.347e+01 1.000e+02 1.245e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-28 02:50:06,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3319113.3333333335, ans=0.04949747468305833 2023-11-28 02:50:08,187 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 4900, loss[loss=0.05504, simple_loss=0.07714, pruned_loss=0.009168, audio_tagging_loss=0.007303, over 14216.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08873, pruned_loss=0.01224, audio_tagging_loss=0.009028, over 3041562.26 frames. ], batch size: 53, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:50:17,809 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.18 vs. limit=10.0 2023-11-28 02:50:25,883 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.33 vs. limit=12.0 2023-11-28 02:50:33,049 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 497900 2023-11-28 02:51:01,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3319446.6666666665, ans=0.125 2023-11-28 02:51:05,917 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 4950, loss[loss=0.0643, simple_loss=0.08685, pruned_loss=0.01141, audio_tagging_loss=0.009467, over 15547.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.0897, pruned_loss=0.01232, audio_tagging_loss=0.008912, over 3038614.52 frames. 
], batch size: 60, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:51:08,911 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3319513.3333333335, ans=0.0 2023-11-28 02:51:25,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3319580.0, ans=0.125 2023-11-28 02:51:29,061 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3319646.6666666665, ans=0.1 2023-11-28 02:51:30,996 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 497950 2023-11-28 02:51:50,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3319713.3333333335, ans=0.125 2023-11-28 02:52:01,892 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.340e+01 8.571e+01 9.206e+01 9.727e+01 1.276e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-28 02:52:04,063 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 5000, loss[loss=0.08242, simple_loss=0.1164, pruned_loss=0.0169, audio_tagging_loss=0.007323, over 14750.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08934, pruned_loss=0.01227, audio_tagging_loss=0.00871, over 3036526.58 frames. ], batch size: 54, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:52:27,529 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 498000 2023-11-28 02:52:57,530 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3320113.3333333335, ans=0.125 2023-11-28 02:53:00,878 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3320180.0, ans=0.2 2023-11-28 02:53:01,680 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 5050, loss[loss=0.0732, simple_loss=0.1029, pruned_loss=0.01451, audio_tagging_loss=0.007256, over 15756.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.08986, pruned_loss=0.0124, audio_tagging_loss=0.008594, over 3033223.13 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:53:01,909 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 02:53:08,241 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=3320180.0, ans=0.95 2023-11-28 02:53:10,474 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3320180.0, ans=0.125 2023-11-28 02:53:16,451 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3320246.6666666665, ans=0.125 2023-11-28 02:53:19,588 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.22 vs. limit=15.0 2023-11-28 02:53:25,471 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 498050 2023-11-28 02:53:26,113 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.23 vs. limit=6.0 2023-11-28 02:53:34,170 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.46 vs. 
limit=15.0 2023-11-28 02:53:40,111 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3320380.0, ans=0.2 2023-11-28 02:53:56,348 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.913e+01 8.785e+01 9.412e+01 9.952e+01 1.191e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-28 02:53:58,589 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 5100, loss[loss=0.07042, simple_loss=0.1015, pruned_loss=0.01128, audio_tagging_loss=0.008395, over 15774.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08966, pruned_loss=0.01236, audio_tagging_loss=0.008572, over 3035268.31 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:54:01,517 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3320513.3333333335, ans=0.07 2023-11-28 02:54:20,589 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.72 vs. limit=22.5 2023-11-28 02:54:23,939 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 498100 2023-11-28 02:54:56,892 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 5150, loss[loss=0.0754, simple_loss=0.111, pruned_loss=0.01258, audio_tagging_loss=0.007315, over 15541.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.09048, pruned_loss=0.01243, audio_tagging_loss=0.008559, over 3038565.49 frames. ], batch size: 61, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:55:18,180 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3320913.3333333335, ans=0.2 2023-11-28 02:55:21,138 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 498150 2023-11-28 02:55:22,320 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3320980.0, ans=0.125 2023-11-28 02:55:53,311 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.730e+01 8.740e+01 9.410e+01 1.002e+02 1.466e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-28 02:55:54,469 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 5200, loss[loss=0.07218, simple_loss=0.09918, pruned_loss=0.01329, audio_tagging_loss=0.009301, over 15537.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.09044, pruned_loss=0.01251, audio_tagging_loss=0.008634, over 3040750.65 frames. 
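Note on the loss breakdown in the train_asr.py records: every tot_loss entry in this stretch is consistent with the total being combined as 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss. This identity is inferred from the logged numbers themselves, not read out of the training code; a quick check against the "Epoch 42, batch 5150" record just above (values copied from this log):

    # tot_loss fields from the batch 5150 record above
    loss, simple, pruned, at = 0.06623, 0.09048, 0.01243, 0.008559
    recomputed = 0.5 * simple + pruned + 1.0 * at
    assert abs(recomputed - loss) < 5e-5  # 0.066229 vs. logged 0.06623

The same combination reproduces the tot_loss of every other batch record in this section to rounding precision.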
], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:56:13,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3321246.6666666665, ans=0.09899494936611666 2023-11-28 02:56:18,559 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 498200 2023-11-28 02:56:21,273 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3321313.3333333335, ans=0.0 2023-11-28 02:56:28,476 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3321380.0, ans=0.05 2023-11-28 02:56:31,614 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3321380.0, ans=0.125 2023-11-28 02:56:40,119 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3321446.6666666665, ans=0.125 2023-11-28 02:56:42,206 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3321446.6666666665, ans=0.125 2023-11-28 02:56:51,816 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 5250, loss[loss=0.09021, simple_loss=0.1243, pruned_loss=0.01847, audio_tagging_loss=0.009594, over 15961.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.08967, pruned_loss=0.01248, audio_tagging_loss=0.008722, over 3037503.62 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:57:08,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3321580.0, ans=0.0 2023-11-28 02:57:13,268 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3321580.0, ans=0.07 2023-11-28 02:57:14,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3321646.6666666665, ans=0.0 2023-11-28 02:57:16,181 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 498250 2023-11-28 02:57:17,900 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3321646.6666666665, ans=0.0 2023-11-28 02:57:18,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3321646.6666666665, ans=0.125 2023-11-28 02:57:19,080 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3321646.6666666665, ans=0.125 2023-11-28 02:57:23,503 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3321646.6666666665, ans=0.125 2023-11-28 02:57:48,324 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.650e+01 8.904e+01 9.487e+01 1.032e+02 1.355e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-28 02:57:49,441 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 5300, loss[loss=0.05326, simple_loss=0.07193, pruned_loss=0.009384, audio_tagging_loss=0.007909, over 15542.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08907, pruned_loss=0.01246, audio_tagging_loss=0.008774, over 3036002.05 frames. 
], batch size: 61, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:58:02,375 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.60 vs. limit=15.0 2023-11-28 02:58:04,107 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3321913.3333333335, ans=0.125 2023-11-28 02:58:13,558 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 498300 2023-11-28 02:58:14,838 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3321980.0, ans=0.0 2023-11-28 02:58:15,995 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3321980.0, ans=0.04949747468305833 2023-11-28 02:58:39,670 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3322113.3333333335, ans=0.0 2023-11-28 02:58:41,386 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3322113.3333333335, ans=0.0 2023-11-28 02:58:47,136 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 5350, loss[loss=0.05437, simple_loss=0.06862, pruned_loss=0.008741, audio_tagging_loss=0.01131, over 16008.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.088, pruned_loss=0.01226, audio_tagging_loss=0.008868, over 3039898.12 frames. ], batch size: 60, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:58:47,383 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3322180.0, ans=0.0 2023-11-28 02:59:11,090 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 498350 2023-11-28 02:59:16,237 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.03 vs. limit=6.0 2023-11-28 02:59:34,420 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3322446.6666666665, ans=0.125 2023-11-28 02:59:36,697 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3322446.6666666665, ans=0.2 2023-11-28 02:59:42,878 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.439e+01 8.664e+01 9.180e+01 9.721e+01 1.287e+02, threshold=1.836e+02, percent-clipped=0.0 2023-11-28 02:59:44,017 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 5400, loss[loss=0.07097, simple_loss=0.08832, pruned_loss=0.01444, audio_tagging_loss=0.01236, over 14946.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.08931, pruned_loss=0.01236, audio_tagging_loss=0.008798, over 3041186.94 frames. 
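Note on the optim.py clipping records: in each "Clipping_scale=2.0, grad-norm quartiles ..." line here, the reported threshold equals exactly 2.0 times the middle quartile (e.g. 2 x 9.731e+01 = 1.946e+02 at the top of this stretch), so the clipper appears to set its threshold at clipping_scale times a running median of recent gradient norms. A sketch of that check using the quartiles from the 02:59:42 clipping record above (values copied from this log; the median rule is inferred from the numbers, not from optim.py):

    # quartiles: [min, q1, median, q3, max] from the clipping record
    quartiles = [74.39, 86.64, 91.80, 97.21, 128.7]
    clipping_scale = 2.0
    threshold = clipping_scale * quartiles[2]  # scale the median
    assert abs(threshold - 183.6) < 1e-9      # matches threshold=1.836e+02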
], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:59:47,490 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3322513.3333333335, ans=0.125 2023-11-28 03:00:07,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3322646.6666666665, ans=0.125 2023-11-28 03:00:08,052 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 498400 2023-11-28 03:00:17,913 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3322713.3333333335, ans=0.125 2023-11-28 03:00:40,035 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3322780.0, ans=0.125 2023-11-28 03:00:40,059 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3322780.0, ans=0.0 2023-11-28 03:00:42,006 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 5450, loss[loss=0.07825, simple_loss=0.1135, pruned_loss=0.01524, audio_tagging_loss=0.006289, over 15537.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.08992, pruned_loss=0.0125, audio_tagging_loss=0.008714, over 3046203.31 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:00:52,257 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3322913.3333333335, ans=0.0 2023-11-28 03:00:59,262 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-28 03:01:01,306 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3322913.3333333335, ans=0.125 2023-11-28 03:01:06,765 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 498450 2023-11-28 03:01:29,894 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3323113.3333333335, ans=0.2 2023-11-28 03:01:29,903 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 03:01:38,406 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.263e+01 8.881e+01 9.599e+01 1.024e+02 1.269e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-28 03:01:39,531 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 5500, loss[loss=0.06408, simple_loss=0.08141, pruned_loss=0.01202, audio_tagging_loss=0.01136, over 14655.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.08878, pruned_loss=0.01229, audio_tagging_loss=0.008898, over 3038922.60 frames. ], batch size: 54, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:01:45,275 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3323180.0, ans=0.0 2023-11-28 03:02:04,124 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 498500 2023-11-28 03:02:04,214 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3323313.3333333335, ans=0.2 2023-11-28 03:02:10,586 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.52 vs. 
limit=6.0 2023-11-28 03:02:17,049 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten.whitening_limit, batch_count=3323380.0, ans=15.0 2023-11-28 03:02:29,906 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3323446.6666666665, ans=0.0 2023-11-28 03:02:37,295 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 5550, loss[loss=0.05555, simple_loss=0.07701, pruned_loss=0.007724, audio_tagging_loss=0.00932, over 15836.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.08953, pruned_loss=0.0124, audio_tagging_loss=0.008894, over 3035357.56 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:02:56,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3323580.0, ans=0.0 2023-11-28 03:02:57,479 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.42 vs. limit=10.0 2023-11-28 03:03:01,174 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 498550 2023-11-28 03:03:21,376 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.50 vs. limit=10.0 2023-11-28 03:03:25,530 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3323780.0, ans=0.0 2023-11-28 03:03:33,947 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.494e+01 8.559e+01 9.219e+01 9.829e+01 1.565e+02, threshold=1.844e+02, percent-clipped=0.0 2023-11-28 03:03:35,071 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 5600, loss[loss=0.06928, simple_loss=0.08827, pruned_loss=0.01527, audio_tagging_loss=0.00987, over 14662.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.09005, pruned_loss=0.0124, audio_tagging_loss=0.008963, over 3044840.22 frames. ], batch size: 54, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 03:03:42,001 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 03:03:46,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3323913.3333333335, ans=0.125 2023-11-28 03:03:46,319 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3323913.3333333335, ans=0.125 2023-11-28 03:03:59,253 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 498600 2023-11-28 03:04:01,783 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3323980.0, ans=0.125 2023-11-28 03:04:01,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3323980.0, ans=0.125 2023-11-28 03:04:12,916 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.49 vs. limit=10.0 2023-11-28 03:04:17,627 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 03:04:21,061 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3324113.3333333335, ans=0.1 2023-11-28 03:04:31,807 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 5650, loss[loss=0.0451, simple_loss=0.05266, pruned_loss=0.008496, audio_tagging_loss=0.01027, over 14615.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.09004, pruned_loss=0.01235, audio_tagging_loss=0.009034, over 3045766.47 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 03:04:37,174 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3324180.0, ans=0.2 2023-11-28 03:04:44,325 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3324246.6666666665, ans=0.125 2023-11-28 03:04:55,887 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 498650 2023-11-28 03:05:03,939 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.42 vs. limit=12.0 2023-11-28 03:05:13,454 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 03:05:21,802 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3324446.6666666665, ans=0.125 2023-11-28 03:05:27,955 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3324446.6666666665, ans=0.0 2023-11-28 03:05:28,000 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 03:05:28,663 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.256e+01 8.782e+01 9.473e+01 1.042e+02 1.222e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-28 03:05:29,873 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 5700, loss[loss=0.06692, simple_loss=0.0929, pruned_loss=0.01161, audio_tagging_loss=0.008862, over 15920.00 frames. ], tot_loss[loss=0.06646, simple_loss=0.09049, pruned_loss=0.01225, audio_tagging_loss=0.008967, over 3052607.53 frames. 
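Note on the "Exclude cut ..." WARNING lines: each excluded AudioSet cut in this stretch has 100 frames before subsampling and 23 after, fewer than its 24 dummy-transcript tokens. That is consistent with a front-end that subsamples as T' = ((T - 7) // 2 + 1) // 2 (which maps 100 to exactly 23) and a filter that drops any cut whose post-subsampling length is shorter than its token sequence, since transducer training needs at least one encoder frame per output symbol. A sketch under those assumptions (keep_cut is a hypothetical helper, not a function from this codebase):

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # Post-subsampling length implied by the 100 -> 23 warnings.
        t_sub = ((num_frames - 7) // 2 + 1) // 2
        # Need at least one encoder frame per output token.
        return t_sub >= num_tokens

    assert ((100 - 7) // 2 + 1) // 2 == 23
    assert not keep_cut(100, 24)  # 23 frames < 24 tokens -> excluded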
], batch size: 59, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 03:05:53,891 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 498700 2023-11-28 03:05:57,307 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3324646.6666666665, ans=0.125 2023-11-28 03:06:07,749 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3324713.3333333335, ans=0.125 2023-11-28 03:06:18,088 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3324780.0, ans=0.0 2023-11-28 03:06:18,150 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3324780.0, ans=0.0 2023-11-28 03:06:18,194 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3324780.0, ans=0.125 2023-11-28 03:06:27,546 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 5750, loss[loss=0.06308, simple_loss=0.08648, pruned_loss=0.01113, audio_tagging_loss=0.008708, over 15996.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08901, pruned_loss=0.01197, audio_tagging_loss=0.008872, over 3051933.82 frames. ], batch size: 60, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 03:06:40,709 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3324913.3333333335, ans=0.0 2023-11-28 03:06:42,903 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3324913.3333333335, ans=0.125 2023-11-28 03:06:47,930 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3324913.3333333335, ans=0.125 2023-11-28 03:06:51,008 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 498750 2023-11-28 03:07:03,004 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.52 vs. limit=15.0 2023-11-28 03:07:22,755 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.349e+01 8.740e+01 9.291e+01 9.936e+01 1.231e+02, threshold=1.858e+02, percent-clipped=0.0 2023-11-28 03:07:23,851 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 5800, loss[loss=0.06386, simple_loss=0.08563, pruned_loss=0.01178, audio_tagging_loss=0.009259, over 16027.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.08924, pruned_loss=0.01205, audio_tagging_loss=0.008807, over 3046048.59 frames. 
], batch size: 61, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 03:07:38,873 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3325246.6666666665, ans=0.0 2023-11-28 03:07:41,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3325246.6666666665, ans=0.125 2023-11-28 03:07:48,079 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 498800 2023-11-28 03:07:57,138 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3325313.3333333335, ans=0.125 2023-11-28 03:08:03,834 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3325380.0, ans=0.0 2023-11-28 03:08:12,539 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3325446.6666666665, ans=0.125 2023-11-28 03:08:12,682 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3325446.6666666665, ans=0.1 2023-11-28 03:08:21,652 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 5850, loss[loss=0.07076, simple_loss=0.08938, pruned_loss=0.01466, audio_tagging_loss=0.01142, over 14846.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.09025, pruned_loss=0.0123, audio_tagging_loss=0.008681, over 3040729.41 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 03:08:26,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3325513.3333333335, ans=0.125 2023-11-28 03:08:46,178 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 498850 2023-11-28 03:09:13,964 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3325780.0, ans=0.125 2023-11-28 03:09:15,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3325780.0, ans=0.0 2023-11-28 03:09:16,261 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3325780.0, ans=0.025 2023-11-28 03:09:17,167 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3325780.0, ans=0.125 2023-11-28 03:09:18,075 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.216e+01 8.725e+01 9.320e+01 1.016e+02 1.515e+02, threshold=1.864e+02, percent-clipped=0.0 2023-11-28 03:09:18,276 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3325846.6666666665, ans=0.0 2023-11-28 03:09:19,659 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 5900, loss[loss=0.06566, simple_loss=0.09489, pruned_loss=0.01123, audio_tagging_loss=0.006981, over 14485.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.09071, pruned_loss=0.01233, audio_tagging_loss=0.008623, over 3036831.60 frames. 
], batch size: 54, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 03:09:35,827 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3325913.3333333335, ans=0.1 2023-11-28 03:09:36,887 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3325913.3333333335, ans=0.0 2023-11-28 03:09:40,408 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.42 vs. limit=15.0 2023-11-28 03:09:43,950 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 498900 2023-11-28 03:09:49,508 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3325980.0, ans=0.125 2023-11-28 03:09:51,025 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=3325980.0, ans=22.5 2023-11-28 03:09:56,274 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3326046.6666666665, ans=0.2 2023-11-28 03:10:03,756 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3326046.6666666665, ans=0.1 2023-11-28 03:10:06,467 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3326113.3333333335, ans=0.125 2023-11-28 03:10:14,109 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3326113.3333333335, ans=0.0 2023-11-28 03:10:16,408 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3326180.0, ans=0.5 2023-11-28 03:10:17,157 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 5950, loss[loss=0.06059, simple_loss=0.08435, pruned_loss=0.007802, audio_tagging_loss=0.01062, over 16269.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.09019, pruned_loss=0.01237, audio_tagging_loss=0.00864, over 3040938.62 frames. ], batch size: 60, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:10:40,942 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 498950 2023-11-28 03:10:51,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3326380.0, ans=0.125 2023-11-28 03:11:01,500 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.05 vs. limit=15.0 2023-11-28 03:11:01,741 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.31 vs. limit=22.5 2023-11-28 03:11:04,499 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3326446.6666666665, ans=0.125 2023-11-28 03:11:14,372 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.391e+01 8.682e+01 9.363e+01 1.001e+02 1.313e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-28 03:11:14,398 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 6000, loss[loss=0.06189, simple_loss=0.0809, pruned_loss=0.01105, audio_tagging_loss=0.01039, over 15573.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.09082, pruned_loss=0.01255, audio_tagging_loss=0.008589, over 3044186.85 frames. 
], batch size: 59, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 03:11:14,400 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-28 03:11:49,938 INFO [train_asr.py:1267] (0/4) Epoch 42, validation: loss=0.05789, simple_loss=0.05056, pruned_loss=0.005172, audio_tagging_loss=0.02743, over 4681554.00 frames. 2023-11-28 03:11:49,939 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-28 03:11:53,560 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 03:12:03,217 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3326580.0, ans=0.1 2023-11-28 03:12:13,487 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 499000 2023-11-28 03:12:17,242 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3326646.6666666665, ans=0.125 2023-11-28 03:12:21,657 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3326646.6666666665, ans=0.125 2023-11-28 03:12:23,074 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.88 vs. limit=15.0 2023-11-28 03:12:32,255 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 03:12:38,366 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3326780.0, ans=0.0 2023-11-28 03:12:43,903 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3326780.0, ans=0.125 2023-11-28 03:12:46,918 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 6050, loss[loss=0.06477, simple_loss=0.09198, pruned_loss=0.01127, audio_tagging_loss=0.007507, over 15226.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.08938, pruned_loss=0.01245, audio_tagging_loss=0.00873, over 3041285.78 frames. 
], batch size: 56, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 03:12:50,383 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3326846.6666666665, ans=0.0 2023-11-28 03:12:55,483 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3326846.6666666665, ans=0.125 2023-11-28 03:13:10,394 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 499050 2023-11-28 03:13:14,275 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3326980.0, ans=0.0 2023-11-28 03:13:14,364 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3326980.0, ans=0.125 2023-11-28 03:13:23,043 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3327046.6666666665, ans=0.125 2023-11-28 03:13:25,209 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3327046.6666666665, ans=0.0 2023-11-28 03:13:41,274 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3327113.3333333335, ans=0.0 2023-11-28 03:13:43,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3327180.0, ans=0.0 2023-11-28 03:13:44,241 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.384e+01 8.818e+01 9.290e+01 9.982e+01 1.282e+02, threshold=1.858e+02, percent-clipped=0.0 2023-11-28 03:13:44,267 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 6100, loss[loss=0.08083, simple_loss=0.1197, pruned_loss=0.01385, audio_tagging_loss=0.007141, over 16977.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.08987, pruned_loss=0.01235, audio_tagging_loss=0.008652, over 3042912.68 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 03:14:07,793 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3327313.3333333335, ans=0.125 2023-11-28 03:14:08,815 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 499100 2023-11-28 03:14:16,728 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3327313.3333333335, ans=0.125 2023-11-28 03:14:41,475 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 6150, loss[loss=0.06541, simple_loss=0.09305, pruned_loss=0.01385, audio_tagging_loss=0.005029, over 14457.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.08962, pruned_loss=0.01236, audio_tagging_loss=0.008687, over 3044458.74 frames. 
], batch size: 55, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:15:01,289 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3327580.0, ans=0.125 2023-11-28 03:15:05,178 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3327646.6666666665, ans=0.125 2023-11-28 03:15:06,135 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 499150 2023-11-28 03:15:23,633 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3327713.3333333335, ans=0.125 2023-11-28 03:15:39,214 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 6200, loss[loss=0.08046, simple_loss=0.1196, pruned_loss=0.01521, audio_tagging_loss=0.005442, over 15996.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08935, pruned_loss=0.01227, audio_tagging_loss=0.008737, over 3042692.18 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:15:40,272 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.163e+01 8.658e+01 9.318e+01 1.003e+02 1.390e+02, threshold=1.864e+02, percent-clipped=0.0 2023-11-28 03:15:40,609 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3327846.6666666665, ans=0.125 2023-11-28 03:15:41,565 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3327846.6666666665, ans=0.0 2023-11-28 03:15:51,155 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3327913.3333333335, ans=0.1 2023-11-28 03:15:56,722 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3327913.3333333335, ans=0.125 2023-11-28 03:16:02,954 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 499200 2023-11-28 03:16:10,329 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3327980.0, ans=0.2 2023-11-28 03:16:20,895 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=3328046.6666666665, ans=0.5 2023-11-28 03:16:36,594 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 6250, loss[loss=0.07213, simple_loss=0.092, pruned_loss=0.0157, audio_tagging_loss=0.01044, over 15899.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.08967, pruned_loss=0.01236, audio_tagging_loss=0.008863, over 3053761.04 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:17:00,548 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 499250 2023-11-28 03:17:13,410 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3328380.0, ans=0.2 2023-11-28 03:17:18,173 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3328380.0, ans=0.0 2023-11-28 03:17:18,226 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3328380.0, ans=0.0 2023-11-28 03:17:25,299 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.09 vs. 
limit=22.5 2023-11-28 03:17:33,311 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 6300, loss[loss=0.06618, simple_loss=0.0889, pruned_loss=0.01224, audio_tagging_loss=0.009491, over 15224.00 frames. ], tot_loss[loss=0.066, simple_loss=0.0893, pruned_loss=0.01238, audio_tagging_loss=0.008964, over 3053884.38 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:17:34,342 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.443e+01 9.160e+01 9.772e+01 1.060e+02 1.350e+02, threshold=1.954e+02, percent-clipped=0.0 2023-11-28 03:17:38,979 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3328513.3333333335, ans=0.0 2023-11-28 03:17:42,220 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3328513.3333333335, ans=0.125 2023-11-28 03:17:58,600 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 499300 2023-11-28 03:17:59,841 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3328646.6666666665, ans=0.1 2023-11-28 03:18:11,889 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3328713.3333333335, ans=0.125 2023-11-28 03:18:15,137 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3328713.3333333335, ans=0.0 2023-11-28 03:18:31,042 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 6350, loss[loss=0.07421, simple_loss=0.1017, pruned_loss=0.01414, audio_tagging_loss=0.009236, over 15985.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.08978, pruned_loss=0.01236, audio_tagging_loss=0.008964, over 3048412.13 frames. ], batch size: 63, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:18:35,447 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.44 vs. limit=22.5 2023-11-28 03:18:39,381 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3328846.6666666665, ans=0.0 2023-11-28 03:18:39,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3328846.6666666665, ans=0.125 2023-11-28 03:18:48,835 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3328913.3333333335, ans=0.125 2023-11-28 03:18:55,224 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 499350 2023-11-28 03:18:58,877 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.07 vs. limit=22.5 2023-11-28 03:19:00,204 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.18 vs. limit=22.5 2023-11-28 03:19:03,575 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3328980.0, ans=0.0 2023-11-28 03:19:12,092 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.10 vs. 
limit=15.0 2023-11-28 03:19:25,477 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3329113.3333333335, ans=0.125 2023-11-28 03:19:29,097 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 6400, loss[loss=0.0699, simple_loss=0.09641, pruned_loss=0.01428, audio_tagging_loss=0.007418, over 14586.00 frames. ], tot_loss[loss=0.06646, simple_loss=0.09008, pruned_loss=0.01245, audio_tagging_loss=0.008969, over 3038636.17 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 03:19:30,177 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.681e+01 8.920e+01 9.509e+01 1.018e+02 1.569e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-28 03:19:48,780 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3329246.6666666665, ans=0.2 2023-11-28 03:19:49,900 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3329246.6666666665, ans=0.125 2023-11-28 03:19:52,944 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 499400 2023-11-28 03:20:06,141 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3329380.0, ans=0.1 2023-11-28 03:20:06,145 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=3329380.0, ans=0.5 2023-11-28 03:20:26,016 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 6450, loss[loss=0.06005, simple_loss=0.08028, pruned_loss=0.01041, audio_tagging_loss=0.009497, over 14249.00 frames. ], tot_loss[loss=0.06722, simple_loss=0.09126, pruned_loss=0.01256, audio_tagging_loss=0.009023, over 3036059.02 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 03:20:35,401 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3329513.3333333335, ans=0.125 2023-11-28 03:20:49,109 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3329646.6666666665, ans=0.125 2023-11-28 03:20:49,881 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 499450 2023-11-28 03:20:56,204 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3329646.6666666665, ans=0.05 2023-11-28 03:21:01,596 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3329713.3333333335, ans=0.1 2023-11-28 03:21:07,717 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.97 vs. limit=15.0 2023-11-28 03:21:09,505 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.95 vs. limit=15.0 2023-11-28 03:21:23,080 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 6500, loss[loss=0.0476, simple_loss=0.06077, pruned_loss=0.006865, audio_tagging_loss=0.01035, over 14526.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.09, pruned_loss=0.01227, audio_tagging_loss=0.009121, over 3032115.42 frames. 
], batch size: 55, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:21:24,499 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3329846.6666666665, ans=0.1 2023-11-28 03:21:25,263 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.167e+01 8.737e+01 9.352e+01 9.973e+01 1.217e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-28 03:21:39,605 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3329913.3333333335, ans=0.0 2023-11-28 03:21:40,815 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3329913.3333333335, ans=0.0 2023-11-28 03:21:47,129 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 499500 2023-11-28 03:22:16,435 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3330113.3333333335, ans=0.015 2023-11-28 03:22:20,278 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 6550, loss[loss=0.06867, simple_loss=0.08912, pruned_loss=0.01479, audio_tagging_loss=0.009317, over 15584.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.0898, pruned_loss=0.01223, audio_tagging_loss=0.008852, over 3033439.54 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:22:36,389 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3330246.6666666665, ans=0.125 2023-11-28 03:22:38,780 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.24 vs. limit=22.5 2023-11-28 03:22:44,233 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 499550 2023-11-28 03:22:45,489 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3330313.3333333335, ans=0.1 2023-11-28 03:22:52,129 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3330313.3333333335, ans=0.1 2023-11-28 03:22:59,734 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3330380.0, ans=0.125 2023-11-28 03:23:04,799 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3330446.6666666665, ans=0.125 2023-11-28 03:23:09,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3330446.6666666665, ans=0.09899494936611666 2023-11-28 03:23:12,486 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3330446.6666666665, ans=0.1 2023-11-28 03:23:16,558 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 6600, loss[loss=0.07465, simple_loss=0.1058, pruned_loss=0.01456, audio_tagging_loss=0.007175, over 14825.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.08983, pruned_loss=0.01234, audio_tagging_loss=0.008788, over 3036544.39 frames. 
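Note on the grad_scale field: across this stretch it steps 32.0 -> 16.0 (between batches 4500 and 4550), back to 32.0, then down through 16.0 and 8.0 around batches 6550-6600 before recovering. That pattern is the signature of dynamic fp16 loss scaling, where the scale is halved whenever a step produces inf/nan gradients and doubled again after a long enough run of clean steps. A minimal sketch of that policy, assuming semantics like torch.cuda.amp.GradScaler's defaults (ToyScaler is illustrative, not the scaler used by this run):

    class ToyScaler:
        """Dynamic loss scale: halve on overflow, double after
        growth_interval consecutive clean steps."""

        def __init__(self, scale: float = 32.0, growth_interval: int = 2000):
            self.scale = scale
            self.growth_interval = growth_interval
            self._clean_steps = 0

        def update(self, found_inf: bool) -> None:
            if found_inf:
                self.scale *= 0.5          # back off immediately
                self._clean_steps = 0
            else:
                self._clean_steps += 1
                if self._clean_steps >= self.growth_interval:
                    self.scale *= 2.0      # grow after a clean run
                    self._clean_steps = 0

    s = ToyScaler()
    s.update(found_inf=True)  # 32.0 -> 16.0, as between batches 4500 and 4550
    assert s.scale == 16.0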
], batch size: 57, lr: 1.61e-03, grad_scale: 8.0 2023-11-28 03:23:19,846 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.105e+01 8.958e+01 9.376e+01 9.845e+01 1.305e+02, threshold=1.875e+02, percent-clipped=0.0 2023-11-28 03:23:26,561 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3330513.3333333335, ans=0.025 2023-11-28 03:23:35,273 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3330580.0, ans=0.1 2023-11-28 03:23:37,586 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.19 vs. limit=15.0 2023-11-28 03:23:40,516 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 499600 2023-11-28 03:23:53,201 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3330713.3333333335, ans=0.1 2023-11-28 03:24:00,763 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3330713.3333333335, ans=0.1 2023-11-28 03:24:01,893 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3330780.0, ans=0.07 2023-11-28 03:24:10,428 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3330780.0, ans=0.1 2023-11-28 03:24:14,474 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 6650, loss[loss=0.07995, simple_loss=0.1117, pruned_loss=0.01553, audio_tagging_loss=0.008559, over 15568.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.0899, pruned_loss=0.01232, audio_tagging_loss=0.008768, over 3037816.92 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 8.0 2023-11-28 03:24:38,531 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 499650 2023-11-28 03:24:39,105 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.48 vs. limit=15.0 2023-11-28 03:24:59,513 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3331113.3333333335, ans=0.125 2023-11-28 03:25:10,989 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 6700, loss[loss=0.07196, simple_loss=0.1013, pruned_loss=0.01302, audio_tagging_loss=0.008272, over 14923.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.09013, pruned_loss=0.01232, audio_tagging_loss=0.008667, over 3038833.63 frames. 
], batch size: 57, lr: 1.61e-03, grad_scale: 8.0 2023-11-28 03:25:12,313 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3331180.0, ans=0.0 2023-11-28 03:25:14,826 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.118e+01 8.626e+01 9.557e+01 1.018e+02 1.449e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-28 03:25:21,150 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3331180.0, ans=0.0 2023-11-28 03:25:32,062 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3331246.6666666665, ans=0.1 2023-11-28 03:25:36,089 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 499700 2023-11-28 03:25:36,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3331313.3333333335, ans=0.125 2023-11-28 03:26:08,892 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 6750, loss[loss=0.05527, simple_loss=0.07309, pruned_loss=0.01023, audio_tagging_loss=0.008502, over 16068.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.08984, pruned_loss=0.01239, audio_tagging_loss=0.008736, over 3044569.60 frames. ], batch size: 60, lr: 1.61e-03, grad_scale: 8.0 2023-11-28 03:26:24,261 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3331580.0, ans=0.0 2023-11-28 03:26:32,876 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 499750 2023-11-28 03:26:35,360 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3331646.6666666665, ans=0.0 2023-11-28 03:26:43,640 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3331713.3333333335, ans=0.1 2023-11-28 03:26:45,763 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3331713.3333333335, ans=0.05 2023-11-28 03:26:47,787 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3331713.3333333335, ans=0.125 2023-11-28 03:27:00,678 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.59 vs. limit=10.0 2023-11-28 03:27:00,824 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.84 vs. limit=22.5 2023-11-28 03:27:03,556 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3331780.0, ans=0.125 2023-11-28 03:27:06,686 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 6800, loss[loss=0.0598, simple_loss=0.07958, pruned_loss=0.01211, audio_tagging_loss=0.007906, over 14277.00 frames. ], tot_loss[loss=0.066, simple_loss=0.08997, pruned_loss=0.01237, audio_tagging_loss=0.008639, over 3036279.78 frames. 
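Each scaling.py:213 line prints a ScheduledFloat: a module hyperparameter (a dropout p, a skip/bypass rate, a balancer prob, ...) whose current value ans is a function of the global batch_count rather than a fixed constant. By batch_count ≈ 3.33e6 these schedules have long since flattened out, which is why the same names keep logging the same ans values (0.1, 0.0, 0.125, ...). A minimal sketch, assuming piecewise-linear interpolation between (batch_count, value) breakpoints with the endpoints held constant; the breakpoints below are illustrative, not taken from this run:

    def scheduled_float(batch_count, points):
        # Piecewise-linear schedule over sorted (batch_count, value) breakpoints.
        points = sorted(points)
        if batch_count <= points[0][0]:
            return points[0][1]
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if batch_count <= x1:
                return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)
        return points[-1][1]

    # e.g. a dropout that decays from 0.3 to 0.1 over the first 20k batches,
    # then stays at 0.1 forever, as the late-training lines above suggest:
    assert scheduled_float(3330113.0, [(0.0, 0.3), (20000.0, 0.1)]) == 0.1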
], batch size: 54, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:27:10,005 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.180e+01 8.683e+01 9.159e+01 9.907e+01 1.833e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-28 03:27:13,503 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3331846.6666666665, ans=0.2 2023-11-28 03:27:22,309 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3331913.3333333335, ans=0.1 2023-11-28 03:27:28,220 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3331980.0, ans=0.125 2023-11-28 03:27:28,298 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3331980.0, ans=0.0 2023-11-28 03:27:30,292 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 499800 2023-11-28 03:27:37,006 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3331980.0, ans=0.125 2023-11-28 03:27:53,095 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3332113.3333333335, ans=0.09899494936611666 2023-11-28 03:28:03,817 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 6850, loss[loss=0.0773, simple_loss=0.1162, pruned_loss=0.01408, audio_tagging_loss=0.005118, over 14025.00 frames. ], tot_loss[loss=0.06644, simple_loss=0.09073, pruned_loss=0.01251, audio_tagging_loss=0.008568, over 3033461.65 frames. ], batch size: 53, lr: 1.61e-03, grad_scale: 8.0 2023-11-28 03:28:20,684 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3332246.6666666665, ans=0.125 2023-11-28 03:28:28,029 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 499850 2023-11-28 03:28:41,125 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3332380.0, ans=0.0 2023-11-28 03:28:46,671 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3332380.0, ans=0.125 2023-11-28 03:28:47,656 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3332380.0, ans=0.125 2023-11-28 03:28:48,946 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.70 vs. limit=15.0 2023-11-28 03:29:01,369 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 6900, loss[loss=0.05187, simple_loss=0.06535, pruned_loss=0.009821, audio_tagging_loss=0.00937, over 15106.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.09098, pruned_loss=0.01252, audio_tagging_loss=0.008528, over 3037474.49 frames. 
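The scaling.py:1022 Whitening lines compare a per-module activation statistic against a limit (metric=9.70 vs. limit=15.0 above) and only intervene when the metric exceeds the limit; below it, the line is purely informational. One plausible scale-invariant "how non-white are these features" number is the eigenvalue dispersion of the feature covariance, which is exactly 1.0 for whitened features and grows as variance concentrates in a few directions; this is an assumed reading of the metric, not the exact formula in scaling.py:

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        # x: (num_frames, num_channels); returns >= 1.0, == 1.0 iff cov is isotropic.
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.T @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        return ((eigs ** 2).mean() / eigs.mean().clamp(min=1e-20) ** 2).item()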
], batch size: 59, lr: 1.61e-03, grad_scale: 8.0 2023-11-28 03:29:01,561 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3332513.3333333335, ans=0.125 2023-11-28 03:29:02,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3332513.3333333335, ans=0.05 2023-11-28 03:29:03,709 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3332513.3333333335, ans=0.1 2023-11-28 03:29:07,519 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.414e+01 8.595e+01 9.072e+01 9.849e+01 1.232e+02, threshold=1.814e+02, percent-clipped=0.0 2023-11-28 03:29:07,880 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3332513.3333333335, ans=0.2 2023-11-28 03:29:25,811 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 499900 2023-11-28 03:29:40,995 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3332713.3333333335, ans=0.0 2023-11-28 03:29:47,277 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 03:29:48,668 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3332780.0, ans=0.125 2023-11-28 03:29:50,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3332780.0, ans=0.125 2023-11-28 03:29:59,741 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 6950, loss[loss=0.06966, simple_loss=0.09607, pruned_loss=0.01557, audio_tagging_loss=0.006053, over 13886.00 frames. ], tot_loss[loss=0.06682, simple_loss=0.09139, pruned_loss=0.01263, audio_tagging_loss=0.008502, over 3035322.20 frames. ], batch size: 53, lr: 1.61e-03, grad_scale: 8.0 2023-11-28 03:30:11,079 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.99 vs. 
limit=15.0 2023-11-28 03:30:14,023 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3332913.3333333335, ans=0.1 2023-11-28 03:30:20,701 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3332980.0, ans=0.1 2023-11-28 03:30:23,213 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 499950 2023-11-28 03:30:34,303 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3333046.6666666665, ans=0.2 2023-11-28 03:30:43,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3333046.6666666665, ans=0.125 2023-11-28 03:30:49,856 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3333113.3333333335, ans=0.125 2023-11-28 03:30:51,388 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.66 vs. limit=15.0 2023-11-28 03:30:56,308 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 7000, loss[loss=0.07484, simple_loss=0.1129, pruned_loss=0.01231, audio_tagging_loss=0.006078, over 15532.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.09045, pruned_loss=0.0124, audio_tagging_loss=0.008654, over 3034202.52 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 8.0 2023-11-28 03:31:01,686 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.744e+01 8.564e+01 9.211e+01 9.659e+01 1.272e+02, threshold=1.842e+02, percent-clipped=0.0 2023-11-28 03:31:09,036 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3333246.6666666665, ans=0.1 2023-11-28 03:31:16,767 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3333246.6666666665, ans=0.125 2023-11-28 03:31:20,448 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 500000 2023-11-28 03:31:21,798 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-500000.pt 2023-11-28 03:31:28,036 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3333313.3333333335, ans=0.0 2023-11-28 03:31:29,019 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3333313.3333333335, ans=0.0 2023-11-28 03:31:33,540 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3333380.0, ans=0.125 2023-11-28 03:31:34,477 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3333380.0, ans=0.2 2023-11-28 03:31:55,591 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 7050, loss[loss=0.05515, simple_loss=0.0708, pruned_loss=0.0112, audio_tagging_loss=0.008548, over 14734.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.08933, pruned_loss=0.01213, audio_tagging_loss=0.008805, over 3030814.12 frames. 
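Besides the per-epoch epoch-NN.pt files, the checkpoint.py:75 line above shows a batch-triggered save: global batch idx 500000 lands on the periodic interval, so checkpoint-500000.pt is written mid-epoch. A minimal sketch of that pattern; the 4000-batch interval and the rank-0 gating are assumptions about this run:

    import torch

    def maybe_save_checkpoint(model, optimizer, batch_idx, exp_dir,
                              save_every_n=4000, rank=0):
        # Only rank 0 writes, and only on interval boundaries (500000 % 4000 == 0).
        if rank != 0 or batch_idx == 0 or batch_idx % save_every_n != 0:
            return
        torch.save(
            {"model": model.state_dict(),
             "optimizer": optimizer.state_dict(),
             "batch_idx_train": batch_idx},
            f"{exp_dir}/checkpoint-{batch_idx}.pt",
        )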
], batch size: 56, lr: 1.61e-03, grad_scale: 8.0 2023-11-28 03:32:06,168 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3333580.0, ans=0.125 2023-11-28 03:32:08,766 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.17 vs. limit=22.5 2023-11-28 03:32:09,383 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3333580.0, ans=0.035 2023-11-28 03:32:09,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3333580.0, ans=0.125 2023-11-28 03:32:13,805 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.63 vs. limit=22.5 2023-11-28 03:32:16,786 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3333580.0, ans=0.0 2023-11-28 03:32:19,951 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 500050 2023-11-28 03:32:26,693 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3333646.6666666665, ans=0.125 2023-11-28 03:32:48,199 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3333780.0, ans=0.0 2023-11-28 03:32:52,923 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 7100, loss[loss=0.07171, simple_loss=0.1074, pruned_loss=0.01154, audio_tagging_loss=0.006464, over 15106.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.0906, pruned_loss=0.0124, audio_tagging_loss=0.008911, over 3035172.35 frames. ], batch size: 53, lr: 1.61e-03, grad_scale: 8.0 2023-11-28 03:32:58,781 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.626e+01 8.733e+01 9.408e+01 1.010e+02 1.480e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-28 03:32:59,014 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3333846.6666666665, ans=0.125 2023-11-28 03:33:00,149 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3333846.6666666665, ans=0.125 2023-11-28 03:33:02,339 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3333846.6666666665, ans=0.0 2023-11-28 03:33:05,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3333913.3333333335, ans=0.125 2023-11-28 03:33:08,155 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.91 vs. limit=22.5 2023-11-28 03:33:09,358 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=3333913.3333333335, ans=15.0 2023-11-28 03:33:15,945 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.42 vs. 
limit=15.0 2023-11-28 03:33:16,495 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 500100 2023-11-28 03:33:22,626 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3333980.0, ans=0.0 2023-11-28 03:33:49,153 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.69 vs. limit=15.0 2023-11-28 03:33:49,688 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 7150, loss[loss=0.06458, simple_loss=0.08981, pruned_loss=0.01094, audio_tagging_loss=0.008734, over 15624.00 frames. ], tot_loss[loss=0.06754, simple_loss=0.09207, pruned_loss=0.01267, audio_tagging_loss=0.008833, over 3045818.41 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 8.0 2023-11-28 03:33:56,587 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3334180.0, ans=0.125 2023-11-28 03:34:05,899 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3334246.6666666665, ans=0.125 2023-11-28 03:34:10,177 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3334246.6666666665, ans=0.0 2023-11-28 03:34:13,261 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 500150 2023-11-28 03:34:25,372 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3334380.0, ans=0.125 2023-11-28 03:34:25,441 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3334380.0, ans=0.125 2023-11-28 03:34:34,177 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3334446.6666666665, ans=0.0 2023-11-28 03:34:38,456 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3334446.6666666665, ans=10.0 2023-11-28 03:34:40,630 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3334446.6666666665, ans=0.0 2023-11-28 03:34:40,681 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3334446.6666666665, ans=0.125 2023-11-28 03:34:41,676 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3334446.6666666665, ans=0.125 2023-11-28 03:34:43,922 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3334446.6666666665, ans=0.1 2023-11-28 03:34:46,522 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 7200, loss[loss=0.07259, simple_loss=0.0957, pruned_loss=0.01455, audio_tagging_loss=0.0102, over 15938.00 frames. ], tot_loss[loss=0.06725, simple_loss=0.09141, pruned_loss=0.01264, audio_tagging_loss=0.008904, over 3043233.26 frames. 
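The lr field sits at 1.61e-03 for thousands of consecutive batches and only slips to 1.60e-03 near the end of this section, the signature of a very slowly decaying schedule such as icefall's Eden, which discounts in both batch count and epoch. As a hedged cross-check, plugging this run's configured base_lr=0.045, lr_batches=7500, lr_epochs=3.5 into the Eden formula at batch ≈ 5e5, epoch 42 gives about 1.6e-03, consistent with the logged values:

    def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
        # Eden: base_lr * ((batch/lr_batches)^2 + 1)^-0.25
        #               * ((epoch/lr_epochs)^2 + 1)^-0.25
        return (base_lr
                * ((batch / lr_batches) ** 2 + 1) ** -0.25
                * ((epoch / lr_epochs) ** 2 + 1) ** -0.25)

    print(f"{eden_lr(0.045, 500000, 42):.2e}")  # ~1.6e-03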
], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:34:51,201 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3334513.3333333335, ans=0.125 2023-11-28 03:34:51,920 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.529e+01 8.895e+01 9.379e+01 1.001e+02 1.500e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-28 03:35:09,765 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3334646.6666666665, ans=0.1 2023-11-28 03:35:10,582 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 500200 2023-11-28 03:35:16,801 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.79 vs. limit=15.0 2023-11-28 03:35:19,790 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3334713.3333333335, ans=0.125 2023-11-28 03:35:19,845 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3334713.3333333335, ans=0.125 2023-11-28 03:35:24,154 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3334713.3333333335, ans=0.125 2023-11-28 03:35:30,668 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.63 vs. limit=6.0 2023-11-28 03:35:32,576 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3334780.0, ans=0.125 2023-11-28 03:35:34,540 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3334780.0, ans=0.0 2023-11-28 03:35:36,755 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3334780.0, ans=0.125 2023-11-28 03:35:43,149 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 7250, loss[loss=0.0532, simple_loss=0.07268, pruned_loss=0.006389, audio_tagging_loss=0.01047, over 14900.00 frames. ], tot_loss[loss=0.06707, simple_loss=0.09128, pruned_loss=0.01257, audio_tagging_loss=0.008862, over 3044440.18 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:36:07,250 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 500250 2023-11-28 03:36:15,920 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.19 vs. limit=15.0 2023-11-28 03:36:24,507 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3335046.6666666665, ans=0.125 2023-11-28 03:36:24,914 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.51 vs. limit=15.0 2023-11-28 03:36:37,013 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3335113.3333333335, ans=0.0 2023-11-28 03:36:40,979 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 7300, loss[loss=0.06244, simple_loss=0.08142, pruned_loss=0.01316, audio_tagging_loss=0.008571, over 15691.00 frames. 
], tot_loss[loss=0.06659, simple_loss=0.09083, pruned_loss=0.01242, audio_tagging_loss=0.008755, over 3039573.80 frames. ], batch size: 62, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:36:46,357 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.375e+01 8.677e+01 9.313e+01 1.019e+02 2.186e+02, threshold=1.863e+02, percent-clipped=1.0 2023-11-28 03:36:46,646 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 03:36:52,616 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.30 vs. limit=12.0 2023-11-28 03:36:55,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3335246.6666666665, ans=0.125 2023-11-28 03:36:57,626 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.02 vs. limit=15.0 2023-11-28 03:37:03,090 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.49 vs. limit=15.0 2023-11-28 03:37:04,806 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 500300 2023-11-28 03:37:06,028 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3335313.3333333335, ans=0.05 2023-11-28 03:37:14,345 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3335380.0, ans=0.125 2023-11-28 03:37:24,047 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3335380.0, ans=0.95 2023-11-28 03:37:25,031 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3335380.0, ans=0.125 2023-11-28 03:37:29,709 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3335446.6666666665, ans=0.1 2023-11-28 03:37:31,742 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3335446.6666666665, ans=0.1 2023-11-28 03:37:38,146 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 7350, loss[loss=0.04357, simple_loss=0.05277, pruned_loss=0.007673, audio_tagging_loss=0.009508, over 15736.00 frames. ], tot_loss[loss=0.06678, simple_loss=0.09108, pruned_loss=0.01261, audio_tagging_loss=0.008629, over 3039737.63 frames. 
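The balancer fields being scheduled above (prob, min_positive, max_positive, min_abs, max_abs) describe soft constraints on activation statistics: keep the per-channel fraction of positive outputs inside [min_positive, max_positive] and the mean absolute value inside [min_abs, max_abs], enforced on a random prob-fraction of batches by nudging gradients in the backward pass. The measurement half of that idea, as an assumed sketch (the actual gradient correction in scaling.py is more involved):

    import torch

    def balancer_violations(x: torch.Tensor, min_positive=0.05, max_positive=0.95,
                            min_abs=0.2, max_abs=10.0):
        # x: activations with channels last; stats are taken per channel.
        reduce_dims = tuple(range(x.dim() - 1))
        frac_positive = (x > 0).float().mean(dim=reduce_dims)
        mean_abs = x.abs().mean(dim=reduce_dims)
        return {
            "too_few_positive": frac_positive < min_positive,
            "too_many_positive": frac_positive > max_positive,
            "too_small": mean_abs < min_abs,
            "too_large": mean_abs > max_abs,
        }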
], batch size: 62, lr: 1.61e-03, grad_scale: 8.0 2023-11-28 03:37:43,382 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3335513.3333333335, ans=0.1 2023-11-28 03:38:02,902 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 500350 2023-11-28 03:38:17,931 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3335713.3333333335, ans=0.5 2023-11-28 03:38:17,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3335713.3333333335, ans=0.0 2023-11-28 03:38:21,225 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3335713.3333333335, ans=0.125 2023-11-28 03:38:22,401 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3335713.3333333335, ans=0.1 2023-11-28 03:38:23,938 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3335780.0, ans=0.0 2023-11-28 03:38:35,821 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 7400, loss[loss=0.06969, simple_loss=0.09314, pruned_loss=0.01328, audio_tagging_loss=0.009851, over 14980.00 frames. ], tot_loss[loss=0.06682, simple_loss=0.09112, pruned_loss=0.01264, audio_tagging_loss=0.008614, over 3048536.32 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 8.0 2023-11-28 03:38:35,909 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3335846.6666666665, ans=0.1 2023-11-28 03:38:43,241 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.183e+01 8.811e+01 9.404e+01 1.022e+02 2.241e+02, threshold=1.881e+02, percent-clipped=1.0 2023-11-28 03:38:47,887 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3335913.3333333335, ans=0.125 2023-11-28 03:38:52,378 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.30 vs. limit=22.5 2023-11-28 03:39:00,581 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 500400 2023-11-28 03:39:29,878 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3336113.3333333335, ans=0.125 2023-11-28 03:39:34,651 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 7450, loss[loss=0.06208, simple_loss=0.0855, pruned_loss=0.01182, audio_tagging_loss=0.007507, over 15642.00 frames. ], tot_loss[loss=0.06649, simple_loss=0.09059, pruned_loss=0.01253, audio_tagging_loss=0.008663, over 3053262.85 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 8.0 2023-11-28 03:39:37,385 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.87 vs. limit=15.0 2023-11-28 03:39:58,213 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 500450 2023-11-28 03:40:31,118 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 7500, loss[loss=0.07779, simple_loss=0.1041, pruned_loss=0.01367, audio_tagging_loss=0.01204, over 14467.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.09055, pruned_loss=0.01249, audio_tagging_loss=0.008662, over 3049792.39 frames. 
], batch size: 53, lr: 1.61e-03, grad_scale: 8.0 2023-11-28 03:40:38,136 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.849e+01 9.074e+01 9.605e+01 1.016e+02 1.899e+02, threshold=1.921e+02, percent-clipped=1.0 2023-11-28 03:40:41,999 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.22 vs. limit=12.0 2023-11-28 03:40:55,803 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 500500 2023-11-28 03:41:06,446 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3336713.3333333335, ans=0.125 2023-11-28 03:41:16,200 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 03:41:28,466 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 7550, loss[loss=0.0769, simple_loss=0.1025, pruned_loss=0.01525, audio_tagging_loss=0.01042, over 14319.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.09023, pruned_loss=0.01242, audio_tagging_loss=0.008598, over 3047371.85 frames. ], batch size: 53, lr: 1.61e-03, grad_scale: 8.0 2023-11-28 03:41:28,668 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3336846.6666666665, ans=0.125 2023-11-28 03:41:49,658 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3336913.3333333335, ans=0.125 2023-11-28 03:41:52,878 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 500550 2023-11-28 03:41:55,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3336980.0, ans=0.5 2023-11-28 03:42:09,016 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3337046.6666666665, ans=0.125 2023-11-28 03:42:23,063 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3337113.3333333335, ans=0.125 2023-11-28 03:42:26,175 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 7600, loss[loss=0.06346, simple_loss=0.08218, pruned_loss=0.01119, audio_tagging_loss=0.01117, over 15207.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.09048, pruned_loss=0.0125, audio_tagging_loss=0.008539, over 3047718.22 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:42:32,003 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 03:42:32,818 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.755e+01 8.828e+01 9.447e+01 1.020e+02 1.254e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-28 03:42:50,605 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 500600 2023-11-28 03:42:50,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3337313.3333333335, ans=0.125 2023-11-28 03:42:52,101 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3337313.3333333335, ans=0.0 2023-11-28 03:43:23,926 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 7650, loss[loss=0.0687, simple_loss=0.09486, pruned_loss=0.01213, audio_tagging_loss=0.009144, over 16164.00 frames. 
], tot_loss[loss=0.06554, simple_loss=0.08904, pruned_loss=0.01233, audio_tagging_loss=0.008694, over 3040151.16 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:43:24,079 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3337513.3333333335, ans=0.0 2023-11-28 03:43:25,178 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3337513.3333333335, ans=0.125 2023-11-28 03:43:41,265 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3337580.0, ans=0.125 2023-11-28 03:43:48,261 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 500650 2023-11-28 03:43:51,050 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 03:44:02,900 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3337713.3333333335, ans=0.1 2023-11-28 03:44:08,894 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3337780.0, ans=0.125 2023-11-28 03:44:18,268 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3337780.0, ans=0.125 2023-11-28 03:44:21,212 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 7700, loss[loss=0.0853, simple_loss=0.1148, pruned_loss=0.02143, audio_tagging_loss=0.006491, over 14972.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08915, pruned_loss=0.01238, audio_tagging_loss=0.008648, over 3040061.57 frames. ], batch size: 54, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:44:27,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.whiten.whitening_limit, batch_count=3337846.6666666665, ans=12.0 2023-11-28 03:44:27,650 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.039e+01 8.661e+01 9.049e+01 9.903e+01 1.330e+02, threshold=1.810e+02, percent-clipped=0.0 2023-11-28 03:44:30,251 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=3337846.6666666665, ans=22.5 2023-11-28 03:44:44,882 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 500700 2023-11-28 03:44:59,246 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 03:45:18,641 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 7750, loss[loss=0.06708, simple_loss=0.0947, pruned_loss=0.01265, audio_tagging_loss=0.007077, over 15517.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.08927, pruned_loss=0.01219, audio_tagging_loss=0.008683, over 3032897.71 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:45:23,246 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3338180.0, ans=0.0 2023-11-28 03:45:29,358 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.64 vs. 
limit=10.0 2023-11-28 03:45:43,002 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 500750 2023-11-28 03:46:15,551 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 7800, loss[loss=0.09453, simple_loss=0.1268, pruned_loss=0.02316, audio_tagging_loss=0.007956, over 15461.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.08995, pruned_loss=0.01244, audio_tagging_loss=0.008718, over 3038988.13 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:46:22,527 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.137e+01 8.833e+01 9.588e+01 1.059e+02 1.292e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-28 03:46:26,039 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.32 vs. limit=15.0 2023-11-28 03:46:28,883 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3338580.0, ans=0.0 2023-11-28 03:46:30,040 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3338580.0, ans=0.125 2023-11-28 03:46:39,839 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 500800 2023-11-28 03:46:55,849 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.70 vs. limit=15.0 2023-11-28 03:47:02,269 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.18 vs. limit=15.0 2023-11-28 03:47:06,813 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3338780.0, ans=0.5 2023-11-28 03:47:13,907 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 7850, loss[loss=0.05742, simple_loss=0.07256, pruned_loss=0.01076, audio_tagging_loss=0.01038, over 15027.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.09028, pruned_loss=0.01247, audio_tagging_loss=0.008738, over 3040361.02 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:47:37,934 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 500850 2023-11-28 03:47:52,199 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.37 vs. limit=10.0 2023-11-28 03:48:10,413 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 7900, loss[loss=0.04889, simple_loss=0.06109, pruned_loss=0.01096, audio_tagging_loss=0.007388, over 14246.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.09061, pruned_loss=0.01248, audio_tagging_loss=0.008791, over 3046846.07 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:48:17,455 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.341e+01 8.758e+01 9.324e+01 1.005e+02 1.322e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-28 03:48:33,981 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 500900 2023-11-28 03:49:06,820 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 7950, loss[loss=0.05832, simple_loss=0.07611, pruned_loss=0.009463, audio_tagging_loss=0.0108, over 14966.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08932, pruned_loss=0.01223, audio_tagging_loss=0.008877, over 3043101.99 frames. 
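grad_scale in the batch headers moves between 8.0, 16.0 and 32.0 across this section, which is dynamic fp16 loss scaling at work: the scaler halves the scale when gradients overflow and doubles it after a long enough run of clean steps. A minimal sketch with torch.cuda.amp; the init_scale and growth settings are assumptions, though the stock defaults behave the same way:

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=16.0, growth_factor=2.0,
                                       backoff_factor=0.5, growth_interval=2000)

    def fp16_step(model, optimizer, loss_fn, batch):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = loss_fn(model, batch)
        scaler.scale(loss).backward()
        scaler.step(optimizer)   # silently skipped if grads contain inf/nan
        scaler.update()          # backoff on overflow, growth after clean runs
        return scaler.get_scale()  # the grad_scale value printed in the log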
], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:49:24,555 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 03:49:31,126 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 500950 2023-11-28 03:49:33,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3339646.6666666665, ans=0.2 2023-11-28 03:49:40,908 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3339713.3333333335, ans=0.0 2023-11-28 03:49:52,286 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.33 vs. limit=6.0 2023-11-28 03:49:57,142 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3339780.0, ans=0.0 2023-11-28 03:50:04,241 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 8000, loss[loss=0.06638, simple_loss=0.09344, pruned_loss=0.01312, audio_tagging_loss=0.006546, over 15173.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08896, pruned_loss=0.01223, audio_tagging_loss=0.008942, over 3042474.91 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 03:50:11,487 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.492e+01 8.539e+01 9.143e+01 9.818e+01 1.375e+02, threshold=1.829e+02, percent-clipped=0.0 2023-11-28 03:50:12,758 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3339846.6666666665, ans=0.035 2023-11-28 03:50:25,933 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3339913.3333333335, ans=0.09899494936611666 2023-11-28 03:50:28,960 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 501000 2023-11-28 03:50:44,987 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.20 vs. limit=6.0 2023-11-28 03:50:53,632 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3340113.3333333335, ans=0.1 2023-11-28 03:51:02,063 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 8050, loss[loss=0.04587, simple_loss=0.04991, pruned_loss=0.006143, audio_tagging_loss=0.01477, over 14469.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.08849, pruned_loss=0.01224, audio_tagging_loss=0.009088, over 3044352.33 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:51:26,198 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 501050 2023-11-28 03:51:33,304 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3340313.3333333335, ans=0.125 2023-11-28 03:51:51,740 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.12 vs. 
limit=15.0 2023-11-28 03:52:00,073 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 8100, loss[loss=0.06719, simple_loss=0.08491, pruned_loss=0.01361, audio_tagging_loss=0.01112, over 14790.00 frames. ], tot_loss[loss=0.0652, simple_loss=0.08805, pruned_loss=0.01213, audio_tagging_loss=0.009048, over 3043750.44 frames. ], batch size: 54, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:52:07,644 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.157e+01 8.647e+01 9.377e+01 1.005e+02 1.143e+02, threshold=1.875e+02, percent-clipped=0.0 2023-11-28 03:52:24,110 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 501100 2023-11-28 03:52:26,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3340646.6666666665, ans=0.015 2023-11-28 03:52:30,828 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3340646.6666666665, ans=0.1 2023-11-28 03:52:56,855 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 8150, loss[loss=0.07488, simple_loss=0.1096, pruned_loss=0.01467, audio_tagging_loss=0.005414, over 15697.00 frames. ], tot_loss[loss=0.06531, simple_loss=0.08863, pruned_loss=0.01202, audio_tagging_loss=0.008967, over 3040678.27 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:52:58,195 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3340846.6666666665, ans=0.2 2023-11-28 03:53:18,257 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3340913.3333333335, ans=0.0 2023-11-28 03:53:21,310 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 501150 2023-11-28 03:53:24,728 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3340980.0, ans=0.025 2023-11-28 03:53:35,995 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.04 vs. limit=15.0 2023-11-28 03:53:53,993 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 8200, loss[loss=0.08114, simple_loss=0.1079, pruned_loss=0.01976, audio_tagging_loss=0.007453, over 15063.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08864, pruned_loss=0.01204, audio_tagging_loss=0.008779, over 3049338.32 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:53:57,345 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 03:54:00,550 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=3341180.0, ans=0.05 2023-11-28 03:54:02,498 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.527e+01 8.802e+01 9.578e+01 1.025e+02 1.373e+02, threshold=1.916e+02, percent-clipped=0.0 2023-11-28 03:54:04,214 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.22 vs. 
limit=22.5 2023-11-28 03:54:10,393 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3341246.6666666665, ans=0.125 2023-11-28 03:54:11,498 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3341246.6666666665, ans=0.125 2023-11-28 03:54:16,812 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3341313.3333333335, ans=0.125 2023-11-28 03:54:17,780 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 501200 2023-11-28 03:54:19,415 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3341313.3333333335, ans=0.0 2023-11-28 03:54:51,790 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 8250, loss[loss=0.06613, simple_loss=0.09349, pruned_loss=0.0118, audio_tagging_loss=0.007589, over 15856.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08912, pruned_loss=0.01207, audio_tagging_loss=0.008662, over 3050030.05 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:54:53,140 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3341513.3333333335, ans=0.0 2023-11-28 03:54:54,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3341513.3333333335, ans=0.0 2023-11-28 03:54:59,502 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3341513.3333333335, ans=0.0 2023-11-28 03:54:59,567 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3341513.3333333335, ans=0.125 2023-11-28 03:55:03,934 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 03:55:08,357 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.13 vs. limit=15.0 2023-11-28 03:55:09,112 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.22 vs. limit=15.0 2023-11-28 03:55:15,138 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 501250 2023-11-28 03:55:17,597 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3341646.6666666665, ans=0.0 2023-11-28 03:55:27,745 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3341713.3333333335, ans=0.125 2023-11-28 03:55:29,979 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3341713.3333333335, ans=0.0 2023-11-28 03:55:40,049 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3341780.0, ans=0.125 2023-11-28 03:55:48,582 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 8300, loss[loss=0.06878, simple_loss=0.09917, pruned_loss=0.01397, audio_tagging_loss=0.005226, over 15454.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.08936, pruned_loss=0.01218, audio_tagging_loss=0.008625, over 3055879.49 frames. 
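The three WARNING [train_asr.py:1481] lines in this section (cuts Xez1ffAcb0w, uQjH4tNUZ_g and 8C7biyx9TQ4) all follow the same pattern: a 1-second AudioSet clip has 100 feature frames, which shrink to 23 after the encoder's subsampling, fewer than the 24 BPE tokens of its dummy transcript; a transducer loss needs at least one encoder frame per output token, so the cut is excluded. The 100 -> 23 arithmetic matches a conv subsampling of roughly T -> ((T - 7) // 2 + 1) // 2; that exact formula is an assumption, but the filter looks like:

    def frames_after_subsampling(num_frames: int) -> int:
        # Assumed Conv2dSubsampling arithmetic; reproduces 100 -> 23 from the WARNINGs.
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, tokens: list) -> bool:
        # Transducer training needs T >= number of tokens.
        return frames_after_subsampling(num_frames) >= len(tokens)

    assert frames_after_subsampling(100) == 23
    assert not keep_cut(100, ["tok"] * 24)  # excluded, as in the WARNINGs above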
], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:55:48,732 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3341846.6666666665, ans=0.0 2023-11-28 03:55:56,895 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.433e+01 8.790e+01 9.364e+01 1.000e+02 1.308e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-28 03:56:01,520 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3341913.3333333335, ans=0.0 2023-11-28 03:56:01,720 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3341913.3333333335, ans=0.1 2023-11-28 03:56:02,732 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3341913.3333333335, ans=0.125 2023-11-28 03:56:13,760 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 501300 2023-11-28 03:56:15,052 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3341980.0, ans=10.0 2023-11-28 03:56:45,986 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 8350, loss[loss=0.04055, simple_loss=0.05024, pruned_loss=0.005386, audio_tagging_loss=0.01004, over 15297.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08936, pruned_loss=0.01223, audio_tagging_loss=0.008595, over 3054066.88 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:56:48,362 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3342180.0, ans=0.1 2023-11-28 03:57:03,370 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3342246.6666666665, ans=0.2 2023-11-28 03:57:10,747 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 501350 2023-11-28 03:57:20,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3342380.0, ans=0.125 2023-11-28 03:57:28,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3342380.0, ans=0.125 2023-11-28 03:57:41,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3342446.6666666665, ans=0.0 2023-11-28 03:57:43,977 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 8400, loss[loss=0.06538, simple_loss=0.08582, pruned_loss=0.01221, audio_tagging_loss=0.01026, over 15111.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.08875, pruned_loss=0.01237, audio_tagging_loss=0.008735, over 3052278.05 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 03:57:44,334 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3342513.3333333335, ans=0.0 2023-11-28 03:57:51,648 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.717e+01 8.873e+01 9.503e+01 1.023e+02 1.226e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-28 03:58:07,683 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 501400 2023-11-28 03:58:09,669 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.61 vs. 
limit=22.5 2023-11-28 03:58:10,466 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3342646.6666666665, ans=0.125 2023-11-28 03:58:13,654 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3342646.6666666665, ans=0.125 2023-11-28 03:58:16,935 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3342713.3333333335, ans=0.0 2023-11-28 03:58:20,766 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3342713.3333333335, ans=0.125 2023-11-28 03:58:41,312 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 8450, loss[loss=0.06529, simple_loss=0.09084, pruned_loss=0.01049, audio_tagging_loss=0.009374, over 15224.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.0889, pruned_loss=0.01225, audio_tagging_loss=0.008732, over 3046862.16 frames. ], batch size: 55, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 03:58:47,469 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3342846.6666666665, ans=0.125 2023-11-28 03:58:52,240 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.80 vs. limit=15.0 2023-11-28 03:58:54,125 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3342913.3333333335, ans=0.0 2023-11-28 03:58:55,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3342913.3333333335, ans=0.0 2023-11-28 03:58:57,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3342913.3333333335, ans=0.125 2023-11-28 03:59:05,847 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 501450 2023-11-28 03:59:13,284 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3342980.0, ans=0.1 2023-11-28 03:59:17,683 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3343046.6666666665, ans=0.125 2023-11-28 03:59:22,196 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 03:59:33,935 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3343113.3333333335, ans=0.025 2023-11-28 03:59:39,104 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 8500, loss[loss=0.04474, simple_loss=0.05366, pruned_loss=0.007267, audio_tagging_loss=0.01065, over 13625.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08911, pruned_loss=0.01237, audio_tagging_loss=0.008701, over 3045287.64 frames. ], batch size: 54, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 03:59:41,857 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.75 vs. limit=15.0 2023-11-28 03:59:43,211 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.27 vs. 
limit=10.0 2023-11-28 03:59:44,849 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3343180.0, ans=0.0 2023-11-28 03:59:46,772 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.475e+01 8.888e+01 9.285e+01 1.024e+02 1.288e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-28 03:59:54,192 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.20 vs. limit=15.0 2023-11-28 04:00:03,303 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 501500 2023-11-28 04:00:04,823 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.98 vs. limit=22.5 2023-11-28 04:00:09,282 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.06 vs. limit=15.0 2023-11-28 04:00:36,612 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 8550, loss[loss=0.07981, simple_loss=0.09813, pruned_loss=0.02395, audio_tagging_loss=0.006795, over 14441.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.08908, pruned_loss=0.01246, audio_tagging_loss=0.008813, over 3044064.11 frames. ], batch size: 55, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:00:43,702 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.45 vs. limit=12.0 2023-11-28 04:00:45,881 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.56 vs. limit=15.0 2023-11-28 04:00:55,514 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.26 vs. limit=15.0 2023-11-28 04:00:59,917 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=3343646.6666666665, ans=0.025 2023-11-28 04:01:00,896 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 501550 2023-11-28 04:01:04,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3343646.6666666665, ans=0.125 2023-11-28 04:01:13,081 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3343713.3333333335, ans=0.125 2023-11-28 04:01:27,658 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3343780.0, ans=0.07 2023-11-28 04:01:33,888 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 8600, loss[loss=0.06239, simple_loss=0.08575, pruned_loss=0.01049, audio_tagging_loss=0.009025, over 15376.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.08865, pruned_loss=0.01226, audio_tagging_loss=0.008795, over 3036927.30 frames. ], batch size: 58, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:01:42,159 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.388e+01 8.741e+01 9.411e+01 9.975e+01 1.880e+02, threshold=1.882e+02, percent-clipped=1.0 2023-11-28 04:01:47,105 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.09 vs. 
limit=15.0 2023-11-28 04:01:57,410 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 501600 2023-11-28 04:02:00,634 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3343980.0, ans=0.125 2023-11-28 04:02:02,137 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.22 vs. limit=15.0 2023-11-28 04:02:04,053 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3343980.0, ans=0.1 2023-11-28 04:02:18,806 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3344113.3333333335, ans=0.125 2023-11-28 04:02:26,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3344113.3333333335, ans=0.0 2023-11-28 04:02:31,104 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 8650, loss[loss=0.06776, simple_loss=0.09422, pruned_loss=0.01009, audio_tagging_loss=0.01055, over 15761.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.08979, pruned_loss=0.0124, audio_tagging_loss=0.00878, over 3035698.68 frames. ], batch size: 62, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:02:35,746 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3344180.0, ans=0.125 2023-11-28 04:02:41,316 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3344246.6666666665, ans=0.07 2023-11-28 04:02:55,660 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 501650 2023-11-28 04:03:16,430 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.65 vs. limit=22.5 2023-11-28 04:03:19,576 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3344446.6666666665, ans=0.1 2023-11-28 04:03:28,906 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 8700, loss[loss=0.06021, simple_loss=0.08218, pruned_loss=0.01009, audio_tagging_loss=0.009027, over 15389.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08947, pruned_loss=0.0123, audio_tagging_loss=0.008878, over 3042668.85 frames. 
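[Annotation] In the train_asr.py:1235 entries above, loss[...] describes the current batch (roughly 15k frames) while tot_loss[...] is a running average over the most recent ~3M frames. The bracketed terms appear to combine as loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss; the 0.5 and 1.0 weights are an assumption (consistent with this run's simple_loss_scale and audio_tagging_loss_scale), and the sketch below checks them against the batch 8700 tot_loss entry just logged.

```python
# Sketch: how the bracketed loss terms in the train_asr.py:1235 entries
# appear to combine. simple_scale / tagging_scale are assumptions checked
# numerically against the logged values.
def combined_loss(simple_loss, pruned_loss, audio_tagging_loss,
                  simple_scale=0.5, tagging_scale=1.0):
    return simple_scale * simple_loss + pruned_loss + tagging_scale * audio_tagging_loss

# batch 8700 above: tot_loss[loss=0.06591, simple_loss=0.08947,
# pruned_loss=0.0123, audio_tagging_loss=0.008878]
assert abs(combined_loss(0.08947, 0.0123, 0.008878) - 0.06591) < 1e-4
```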
], batch size: 57, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:03:29,089 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3344513.3333333335, ans=0.125 2023-11-28 04:03:30,223 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3344513.3333333335, ans=0.0 2023-11-28 04:03:35,775 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3344513.3333333335, ans=0.0 2023-11-28 04:03:37,604 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.489e+01 8.850e+01 9.398e+01 9.849e+01 1.274e+02, threshold=1.880e+02, percent-clipped=0.0 2023-11-28 04:03:48,303 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3344580.0, ans=0.2 2023-11-28 04:03:53,030 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 501700 2023-11-28 04:04:05,347 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3344713.3333333335, ans=0.125 2023-11-28 04:04:05,768 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.27 vs. limit=6.0 2023-11-28 04:04:08,644 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3344713.3333333335, ans=0.125 2023-11-28 04:04:26,115 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 8750, loss[loss=0.07704, simple_loss=0.09772, pruned_loss=0.01861, audio_tagging_loss=0.009572, over 15258.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.08914, pruned_loss=0.01228, audio_tagging_loss=0.008928, over 3040218.26 frames. ], batch size: 58, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:04:32,849 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3344846.6666666665, ans=0.09899494936611666 2023-11-28 04:04:49,583 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 501750 2023-11-28 04:04:59,281 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3345046.6666666665, ans=0.2 2023-11-28 04:05:13,873 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3345113.3333333335, ans=0.125 2023-11-28 04:05:22,521 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.75 vs. limit=15.0 2023-11-28 04:05:22,945 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 8800, loss[loss=0.07791, simple_loss=0.1038, pruned_loss=0.0185, audio_tagging_loss=0.007538, over 15154.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08914, pruned_loss=0.01218, audio_tagging_loss=0.009028, over 3032730.03 frames. 
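[Annotation] The very frequent scaling.py:213 lines track ScheduledFloat values: module hyperparameters such as dropout probabilities, skip rates, and balancer probabilities that are defined as functions of the global batch count, with the logged `ans` being the schedule evaluated at the current `batch_count`. A minimal sketch of the idea, assuming piecewise-linear interpolation between breakpoints (the real class in icefall's scaling.py carries more machinery, e.g. arithmetic operators and defaults):

```python
# Minimal sketch (assumption) of the ScheduledFloat idea: a float-valued
# hyperparameter given by (batch_count, value) breakpoints, linearly
# interpolated between them and clamped at the ends.
class ScheduledFloat:
    def __init__(self, *points):
        self.points = sorted(points)  # (batch_count, value) pairs

    def value_at(self, batch_count):
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)
        return pts[-1][1]

# e.g. a skip rate decaying from 0.3 to 0.1 over the first 20k batches;
# this far into training it sits at its final value, like many `ans`
# values above that have settled at constants such as 0.125 or 0.0.
drop = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
print(drop.value_at(3342846.67))  # -> 0.1
```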
], batch size: 56, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:05:31,633 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.679e+01 8.835e+01 9.360e+01 1.012e+02 1.261e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-28 04:05:46,836 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 501800 2023-11-28 04:05:52,912 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3345313.3333333335, ans=0.5 2023-11-28 04:05:58,896 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3345380.0, ans=0.125 2023-11-28 04:06:02,480 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3345380.0, ans=0.035 2023-11-28 04:06:19,642 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 8850, loss[loss=0.05728, simple_loss=0.07874, pruned_loss=0.009711, audio_tagging_loss=0.0082, over 14689.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.08991, pruned_loss=0.01241, audio_tagging_loss=0.008882, over 3032714.17 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:06:34,726 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 04:06:44,152 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 501850 2023-11-28 04:07:16,677 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 8900, loss[loss=0.05377, simple_loss=0.0704, pruned_loss=0.008012, audio_tagging_loss=0.01056, over 14690.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.09036, pruned_loss=0.0124, audio_tagging_loss=0.008816, over 3038346.62 frames. ], batch size: 55, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:07:19,239 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3345846.6666666665, ans=0.125 2023-11-28 04:07:25,984 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.615e+01 8.854e+01 9.513e+01 9.955e+01 1.488e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-28 04:07:29,090 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3345913.3333333335, ans=0.125 2023-11-28 04:07:35,798 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.32 vs. limit=15.0 2023-11-28 04:07:37,683 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3345913.3333333335, ans=0.0 2023-11-28 04:07:40,823 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 501900 2023-11-28 04:08:02,306 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3346113.3333333335, ans=0.125 2023-11-28 04:08:14,224 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 8950, loss[loss=0.05937, simple_loss=0.07978, pruned_loss=0.01185, audio_tagging_loss=0.007636, over 14336.00 frames. 
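[Annotation] The optim.py:476 lines report quartiles of recently observed gradient norms together with a clipping threshold and the fraction of clipped steps. The threshold tracks Clipping_scale times the logged median: in the entry above, 1.903e+02 is 2.0 x the median 9.513e+01, and the same relation holds in the other entries. A hedged sketch of that scheme (the actual optimizer logic in icefall's optim.py differs in detail):

```python
import torch

# Sketch (assumption): adaptive gradient clipping against a running buffer
# of recent gradient norms, clipping at clipping_scale * median.
def clip_by_median(params, recent_norms, clipping_scale=2.0):
    q = torch.quantile(recent_norms,
                       torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]  # 2.0 x median, as logged above
    grad_norm = torch.norm(torch.stack([p.grad.norm() for p in params]))
    clipped = bool(grad_norm > threshold)
    if clipped:
        for p in params:
            p.grad.mul_(threshold / grad_norm)
    return q, threshold, clipped  # quartiles / threshold / percent-clipped input
```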
], tot_loss[loss=0.06589, simple_loss=0.08987, pruned_loss=0.01224, audio_tagging_loss=0.008714, over 3043087.89 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:08:23,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3346180.0, ans=0.04949747468305833 2023-11-28 04:08:37,005 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3346313.3333333335, ans=0.125 2023-11-28 04:08:37,877 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 501950 2023-11-28 04:08:39,202 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3346313.3333333335, ans=0.2 2023-11-28 04:08:40,346 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3346313.3333333335, ans=0.125 2023-11-28 04:08:49,081 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3346380.0, ans=0.125 2023-11-28 04:09:10,177 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 9000, loss[loss=0.07701, simple_loss=0.1128, pruned_loss=0.01398, audio_tagging_loss=0.006636, over 14086.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08976, pruned_loss=0.0123, audio_tagging_loss=0.008665, over 3044818.23 frames. ], batch size: 54, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:09:10,179 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-28 04:09:27,902 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.6038, 3.4031, 2.9757, 3.3498], device='cuda:0') 2023-11-28 04:09:41,910 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.1280, 2.4631, 4.9821, 2.9703], device='cuda:0') 2023-11-28 04:09:44,954 INFO [train_asr.py:1267] (0/4) Epoch 42, validation: loss=0.05915, simple_loss=0.05063, pruned_loss=0.005264, audio_tagging_loss=0.02857, over 4681554.00 frames. 2023-11-28 04:09:44,954 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-28 04:09:47,349 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3346513.3333333335, ans=0.0 2023-11-28 04:09:54,942 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.666e+01 8.664e+01 9.503e+01 1.037e+02 1.475e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-28 04:10:09,107 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 502000 2023-11-28 04:10:15,779 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.38 vs. limit=10.0 2023-11-28 04:10:18,658 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3346713.3333333335, ans=0.2 2023-11-28 04:10:26,453 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3346713.3333333335, ans=0.04949747468305833 2023-11-28 04:10:43,078 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 9050, loss[loss=0.07431, simple_loss=0.1075, pruned_loss=0.01277, audio_tagging_loss=0.007805, over 16196.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.09011, pruned_loss=0.01237, audio_tagging_loss=0.008595, over 3049178.06 frames. 
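[Annotation] At batch 9000 above the trainer pauses to compute validation loss over the dev set and then logs the peak GPU memory (25978MB). A minimal sketch of that bookkeeping, assuming the memory figure comes from torch.cuda.max_memory_allocated and that the model interface shown is hypothetical:

```python
import torch

# Sketch (assumption) of the validation-time bookkeeping behind the
# "Computing validation loss" / "Maximum memory allocated" lines.
@torch.no_grad()
def validate(model, dev_loader, device):
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    for batch in dev_loader:
        loss, num_frames = model(batch)  # hypothetical interface:
        tot_loss += loss.item() * num_frames  # per-frame loss + frame count
        tot_frames += num_frames
    model.train()
    mem_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    return tot_loss / tot_frames, mem_mb
```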
], batch size: 62, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:10:49,956 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3346846.6666666665, ans=0.125 2023-11-28 04:11:06,641 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 502050 2023-11-28 04:11:12,812 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.58 vs. limit=15.0 2023-11-28 04:11:13,475 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3346980.0, ans=0.125 2023-11-28 04:11:20,400 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.00 vs. limit=15.0 2023-11-28 04:11:36,918 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.96 vs. limit=22.5 2023-11-28 04:11:40,122 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 9100, loss[loss=0.07004, simple_loss=0.1037, pruned_loss=0.01316, audio_tagging_loss=0.005015, over 15038.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.09041, pruned_loss=0.01229, audio_tagging_loss=0.008494, over 3049382.24 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:11:47,380 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.05 vs. limit=10.0 2023-11-28 04:11:48,890 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.155e+01 8.691e+01 9.383e+01 1.014e+02 1.282e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-28 04:11:59,914 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3347246.6666666665, ans=0.125 2023-11-28 04:12:03,026 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 502100 2023-11-28 04:12:22,161 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3347380.0, ans=0.125 2023-11-28 04:12:24,112 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.14 vs. limit=12.0 2023-11-28 04:12:36,769 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 9150, loss[loss=0.07085, simple_loss=0.09673, pruned_loss=0.01386, audio_tagging_loss=0.008629, over 15297.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.09024, pruned_loss=0.01243, audio_tagging_loss=0.008584, over 3052288.03 frames. ], batch size: 60, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:12:51,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3347580.0, ans=0.0 2023-11-28 04:13:01,310 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 502150 2023-11-28 04:13:03,672 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3347646.6666666665, ans=0.125 2023-11-28 04:13:07,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3347646.6666666665, ans=0.1 2023-11-28 04:13:08,809 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.65 vs. 
limit=6.0 2023-11-28 04:13:29,396 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3347780.0, ans=0.125 2023-11-28 04:13:34,147 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 9200, loss[loss=0.04253, simple_loss=0.06041, pruned_loss=0.004034, audio_tagging_loss=0.008292, over 14746.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.09032, pruned_loss=0.01252, audio_tagging_loss=0.008521, over 3049751.11 frames. ], batch size: 55, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:13:44,688 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.904e+01 8.837e+01 9.520e+01 1.030e+02 1.268e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-28 04:13:58,694 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 502200 2023-11-28 04:14:04,644 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3347980.0, ans=0.0 2023-11-28 04:14:11,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3348046.6666666665, ans=0.2 2023-11-28 04:14:12,270 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3348046.6666666665, ans=0.1 2023-11-28 04:14:12,464 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3348046.6666666665, ans=0.0 2023-11-28 04:14:22,162 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.66 vs. limit=15.0 2023-11-28 04:14:32,132 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 9250, loss[loss=0.05317, simple_loss=0.07185, pruned_loss=0.007298, audio_tagging_loss=0.00995, over 14759.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08926, pruned_loss=0.0124, audio_tagging_loss=0.008587, over 3049980.27 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:14:49,507 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3348246.6666666665, ans=0.1 2023-11-28 04:14:55,940 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 502250 2023-11-28 04:15:10,526 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3348380.0, ans=0.1 2023-11-28 04:15:12,688 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3348380.0, ans=0.2 2023-11-28 04:15:12,937 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.01 vs. limit=22.5 2023-11-28 04:15:29,230 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 9300, loss[loss=0.06818, simple_loss=0.09174, pruned_loss=0.01458, audio_tagging_loss=0.007738, over 16159.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.08975, pruned_loss=0.01257, audio_tagging_loss=0.008582, over 3053950.62 frames. 
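[Annotation] The zipformer.py:1877 lines printed during the validation pass at batch 9000 above report attn_weights_entropy tensors, one value per attention head being a natural reading: the entropy of each head's attention distribution over keys, averaged over queries. Low entropy means sharply peaked attention. A sketch of how such a diagnostic could be computed (an assumption, not the exact zipformer.py code):

```python
import torch

# Sketch (assumption) of the attn_weights_entropy diagnostic: entropy of
# the attention distribution over keys, averaged over queries, per head.
def attn_weights_entropy(attn_weights):
    # attn_weights: (num_heads, num_queries, num_keys), rows sum to 1
    p = attn_weights.clamp(min=1e-20)
    ent = -(p * p.log()).sum(dim=-1)  # entropy per (head, query)
    return ent.mean(dim=-1)           # average over queries -> per head

p = torch.softmax(torch.randn(4, 50, 50), dim=-1)
print(attn_weights_entropy(p))  # four values, like the logged tensors
```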
], batch size: 60, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:15:29,478 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3348513.3333333335, ans=0.0 2023-11-28 04:15:33,911 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3348513.3333333335, ans=0.125 2023-11-28 04:15:40,868 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.853e+01 8.934e+01 9.500e+01 1.008e+02 1.455e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-28 04:15:44,456 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3348580.0, ans=0.125 2023-11-28 04:15:45,595 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3348580.0, ans=0.2 2023-11-28 04:15:50,131 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3348580.0, ans=0.0 2023-11-28 04:15:53,840 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 502300 2023-11-28 04:16:21,034 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.51 vs. limit=10.0 2023-11-28 04:16:24,991 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.10 vs. limit=15.0 2023-11-28 04:16:26,568 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 9350, loss[loss=0.06374, simple_loss=0.08501, pruned_loss=0.01241, audio_tagging_loss=0.008822, over 14693.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.09004, pruned_loss=0.01267, audio_tagging_loss=0.008583, over 3047804.47 frames. ], batch size: 59, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:16:32,282 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3348846.6666666665, ans=0.125 2023-11-28 04:16:38,429 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3348913.3333333335, ans=0.125 2023-11-28 04:16:48,853 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3348980.0, ans=0.0 2023-11-28 04:16:50,822 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 502350 2023-11-28 04:17:09,113 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3349046.6666666665, ans=0.125 2023-11-28 04:17:12,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3349113.3333333335, ans=0.0 2023-11-28 04:17:19,349 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.67 vs. limit=15.0 2023-11-28 04:17:21,880 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3349113.3333333335, ans=0.0 2023-11-28 04:17:23,704 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 9400, loss[loss=0.07364, simple_loss=0.1061, pruned_loss=0.01336, audio_tagging_loss=0.007255, over 14610.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.08977, pruned_loss=0.01252, audio_tagging_loss=0.008667, over 3048605.93 frames. 
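[Annotation] The scaling.py:1022 Whitening lines compare a per-module "metric" against a fixed limit (e.g. metric=7.10 vs. limit=15.0 above). The metric measures how far the feature covariance is from white: it is 1.0 when the covariance is a multiple of the identity and grows as variance concentrates in few directions, and a corrective penalty applies only once the metric exceeds the limit. A sketch of one plausible form of that statistic, for the num_groups=1 case (an assumption about scaling.py's exact formula):

```python
import torch

# Sketch (assumption) of the Whitening metric: 1.0 for a perfectly white
# (isotropic) channel covariance, larger when it concentrates.
def whitening_metric(x):
    # x: (num_frames, num_channels), num_groups=1 for simplicity
    num_channels = x.shape[-1]
    cov = x.t() @ x / x.shape[0]
    mean_diag = cov.diagonal().mean()
    return (cov ** 2).sum() / (mean_diag ** 2 * num_channels)

white = torch.randn(10000, 512)
print(whitening_metric(white))                                  # ~1.0
print(whitening_metric(white * torch.linspace(0.1, 3.0, 512)))  # larger
```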
], batch size: 56, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:17:35,168 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.672e+01 8.937e+01 9.623e+01 1.033e+02 2.333e+02, threshold=1.925e+02, percent-clipped=1.0 2023-11-28 04:17:47,552 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 502400 2023-11-28 04:17:49,273 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3349313.3333333335, ans=0.125 2023-11-28 04:18:04,219 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3349380.0, ans=0.0 2023-11-28 04:18:07,335 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.53 vs. limit=15.0 2023-11-28 04:18:14,026 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3349446.6666666665, ans=0.05 2023-11-28 04:18:19,499 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3349446.6666666665, ans=0.1 2023-11-28 04:18:19,518 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3349446.6666666665, ans=0.2 2023-11-28 04:18:21,493 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 9450, loss[loss=0.06648, simple_loss=0.08682, pruned_loss=0.01334, audio_tagging_loss=0.009728, over 15245.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.08957, pruned_loss=0.01243, audio_tagging_loss=0.008815, over 3050580.56 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:18:23,699 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 04:18:28,360 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3349513.3333333335, ans=0.125 2023-11-28 04:18:31,683 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3349580.0, ans=0.125 2023-11-28 04:18:34,737 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3349580.0, ans=0.125 2023-11-28 04:18:45,205 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 502450 2023-11-28 04:19:00,875 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3349713.3333333335, ans=0.125 2023-11-28 04:19:02,194 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3349713.3333333335, ans=0.125 2023-11-28 04:19:06,552 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3349780.0, ans=0.125 2023-11-28 04:19:07,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3349780.0, ans=0.125 2023-11-28 04:19:12,156 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3349780.0, ans=0.1 2023-11-28 04:19:18,889 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 9500, loss[loss=0.07432, simple_loss=0.1033, pruned_loss=0.01432, audio_tagging_loss=0.008331, over 14846.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08898, pruned_loss=0.01237, audio_tagging_loss=0.008972, over 3046034.07 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:19:29,917 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.530e+01 8.748e+01 9.346e+01 1.036e+02 1.231e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-28 04:19:32,468 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3349913.3333333335, ans=0.125 2023-11-28 04:19:33,551 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3349913.3333333335, ans=0.0 2023-11-28 04:19:43,130 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 502500 2023-11-28 04:19:46,636 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3349980.0, ans=0.0 2023-11-28 04:19:47,556 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3349980.0, ans=0.125 2023-11-28 04:19:49,696 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3349980.0, ans=0.125 2023-11-28 04:20:15,478 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 9550, loss[loss=0.04734, simple_loss=0.0567, pruned_loss=0.008968, audio_tagging_loss=0.01002, over 13749.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.08935, pruned_loss=0.01234, audio_tagging_loss=0.008971, over 3037605.49 frames. 
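[Annotation] The WARNING lines above show why AudioSet cuts get excluded: these clips have no real transcript, so a dummy placeholder text is attached, and a 1-second clip yields only 23 encoder frames after subsampling, fewer than the 24 placeholder tokens, which the transducer loss cannot align. A sketch of that filter; the subsampled-length formula is an assumption chosen because it reproduces the logged 100 -> 23 mapping:

```python
# Sketch (assumption) of the exclusion rule behind these WARNING lines:
# drop a cut when its token count exceeds the encoder output length.
def num_frames_after_subsampling(num_frames):
    # one plausible form of the ~4x convolutional subsampling;
    # maps the logged 100 input frames to the logged 23 output frames
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames, num_tokens):
    return num_frames_after_subsampling(num_frames) >= num_tokens

print(num_frames_after_subsampling(100))  # 23
print(keep_cut(100, 24))                  # False -> excluded, as logged
```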
], batch size: 56, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:20:29,939 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3350246.6666666665, ans=0.125 2023-11-28 04:20:38,848 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3350313.3333333335, ans=0.0 2023-11-28 04:20:39,868 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 502550 2023-11-28 04:20:49,579 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3350380.0, ans=0.125 2023-11-28 04:21:04,867 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3350446.6666666665, ans=0.0 2023-11-28 04:21:06,329 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3350446.6666666665, ans=0.125 2023-11-28 04:21:06,664 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.08 vs. limit=15.0 2023-11-28 04:21:13,653 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 9600, loss[loss=0.05518, simple_loss=0.07451, pruned_loss=0.007016, audio_tagging_loss=0.0109, over 16385.00 frames. ], tot_loss[loss=0.0667, simple_loss=0.09041, pruned_loss=0.01256, audio_tagging_loss=0.008934, over 3033750.52 frames. ], batch size: 63, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:21:15,226 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.44 vs. limit=22.5 2023-11-28 04:21:24,513 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.824e+01 8.754e+01 9.206e+01 1.000e+02 1.278e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-28 04:21:28,031 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3350580.0, ans=0.125 2023-11-28 04:21:37,324 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 502600 2023-11-28 04:21:40,955 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3350646.6666666665, ans=0.125 2023-11-28 04:21:41,533 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.30 vs. limit=15.0 2023-11-28 04:21:43,734 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.10 vs. 
limit=15.0 2023-11-28 04:21:44,516 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3350646.6666666665, ans=0.125 2023-11-28 04:21:48,876 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3350713.3333333335, ans=0.07 2023-11-28 04:21:54,643 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3350713.3333333335, ans=0.125 2023-11-28 04:21:58,120 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3350713.3333333335, ans=0.125 2023-11-28 04:22:00,265 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3350780.0, ans=0.1 2023-11-28 04:22:02,260 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3350780.0, ans=0.0 2023-11-28 04:22:07,746 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3350780.0, ans=0.0 2023-11-28 04:22:10,901 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 9650, loss[loss=0.03369, simple_loss=0.03325, pruned_loss=0.005281, audio_tagging_loss=0.01179, over 13849.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.0898, pruned_loss=0.01248, audio_tagging_loss=0.00895, over 3033405.90 frames. ], batch size: 54, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:22:30,629 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3350913.3333333335, ans=0.125 2023-11-28 04:22:35,689 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 502650 2023-11-28 04:22:35,833 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3350980.0, ans=0.125 2023-11-28 04:22:46,455 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.59 vs. limit=15.0 2023-11-28 04:22:47,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3351046.6666666665, ans=0.125 2023-11-28 04:22:48,915 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.25 vs. limit=15.0 2023-11-28 04:22:54,004 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 04:22:57,788 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3351113.3333333335, ans=0.125 2023-11-28 04:22:59,028 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3351113.3333333335, ans=0.125 2023-11-28 04:23:08,617 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 9700, loss[loss=0.08697, simple_loss=0.1259, pruned_loss=0.01644, audio_tagging_loss=0.007581, over 16351.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.08994, pruned_loss=0.01245, audio_tagging_loss=0.008797, over 3032950.15 frames. 
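[Annotation] Every batch entry logs lr: 1.60e-03. Assuming the Eden schedule from icefall's optim.py with this run's settings (base_lr=0.045, lr_batches=7500, lr_epochs=3.5) and ignoring warm-up terms, the value checks out at this point in training, around global batch 503000 in epoch 42:

```python
# Sketch (assumption) of the Eden learning-rate schedule, evaluated with
# this run's settings; warm-up factors are omitted.
def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

print(eden_lr(0.045, batch=503000, epoch=42))  # ~1.6e-03, as logged
```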
], batch size: 58, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:23:21,611 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.286e+01 8.660e+01 9.403e+01 1.036e+02 1.751e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-28 04:23:22,164 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.35 vs. limit=15.0 2023-11-28 04:23:33,238 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 502700 2023-11-28 04:23:41,764 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3351313.3333333335, ans=0.2 2023-11-28 04:23:47,302 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 04:23:51,877 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.74 vs. limit=15.0 2023-11-28 04:24:06,612 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 9750, loss[loss=0.05379, simple_loss=0.07064, pruned_loss=0.009692, audio_tagging_loss=0.008783, over 15086.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.0896, pruned_loss=0.01228, audio_tagging_loss=0.008684, over 3035364.16 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:24:13,018 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3351513.3333333335, ans=0.125 2023-11-28 04:24:16,181 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3351513.3333333335, ans=0.0 2023-11-28 04:24:30,806 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 502750 2023-11-28 04:24:44,593 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3351713.3333333335, ans=0.0 2023-11-28 04:25:04,300 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 9800, loss[loss=0.05926, simple_loss=0.07924, pruned_loss=0.00975, audio_tagging_loss=0.009885, over 14210.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.08985, pruned_loss=0.01235, audio_tagging_loss=0.008629, over 3039787.21 frames. ], batch size: 54, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:25:05,693 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3351846.6666666665, ans=0.1 2023-11-28 04:25:16,772 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.450e+01 8.861e+01 9.508e+01 1.028e+02 1.749e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-28 04:25:24,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3351913.3333333335, ans=0.025 2023-11-28 04:25:28,338 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 502800 2023-11-28 04:25:30,049 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3351980.0, ans=0.1 2023-11-28 04:25:50,884 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.64 vs. limit=6.0 2023-11-28 04:25:59,728 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. 
Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 04:26:01,887 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 9850, loss[loss=0.06586, simple_loss=0.08352, pruned_loss=0.01695, audio_tagging_loss=0.007147, over 16080.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.09038, pruned_loss=0.01247, audio_tagging_loss=0.008533, over 3039118.40 frames. ], batch size: 62, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:26:02,191 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3352180.0, ans=0.0 2023-11-28 04:26:03,571 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.28 vs. limit=15.0 2023-11-28 04:26:26,285 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 502850 2023-11-28 04:26:26,359 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3352313.3333333335, ans=0.125 2023-11-28 04:26:31,874 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3352313.3333333335, ans=0.125 2023-11-28 04:26:36,931 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3352380.0, ans=0.0 2023-11-28 04:26:43,910 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.30 vs. limit=22.5 2023-11-28 04:26:46,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3352446.6666666665, ans=0.125 2023-11-28 04:26:51,203 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3352446.6666666665, ans=0.025 2023-11-28 04:26:57,233 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3352446.6666666665, ans=0.125 2023-11-28 04:26:58,945 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3352513.3333333335, ans=0.1 2023-11-28 04:26:59,727 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 9900, loss[loss=0.09559, simple_loss=0.1368, pruned_loss=0.02257, audio_tagging_loss=0.004623, over 15747.00 frames. ], tot_loss[loss=0.06724, simple_loss=0.09196, pruned_loss=0.01278, audio_tagging_loss=0.008473, over 3037989.58 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:26:59,892 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3352513.3333333335, ans=0.1 2023-11-28 04:27:10,616 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.11 vs. 
limit=6.0 2023-11-28 04:27:12,292 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.961e+01 8.663e+01 9.354e+01 9.948e+01 1.345e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-28 04:27:23,796 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 502900 2023-11-28 04:27:35,841 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3352713.3333333335, ans=0.0 2023-11-28 04:27:41,164 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.29 vs. limit=15.0 2023-11-28 04:27:48,093 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3352780.0, ans=0.0 2023-11-28 04:27:57,186 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 9950, loss[loss=0.067, simple_loss=0.08304, pruned_loss=0.01577, audio_tagging_loss=0.00971, over 15419.00 frames. ], tot_loss[loss=0.06673, simple_loss=0.09144, pruned_loss=0.01252, audio_tagging_loss=0.008486, over 3038041.16 frames. ], batch size: 59, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:28:06,144 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.49 vs. limit=15.0 2023-11-28 04:28:14,361 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3352913.3333333335, ans=0.1 2023-11-28 04:28:20,949 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 502950 2023-11-28 04:28:30,774 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3353046.6666666665, ans=0.0 2023-11-28 04:28:35,201 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3353046.6666666665, ans=0.125 2023-11-28 04:28:42,762 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.95 vs. limit=15.0 2023-11-28 04:28:51,726 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3353113.3333333335, ans=0.125 2023-11-28 04:28:54,845 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 10000, loss[loss=0.0528, simple_loss=0.06724, pruned_loss=0.008704, audio_tagging_loss=0.01047, over 15853.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.09153, pruned_loss=0.01254, audio_tagging_loss=0.00842, over 3044308.17 frames. ], batch size: 60, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:29:08,339 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.922e+01 8.771e+01 9.442e+01 1.017e+02 1.444e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-28 04:29:18,709 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 503000 2023-11-28 04:29:20,308 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3353313.3333333335, ans=0.125 2023-11-28 04:29:27,512 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3353313.3333333335, ans=0.125 2023-11-28 04:29:33,827 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.91 vs. 
limit=15.0 2023-11-28 04:29:51,505 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3353513.3333333335, ans=0.0 2023-11-28 04:29:52,396 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 10050, loss[loss=0.06287, simple_loss=0.09294, pruned_loss=0.01038, audio_tagging_loss=0.006015, over 15084.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.09076, pruned_loss=0.01243, audio_tagging_loss=0.008535, over 3049882.31 frames. ], batch size: 55, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:30:05,140 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3353580.0, ans=0.125 2023-11-28 04:30:17,467 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 503050 2023-11-28 04:30:31,949 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3353713.3333333335, ans=0.2 2023-11-28 04:30:33,120 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3353713.3333333335, ans=0.125 2023-11-28 04:30:50,287 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 10100, loss[loss=0.0855, simple_loss=0.1142, pruned_loss=0.02038, audio_tagging_loss=0.00802, over 15434.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.09083, pruned_loss=0.01259, audio_tagging_loss=0.008554, over 3050752.62 frames. ], batch size: 54, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:31:04,743 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.375e+01 8.581e+01 9.372e+01 1.014e+02 1.280e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-28 04:31:11,508 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3353913.3333333335, ans=0.125 2023-11-28 04:31:14,657 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 503100 2023-11-28 04:31:17,118 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3353980.0, ans=0.125 2023-11-28 04:31:26,813 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3354046.6666666665, ans=0.125 2023-11-28 04:31:39,644 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 04:31:42,685 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3354113.3333333335, ans=0.1 2023-11-28 04:31:47,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3354180.0, ans=0.125 2023-11-28 04:31:48,555 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 10150, loss[loss=0.04739, simple_loss=0.05745, pruned_loss=0.008744, audio_tagging_loss=0.009918, over 14020.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08991, pruned_loss=0.01233, audio_tagging_loss=0.00862, over 3054096.26 frames. 
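[Annotation] Each batch entry also reports grad_scale, which flips between 16.0 and 32.0 across the entries above. This is the dynamic loss-scaling factor of fp16 training: the scaler multiplies the loss before backward, halves the scale when an overflow produces inf/nan gradients, and grows it back after a run of finite steps. A sketch using PyTorch's standard GradScaler (an assumption about how this run wires it up; the forward interface is hypothetical):

```python
import torch

# Sketch (assumption): standard AMP training loop whose scaler value is
# what these entries log as grad_scale.
scaler = torch.cuda.amp.GradScaler(init_scale=32.0)

def train_step(model, optimizer, batch):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = model(batch)          # hypothetical forward returning a scalar
    scaler.scale(loss).backward()    # backward on the scaled loss
    scaler.step(optimizer)           # skips the step if grads overflowed
    scaler.update()                  # halve on overflow, grow when stable
    return scaler.get_scale()        # the number logged as grad_scale
```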
], batch size: 56, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:31:56,396 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3354180.0, ans=0.0 2023-11-28 04:32:00,785 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3354246.6666666665, ans=0.0 2023-11-28 04:32:00,807 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-28 04:32:09,273 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3354246.6666666665, ans=0.125 2023-11-28 04:32:10,537 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3354313.3333333335, ans=0.125 2023-11-28 04:32:12,533 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 503150 2023-11-28 04:32:18,934 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 04:32:28,416 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.45 vs. limit=6.0 2023-11-28 04:32:31,239 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 04:32:45,388 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 10200, loss[loss=0.07549, simple_loss=0.09561, pruned_loss=0.01751, audio_tagging_loss=0.01017, over 15494.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.0903, pruned_loss=0.01242, audio_tagging_loss=0.008696, over 3059081.56 frames. ], batch size: 58, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:32:45,946 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.48 vs. limit=6.0 2023-11-28 04:32:51,722 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3354513.3333333335, ans=0.125 2023-11-28 04:32:59,164 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.169e+01 8.630e+01 9.209e+01 1.011e+02 1.470e+02, threshold=1.842e+02, percent-clipped=0.0 2023-11-28 04:33:09,107 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 503200 2023-11-28 04:33:11,091 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 04:33:16,692 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3354646.6666666665, ans=0.125 2023-11-28 04:33:28,503 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3354713.3333333335, ans=0.1 2023-11-28 04:33:33,751 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.77 vs. limit=15.0 2023-11-28 04:33:41,825 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 10250, loss[loss=0.06606, simple_loss=0.09758, pruned_loss=0.009213, audio_tagging_loss=0.00806, over 15737.00 frames. ], tot_loss[loss=0.06671, simple_loss=0.0907, pruned_loss=0.01255, audio_tagging_loss=0.008803, over 3056125.28 frames. ], batch size: 61, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:33:43,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3354846.6666666665, ans=0.2 2023-11-28 04:33:58,416 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3354913.3333333335, ans=0.125 2023-11-28 04:34:05,870 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 503250 2023-11-28 04:34:09,806 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.36 vs. limit=10.0 2023-11-28 04:34:16,269 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3355046.6666666665, ans=0.0 2023-11-28 04:34:38,556 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 10300, loss[loss=0.0762, simple_loss=0.1053, pruned_loss=0.01512, audio_tagging_loss=0.00845, over 15500.00 frames. ], tot_loss[loss=0.0672, simple_loss=0.09152, pruned_loss=0.01271, audio_tagging_loss=0.008722, over 3062395.24 frames. ], batch size: 55, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:34:43,595 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3355180.0, ans=0.1 2023-11-28 04:34:51,555 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.12 vs. limit=15.0 2023-11-28 04:34:51,860 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.324e+01 9.001e+01 9.538e+01 1.014e+02 1.211e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-28 04:34:53,180 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3355246.6666666665, ans=0.2 2023-11-28 04:35:02,843 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 503300 2023-11-28 04:35:04,727 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.09 vs. 
limit=8.0 2023-11-28 04:35:20,995 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3355380.0, ans=0.09899494936611666 2023-11-28 04:35:22,065 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3355380.0, ans=0.0 2023-11-28 04:35:27,245 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3355446.6666666665, ans=0.125 2023-11-28 04:35:35,758 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 10350, loss[loss=0.08412, simple_loss=0.1154, pruned_loss=0.01965, audio_tagging_loss=0.006754, over 16313.00 frames. ], tot_loss[loss=0.06723, simple_loss=0.09138, pruned_loss=0.01275, audio_tagging_loss=0.008786, over 3056835.17 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:35:43,983 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3355513.3333333335, ans=0.125 2023-11-28 04:35:59,225 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 503350 2023-11-28 04:36:03,298 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3355646.6666666665, ans=0.0 2023-11-28 04:36:09,014 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.61 vs. limit=22.5 2023-11-28 04:36:32,725 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 10400, loss[loss=0.05604, simple_loss=0.07397, pruned_loss=0.007779, audio_tagging_loss=0.01127, over 15523.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.0903, pruned_loss=0.01239, audio_tagging_loss=0.008876, over 3054785.68 frames. ], batch size: 60, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:36:47,555 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.735e+01 8.770e+01 9.452e+01 1.025e+02 1.480e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-28 04:36:56,912 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 503400 2023-11-28 04:37:19,387 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.38 vs. limit=15.0 2023-11-28 04:37:30,478 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 10450, loss[loss=0.08714, simple_loss=0.1094, pruned_loss=0.0242, audio_tagging_loss=0.008242, over 14625.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08886, pruned_loss=0.01222, audio_tagging_loss=0.008993, over 3045624.88 frames. 
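[Annotation] Many of the scheduled values above belong to balancers (balancer1.prob, min_positive, min_abs, and similar). The reading assumed here: a Balancer watches per-channel activation statistics, such as the fraction of positive values or the mean absolute value, and with probability `prob` per batch (the logged 0.125 or 0.025 values are its scheduled parameters) corrects the gradients of channels that drift outside configured bounds. This sketch only computes the statistic and flags violators; the real scaling.py modifies gradients in backward:

```python
import torch

# Sketch (assumption) of the statistic a Balancer constrains: flag channels
# whose fraction of positive activations leaves [min_positive, max_positive].
def violating_channels(x, min_positive=0.05, max_positive=0.95):
    # x: (num_frames, num_channels)
    frac_pos = (x > 0).float().mean(dim=0)
    return (frac_pos < min_positive) | (frac_pos > max_positive)

x = torch.randn(1000, 256).clamp(min=-0.01)  # almost always positive
print(violating_channels(x).sum())           # most channels flagged
```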
], batch size: 54, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:37:55,345 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 503450 2023-11-28 04:37:59,971 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3356313.3333333335, ans=0.125 2023-11-28 04:38:17,004 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3356446.6666666665, ans=0.125 2023-11-28 04:38:18,030 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3356446.6666666665, ans=0.0 2023-11-28 04:38:21,190 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3356446.6666666665, ans=0.125 2023-11-28 04:38:26,153 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 04:38:28,243 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 10500, loss[loss=0.0676, simple_loss=0.09594, pruned_loss=0.01211, audio_tagging_loss=0.007523, over 14431.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.08851, pruned_loss=0.01214, audio_tagging_loss=0.008888, over 3047502.17 frames. ], batch size: 54, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:38:36,958 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3356513.3333333335, ans=0.125 2023-11-28 04:38:41,239 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3356580.0, ans=0.0 2023-11-28 04:38:43,217 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.125e+01 8.671e+01 9.492e+01 1.004e+02 1.311e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-28 04:38:49,014 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3356580.0, ans=0.125 2023-11-28 04:38:52,136 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 503500 2023-11-28 04:38:54,789 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.99 vs. limit=6.0 2023-11-28 04:39:01,464 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3356713.3333333335, ans=0.015 2023-11-28 04:39:14,216 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3356780.0, ans=0.125 2023-11-28 04:39:25,933 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 10550, loss[loss=0.06799, simple_loss=0.09409, pruned_loss=0.01311, audio_tagging_loss=0.007839, over 15754.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08882, pruned_loss=0.01227, audio_tagging_loss=0.008768, over 3049485.74 frames. ], batch size: 60, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:39:49,582 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 503550 2023-11-28 04:40:02,540 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.66 vs. 
limit=12.0 2023-11-28 04:40:03,376 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3357046.6666666665, ans=0.2 2023-11-28 04:40:14,362 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3357113.3333333335, ans=0.1 2023-11-28 04:40:20,825 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3357113.3333333335, ans=0.125 2023-11-28 04:40:22,839 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 10600, loss[loss=0.05221, simple_loss=0.07552, pruned_loss=0.006591, audio_tagging_loss=0.007864, over 15257.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08904, pruned_loss=0.01241, audio_tagging_loss=0.008729, over 3045489.00 frames. ], batch size: 60, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:40:24,142 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3357180.0, ans=0.0 2023-11-28 04:40:27,080 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3357180.0, ans=0.125 2023-11-28 04:40:31,473 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3357180.0, ans=0.125 2023-11-28 04:40:37,795 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.073e+01 8.827e+01 9.555e+01 1.028e+02 1.264e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-28 04:40:48,235 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 503600 2023-11-28 04:40:50,991 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3357313.3333333335, ans=0.125 2023-11-28 04:40:56,677 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.46 vs. limit=15.0 2023-11-28 04:40:59,759 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3357380.0, ans=0.1 2023-11-28 04:41:01,028 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3357380.0, ans=0.2 2023-11-28 04:41:02,679 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.23 vs. limit=12.0 2023-11-28 04:41:06,904 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.14 vs. limit=15.0 2023-11-28 04:41:09,995 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3357446.6666666665, ans=0.125 2023-11-28 04:41:21,504 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 10650, loss[loss=0.09681, simple_loss=0.1353, pruned_loss=0.02387, audio_tagging_loss=0.005299, over 16423.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.08949, pruned_loss=0.01249, audio_tagging_loss=0.008681, over 3042440.52 frames. 
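
In the optim.py lines, the five grad-norm values read as (min, 25%, median, 75%, max) over a recent window of gradient norms, and the clipping threshold tracks Clipping_scale times the median. Checking against the 04:40:37 entry above:

    # Hedged check: threshold ~= clipping_scale * median gradient norm.
    grad_norm_quartiles = [70.73, 88.27, 95.55, 102.8, 126.4]  # 04:40:37 entry
    clipping_scale = 2.0
    threshold = clipping_scale * grad_norm_quartiles[2]  # the median
    print(threshold)  # 191.1, i.e. the logged threshold=1.911e+02
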
], batch size: 57, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:41:26,720 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3357513.3333333335, ans=0.125 2023-11-28 04:41:27,946 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3357513.3333333335, ans=0.1 2023-11-28 04:41:39,892 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3357580.0, ans=0.1 2023-11-28 04:41:46,314 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 503650 2023-11-28 04:41:50,826 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3357646.6666666665, ans=0.0 2023-11-28 04:41:54,710 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3357646.6666666665, ans=0.0 2023-11-28 04:42:12,460 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3357780.0, ans=0.1 2023-11-28 04:42:13,346 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3357780.0, ans=0.125 2023-11-28 04:42:15,876 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3357780.0, ans=0.125 2023-11-28 04:42:20,166 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 10700, loss[loss=0.06607, simple_loss=0.08284, pruned_loss=0.01205, audio_tagging_loss=0.0126, over 16100.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08916, pruned_loss=0.01241, audio_tagging_loss=0.008703, over 3046994.64 frames. ], batch size: 64, lr: 1.60e-03, grad_scale: 8.0 2023-11-28 04:42:23,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3357846.6666666665, ans=0.125 2023-11-28 04:42:33,586 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3357913.3333333335, ans=0.125 2023-11-28 04:42:35,419 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.336e+01 8.497e+01 9.278e+01 9.975e+01 1.438e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-28 04:42:43,750 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 503700 2023-11-28 04:42:55,237 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3358046.6666666665, ans=0.0 2023-11-28 04:43:10,214 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.47 vs. limit=15.0 2023-11-28 04:43:16,260 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 10750, loss[loss=0.08306, simple_loss=0.1131, pruned_loss=0.01849, audio_tagging_loss=0.008041, over 15555.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.08904, pruned_loss=0.01233, audio_tagging_loss=0.008727, over 3050767.90 frames. ], batch size: 55, lr: 1.60e-03, grad_scale: 8.0 2023-11-28 04:43:24,546 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3358180.0, ans=0.125 2023-11-28 04:43:29,491 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.97 vs. 
limit=22.5 2023-11-28 04:43:40,036 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3358313.3333333335, ans=0.2 2023-11-28 04:43:40,944 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 503750 2023-11-28 04:43:52,596 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3358380.0, ans=0.0 2023-11-28 04:43:58,498 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.09 vs. limit=22.5 2023-11-28 04:44:13,537 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 10800, loss[loss=0.07892, simple_loss=0.1065, pruned_loss=0.01685, audio_tagging_loss=0.008836, over 16149.00 frames. ], tot_loss[loss=0.06589, simple_loss=0.08919, pruned_loss=0.01248, audio_tagging_loss=0.00881, over 3045585.28 frames. ], batch size: 59, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:44:26,477 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3358580.0, ans=0.125 2023-11-28 04:44:30,561 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.579e+01 8.815e+01 9.428e+01 9.959e+01 1.276e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-28 04:44:30,903 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3358580.0, ans=0.07 2023-11-28 04:44:36,245 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3358646.6666666665, ans=0.09899494936611666 2023-11-28 04:44:37,307 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3358646.6666666665, ans=0.0 2023-11-28 04:44:38,282 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 503800 2023-11-28 04:44:42,220 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3358646.6666666665, ans=0.0 2023-11-28 04:44:45,552 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3358646.6666666665, ans=0.0 2023-11-28 04:45:04,749 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3358780.0, ans=0.125 2023-11-28 04:45:12,761 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 10850, loss[loss=0.07976, simple_loss=0.1125, pruned_loss=0.01654, audio_tagging_loss=0.006961, over 14738.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.08951, pruned_loss=0.01248, audio_tagging_loss=0.008732, over 3050076.55 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:45:13,004 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3358846.6666666665, ans=0.125 2023-11-28 04:45:36,411 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 503850 2023-11-28 04:45:50,246 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3359046.6666666665, ans=0.125 2023-11-28 04:46:09,856 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 10900, loss[loss=0.04843, simple_loss=0.06828, pruned_loss=0.006765, audio_tagging_loss=0.007528, over 14187.00 frames. 
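
The scaling.py ScheduledFloat lines report hyperparameters (dropout probabilities, balancer probs, skip rates) that are scheduled on batch_count rather than fixed; by batch_count around 3.36e6 they have long since settled at their end values (ans=0.1, ans=0.125, ans=0.0). A piecewise-linear sketch of the mechanism, with illustrative breakpoints rather than this run's actual schedule:

    # Hedged sketch of a ScheduledFloat-style value: linear interpolation
    # between (batch_count, value) breakpoints, clamped at both ends.
    def scheduled_float(batch_count, points=((0.0, 0.3), (20000.0, 0.1))):
        (x0, y0), (x1, y1) = points
        if batch_count <= x0:
            return y0
        if batch_count >= x1:
            return y1
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    print(scheduled_float(3.36e6))  # 0.1, the settled value, as in ans=0.1
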
], tot_loss[loss=0.06622, simple_loss=0.08969, pruned_loss=0.01262, audio_tagging_loss=0.008763, over 3051658.79 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:46:09,880 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 04:46:15,511 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3359180.0, ans=0.2 2023-11-28 04:46:15,839 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.60 vs. limit=12.0 2023-11-28 04:46:25,734 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.427e+01 8.788e+01 9.283e+01 9.844e+01 1.254e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-28 04:46:27,112 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3359246.6666666665, ans=0.125 2023-11-28 04:46:34,070 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 503900 2023-11-28 04:47:07,425 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 10950, loss[loss=0.06292, simple_loss=0.08916, pruned_loss=0.01083, audio_tagging_loss=0.007509, over 14815.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.0902, pruned_loss=0.01262, audio_tagging_loss=0.008784, over 3056757.49 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:47:25,523 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3359580.0, ans=0.0 2023-11-28 04:47:31,939 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 503950 2023-11-28 04:47:34,448 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.04 vs. limit=15.0 2023-11-28 04:47:36,381 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3359646.6666666665, ans=0.2 2023-11-28 04:47:44,756 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3359713.3333333335, ans=0.2 2023-11-28 04:47:55,019 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.96 vs. limit=15.0 2023-11-28 04:48:05,134 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 11000, loss[loss=0.07661, simple_loss=0.1043, pruned_loss=0.0143, audio_tagging_loss=0.01017, over 14297.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.08969, pruned_loss=0.01241, audio_tagging_loss=0.00883, over 3053576.86 frames. ], batch size: 54, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:48:11,598 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3359846.6666666665, ans=0.2 2023-11-28 04:48:17,922 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. 
Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 04:48:21,134 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.126e+01 8.488e+01 9.034e+01 9.756e+01 1.163e+02, threshold=1.807e+02, percent-clipped=0.0 2023-11-28 04:48:29,559 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 504000 2023-11-28 04:48:30,927 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-504000.pt 2023-11-28 04:48:33,347 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3359980.0, ans=0.125 2023-11-28 04:48:41,231 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3360046.6666666665, ans=0.2 2023-11-28 04:48:46,517 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3360046.6666666665, ans=0.2 2023-11-28 04:48:47,983 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=3360046.6666666665, ans=22.5 2023-11-28 04:48:49,135 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3360046.6666666665, ans=0.125 2023-11-28 04:48:52,333 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.57 vs. limit=22.5 2023-11-28 04:48:56,264 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 04:49:05,276 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 11050, loss[loss=0.08398, simple_loss=0.1221, pruned_loss=0.01496, audio_tagging_loss=0.007998, over 15814.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.09034, pruned_loss=0.01244, audio_tagging_loss=0.008888, over 3053316.37 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:49:22,297 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3360246.6666666665, ans=0.125 2023-11-28 04:49:28,481 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 504050 2023-11-28 04:49:47,955 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.90 vs. limit=15.0 2023-11-28 04:49:51,120 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3360446.6666666665, ans=0.1 2023-11-28 04:50:00,492 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3360446.6666666665, ans=0.125 2023-11-28 04:50:02,378 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 11100, loss[loss=0.07687, simple_loss=0.1029, pruned_loss=0.01394, audio_tagging_loss=0.01149, over 15369.00 frames. ], tot_loss[loss=0.06663, simple_loss=0.09034, pruned_loss=0.01244, audio_tagging_loss=0.009013, over 3045383.21 frames. 
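
The WARNING entries here all follow the same pattern: 1-second AudioSet clips carry a dummy transcript, their 100 input frames shrink to 23 after the convolutional front-end, and 23 encoder frames cannot be aligned against 24 BPE tokens, so the cut is dropped. A sketch of such a filter; the exact subsampling formula is an assumption chosen to match the logged 100 -> 23:

    # Hedged sketch of the cut filter behind these WARNINGs; the frame formula
    # is an assumption consistent with 100 frames -> 23 in the log.
    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        frames_after_subsampling = ((num_frames - 7) // 2) // 2
        # A transducer needs at least one encoder frame per output token.
        return frames_after_subsampling >= num_tokens

    print(keep_cut(100, 24))  # False, so "Exclude cut ..." is logged
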
], batch size: 57, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:50:08,020 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3360513.3333333335, ans=0.0 2023-11-28 04:50:08,038 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3360513.3333333335, ans=0.0 2023-11-28 04:50:18,572 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.046e+01 8.723e+01 9.489e+01 1.017e+02 2.061e+02, threshold=1.898e+02, percent-clipped=1.0 2023-11-28 04:50:26,321 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 504100 2023-11-28 04:50:59,703 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 11150, loss[loss=0.06587, simple_loss=0.09354, pruned_loss=0.01083, audio_tagging_loss=0.008268, over 15504.00 frames. ], tot_loss[loss=0.06693, simple_loss=0.09084, pruned_loss=0.01256, audio_tagging_loss=0.008947, over 3045242.61 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:51:22,998 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3360980.0, ans=0.125 2023-11-28 04:51:23,885 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 504150 2023-11-28 04:51:30,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3360980.0, ans=0.1 2023-11-28 04:51:57,692 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 11200, loss[loss=0.05861, simple_loss=0.07543, pruned_loss=0.01096, audio_tagging_loss=0.009937, over 14470.00 frames. ], tot_loss[loss=0.06673, simple_loss=0.09047, pruned_loss=0.01245, audio_tagging_loss=0.00904, over 3054215.65 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:52:13,622 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.794e+01 8.826e+01 9.324e+01 1.011e+02 1.372e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-28 04:52:21,325 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 504200 2023-11-28 04:52:32,715 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3361380.0, ans=0.125 2023-11-28 04:52:33,885 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3361380.0, ans=0.2 2023-11-28 04:52:36,053 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3361380.0, ans=0.0 2023-11-28 04:52:46,560 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.18 vs. limit=15.0 2023-11-28 04:52:55,521 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 11250, loss[loss=0.05202, simple_loss=0.07344, pruned_loss=0.005265, audio_tagging_loss=0.01003, over 15487.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.08937, pruned_loss=0.01231, audio_tagging_loss=0.008979, over 3053513.49 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:53:18,633 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.00 vs. 
limit=15.0 2023-11-28 04:53:19,174 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 504250 2023-11-28 04:53:21,528 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3361646.6666666665, ans=0.125 2023-11-28 04:53:23,758 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3361646.6666666665, ans=0.0 2023-11-28 04:53:31,640 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3361713.3333333335, ans=0.125 2023-11-28 04:53:48,856 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3361780.0, ans=0.125 2023-11-28 04:53:52,336 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 11300, loss[loss=0.06076, simple_loss=0.08006, pruned_loss=0.01121, audio_tagging_loss=0.009523, over 14544.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.08979, pruned_loss=0.01239, audio_tagging_loss=0.008798, over 3051394.68 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:54:09,276 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.179e+01 8.810e+01 9.312e+01 1.008e+02 1.209e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-28 04:54:13,463 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3361913.3333333335, ans=0.125 2023-11-28 04:54:16,572 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 504300 2023-11-28 04:54:50,072 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 11350, loss[loss=0.06408, simple_loss=0.09266, pruned_loss=0.01069, audio_tagging_loss=0.007052, over 17608.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.08988, pruned_loss=0.01245, audio_tagging_loss=0.0087, over 3057340.58 frames. ], batch size: 67, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:54:52,473 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3362180.0, ans=0.125 2023-11-28 04:55:06,888 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3362246.6666666665, ans=0.125 2023-11-28 04:55:14,385 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 504350 2023-11-28 04:55:14,998 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.70 vs. limit=12.0 2023-11-28 04:55:35,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3362446.6666666665, ans=0.125 2023-11-28 04:55:39,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1.whitening_limit, batch_count=3362446.6666666665, ans=10.0 2023-11-28 04:55:40,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3362446.6666666665, ans=0.2 2023-11-28 04:55:42,403 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3362446.6666666665, ans=0.0 2023-11-28 04:55:48,180 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 11400, loss[loss=0.06865, simple_loss=0.09584, pruned_loss=0.01292, audio_tagging_loss=0.007812, over 15742.00 frames. 
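
The Whitening lines compare a per-module statistic against a limit (metric=8.00 vs. limit=15.0 just above): the metric measures how far the module's feature covariance is from white, with 1.0 meaning perfectly white, and a corrective gradient is only applied when it exceeds the limit. A sketch of one such metric, assuming the eigenvalue-ratio form; this is not scaling.py verbatim:

    import torch

    # Hedged sketch: mean squared eigenvalue over squared mean eigenvalue of
    # the feature covariance; 1.0 for perfectly white features, larger when
    # a few directions dominate.
    def whitening_metric(x: torch.Tensor) -> float:
        x = x - x.mean(dim=0)
        cov = (x.T @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        return float((eigs ** 2).mean() / eigs.mean() ** 2)

    feats = torch.randn(1000, 256) @ torch.randn(256, 256)  # correlated channels
    print(whitening_metric(feats))  # compare against a limit such as 15.0
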
], tot_loss[loss=0.06643, simple_loss=0.09048, pruned_loss=0.01258, audio_tagging_loss=0.008604, over 3050597.48 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:55:52,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.whiten.whitening_limit, batch_count=3362513.3333333335, ans=12.0 2023-11-28 04:55:58,254 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3362580.0, ans=0.0 2023-11-28 04:56:05,107 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.040e+01 8.951e+01 9.530e+01 1.041e+02 1.873e+02, threshold=1.906e+02, percent-clipped=1.0 2023-11-28 04:56:12,169 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 504400 2023-11-28 04:56:35,255 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.49 vs. limit=15.0 2023-11-28 04:56:45,795 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 11450, loss[loss=0.0361, simple_loss=0.05093, pruned_loss=0.003667, audio_tagging_loss=0.006971, over 13879.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.0902, pruned_loss=0.01253, audio_tagging_loss=0.008519, over 3045989.48 frames. ], batch size: 53, lr: 1.60e-03, grad_scale: 8.0 2023-11-28 04:56:56,158 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.58 vs. limit=22.5 2023-11-28 04:56:56,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3362913.3333333335, ans=0.0 2023-11-28 04:57:09,841 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 504450 2023-11-28 04:57:11,564 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.01 vs. limit=15.0 2023-11-28 04:57:27,753 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3363046.6666666665, ans=0.1 2023-11-28 04:57:27,805 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3363046.6666666665, ans=0.0 2023-11-28 04:57:28,922 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3363046.6666666665, ans=0.09899494936611666 2023-11-28 04:57:43,786 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 11500, loss[loss=0.06587, simple_loss=0.08736, pruned_loss=0.01331, audio_tagging_loss=0.008877, over 15196.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.08907, pruned_loss=0.01243, audio_tagging_loss=0.008548, over 3040242.48 frames. 
], batch size: 56, lr: 1.60e-03, grad_scale: 8.0 2023-11-28 04:57:45,223 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3363180.0, ans=0.0 2023-11-28 04:57:50,494 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3363180.0, ans=0.125 2023-11-28 04:58:02,620 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.321e+01 8.810e+01 9.465e+01 1.017e+02 1.248e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-28 04:58:08,104 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 504500 2023-11-28 04:58:34,629 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.74 vs. limit=10.0 2023-11-28 04:58:40,740 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 11550, loss[loss=0.06376, simple_loss=0.08646, pruned_loss=0.01176, audio_tagging_loss=0.008768, over 15611.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.09009, pruned_loss=0.01266, audio_tagging_loss=0.008494, over 3043241.48 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 8.0 2023-11-28 04:59:05,970 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 504550 2023-11-28 04:59:18,535 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.84 vs. limit=15.0 2023-11-28 04:59:19,018 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 04:59:22,950 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.01 vs. limit=15.0 2023-11-28 04:59:24,988 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.65 vs. limit=6.0 2023-11-28 04:59:29,491 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.33 vs. limit=10.0 2023-11-28 04:59:30,272 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3363780.0, ans=0.125 2023-11-28 04:59:32,434 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3363780.0, ans=0.1 2023-11-28 04:59:38,809 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 11600, loss[loss=0.04677, simple_loss=0.0556, pruned_loss=0.008379, audio_tagging_loss=0.01059, over 16003.00 frames. ], tot_loss[loss=0.06662, simple_loss=0.09071, pruned_loss=0.01281, audio_tagging_loss=0.008446, over 3044288.64 frames. 
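
grad_scale in the batch summaries is the fp16 loss scale; it is halved whenever a step produces inf/nan gradients and doubled again after a long clean stretch, which is why it wanders between 32.0, 16.0 and 8.0 across this window. A GradScaler-style sketch; the growth interval here is an illustrative assumption:

    # Hedged sketch of dynamic loss scaling as in torch.cuda.amp.GradScaler.
    class LossScaleSketch:
        def __init__(self, scale=16.0, growth_interval=2000):
            self.scale = scale
            self.growth_interval = growth_interval
            self.good_steps = 0

        def update(self, found_inf: bool) -> None:
            if found_inf:
                self.scale *= 0.5    # back off at once: 32.0 -> 16.0 -> 8.0
                self.good_steps = 0
            else:
                self.good_steps += 1
                if self.good_steps == self.growth_interval:
                    self.scale *= 2.0  # regrow after a clean run: 8.0 -> 16.0
                    self.good_steps = 0
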
], batch size: 62, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:59:57,182 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.924e+01 8.620e+01 9.416e+01 1.017e+02 1.407e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-28 05:00:02,717 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 504600 2023-11-28 05:00:10,286 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.34 vs. limit=10.0 2023-11-28 05:00:11,786 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.34 vs. limit=15.0 2023-11-28 05:00:13,225 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3364046.6666666665, ans=0.125 2023-11-28 05:00:27,390 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3364113.3333333335, ans=0.1 2023-11-28 05:00:32,621 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3364113.3333333335, ans=0.2 2023-11-28 05:00:36,728 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 11650, loss[loss=0.05363, simple_loss=0.06864, pruned_loss=0.009413, audio_tagging_loss=0.009901, over 14189.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.09015, pruned_loss=0.01262, audio_tagging_loss=0.008566, over 3045977.99 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 05:00:37,571 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.93 vs. limit=10.0 2023-11-28 05:00:44,635 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 05:01:01,216 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 504650 2023-11-28 05:01:01,408 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3364313.3333333335, ans=0.0 2023-11-28 05:01:04,541 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3364313.3333333335, ans=0.2 2023-11-28 05:01:05,753 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3364313.3333333335, ans=0.0 2023-11-28 05:01:13,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3364380.0, ans=0.2 2023-11-28 05:01:27,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3364446.6666666665, ans=0.0 2023-11-28 05:01:33,596 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 11700, loss[loss=0.07532, simple_loss=0.1039, pruned_loss=0.01389, audio_tagging_loss=0.009497, over 15791.00 frames. ], tot_loss[loss=0.06698, simple_loss=0.09116, pruned_loss=0.0128, audio_tagging_loss=0.008595, over 3056585.32 frames. 
], batch size: 57, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 05:01:35,935 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3364513.3333333335, ans=0.1 2023-11-28 05:01:36,112 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3364513.3333333335, ans=0.125 2023-11-28 05:01:45,293 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.68 vs. limit=15.0 2023-11-28 05:01:52,250 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.238e+01 8.810e+01 9.366e+01 1.007e+02 1.398e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-28 05:01:55,092 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.46 vs. limit=12.0 2023-11-28 05:01:58,252 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 504700 2023-11-28 05:02:12,172 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3364713.3333333335, ans=0.125 2023-11-28 05:02:24,794 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3364780.0, ans=0.0 2023-11-28 05:02:31,530 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 11750, loss[loss=0.06686, simple_loss=0.08185, pruned_loss=0.0145, audio_tagging_loss=0.01144, over 14910.00 frames. ], tot_loss[loss=0.06687, simple_loss=0.09106, pruned_loss=0.01274, audio_tagging_loss=0.008595, over 3057655.90 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 05:02:54,702 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3364980.0, ans=0.0 2023-11-28 05:02:55,554 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 504750 2023-11-28 05:03:04,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=3365046.6666666665, ans=15.0 2023-11-28 05:03:13,436 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3365046.6666666665, ans=0.0 2023-11-28 05:03:27,983 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.44 vs. limit=15.0 2023-11-28 05:03:29,533 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 11800, loss[loss=0.06736, simple_loss=0.09327, pruned_loss=0.01176, audio_tagging_loss=0.008973, over 15241.00 frames. ], tot_loss[loss=0.06718, simple_loss=0.0912, pruned_loss=0.01292, audio_tagging_loss=0.008656, over 3052210.54 frames. 
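
The various *_skip_rate values (attention_skip_rate, conv_skip_rate, ff2_skip_rate, bypass.skip_rate) are layer-drop probabilities: with that probability the corresponding sub-module's contribution is skipped during training. They are ScheduledFloats as well, and by this stage nearly all of them have annealed to ans=0.0. A minimal sketch of the mechanism:

    import torch

    # Hedged sketch of stochastic sub-module skipping; skip_rate=0.0, as in
    # the ans=0.0 values above, disables it entirely.
    def apply_skip(out: torch.Tensor, skip_rate: float, training: bool) -> torch.Tensor:
        if training and float(torch.rand(())) < skip_rate:
            return torch.zeros_like(out)  # drop this sub-module's output
        return out
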
], batch size: 56, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 05:03:34,212 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3365180.0, ans=0.0 2023-11-28 05:03:38,513 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3365180.0, ans=0.125 2023-11-28 05:03:47,023 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.284e+01 8.611e+01 9.542e+01 1.045e+02 1.429e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-28 05:03:53,111 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 504800 2023-11-28 05:03:54,790 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3365313.3333333335, ans=0.125 2023-11-28 05:04:03,966 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3365380.0, ans=0.0 2023-11-28 05:04:05,685 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.47 vs. limit=12.0 2023-11-28 05:04:07,306 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3365380.0, ans=0.1 2023-11-28 05:04:15,915 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3365446.6666666665, ans=0.07 2023-11-28 05:04:16,218 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.45 vs. limit=10.0 2023-11-28 05:04:18,133 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3365446.6666666665, ans=0.125 2023-11-28 05:04:26,615 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 11850, loss[loss=0.06303, simple_loss=0.09806, pruned_loss=0.006817, audio_tagging_loss=0.007186, over 14275.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.09089, pruned_loss=0.01273, audio_tagging_loss=0.008686, over 3054352.72 frames. ], batch size: 55, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 05:04:28,971 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3365513.3333333335, ans=0.2 2023-11-28 05:04:29,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3365513.3333333335, ans=0.125 2023-11-28 05:04:36,326 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3365513.3333333335, ans=0.1 2023-11-28 05:04:38,912 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.34 vs. 
limit=15.0 2023-11-28 05:04:49,036 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3365646.6666666665, ans=0.0 2023-11-28 05:04:51,177 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 504850 2023-11-28 05:05:01,045 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3365713.3333333335, ans=0.125 2023-11-28 05:05:07,704 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3365713.3333333335, ans=0.1 2023-11-28 05:05:14,171 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3365780.0, ans=0.125 2023-11-28 05:05:24,487 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 11900, loss[loss=0.06005, simple_loss=0.07931, pruned_loss=0.007413, audio_tagging_loss=0.01298, over 15426.00 frames. ], tot_loss[loss=0.06716, simple_loss=0.0915, pruned_loss=0.01271, audio_tagging_loss=0.008695, over 3053111.53 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 05:05:25,179 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.40 vs. limit=15.0 2023-11-28 05:05:40,485 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3365913.3333333335, ans=0.2 2023-11-28 05:05:43,454 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.056e+01 8.747e+01 9.488e+01 1.023e+02 1.658e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-28 05:05:49,027 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 504900 2023-11-28 05:05:51,332 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3365980.0, ans=0.2 2023-11-28 05:05:53,426 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3365980.0, ans=0.2 2023-11-28 05:05:56,894 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3365980.0, ans=0.2 2023-11-28 05:05:58,963 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3366046.6666666665, ans=0.1 2023-11-28 05:05:58,978 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3366046.6666666665, ans=0.1 2023-11-28 05:06:13,300 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.99 vs. limit=12.0 2023-11-28 05:06:17,056 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.60 vs. limit=15.0 2023-11-28 05:06:20,117 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3366113.3333333335, ans=0.0 2023-11-28 05:06:23,045 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 11950, loss[loss=0.06659, simple_loss=0.08858, pruned_loss=0.01317, audio_tagging_loss=0.009134, over 15808.00 frames. ], tot_loss[loss=0.0672, simple_loss=0.09117, pruned_loss=0.01279, audio_tagging_loss=0.008829, over 3059742.79 frames. 
], batch size: 56, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 05:06:25,526 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3366180.0, ans=0.125 2023-11-28 05:06:26,582 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3366180.0, ans=0.125 2023-11-28 05:06:32,010 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3366180.0, ans=0.025 2023-11-28 05:06:34,220 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3366246.6666666665, ans=0.125 2023-11-28 05:06:36,435 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3366246.6666666665, ans=0.125 2023-11-28 05:06:40,233 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.32 vs. limit=15.0 2023-11-28 05:06:46,936 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 504950 2023-11-28 05:06:53,595 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3366313.3333333335, ans=0.125 2023-11-28 05:07:15,239 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3366446.6666666665, ans=0.0 2023-11-28 05:07:19,268 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 12000, loss[loss=0.04672, simple_loss=0.0632, pruned_loss=0.006287, audio_tagging_loss=0.008832, over 15044.00 frames. ], tot_loss[loss=0.06745, simple_loss=0.09116, pruned_loss=0.01286, audio_tagging_loss=0.00901, over 3062355.72 frames. ], batch size: 58, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 05:07:19,270 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-28 05:07:54,231 INFO [train_asr.py:1267] (0/4) Epoch 42, validation: loss=0.05822, simple_loss=0.05066, pruned_loss=0.005316, audio_tagging_loss=0.02757, over 4681554.00 frames. 2023-11-28 05:07:54,232 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-28 05:08:00,205 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3366513.3333333335, ans=0.09899494936611666 2023-11-28 05:08:11,282 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.786e+01 8.775e+01 9.473e+01 1.010e+02 1.187e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-28 05:08:13,529 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3366580.0, ans=0.125 2023-11-28 05:08:16,507 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 505000 2023-11-28 05:08:20,749 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/epoch-42.pt 2023-11-28 05:08:35,715 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 0, loss[loss=0.06183, simple_loss=0.06578, pruned_loss=0.008623, audio_tagging_loss=0.02032, over 14159.00 frames. ], tot_loss[loss=0.06183, simple_loss=0.06578, pruned_loss=0.008623, audio_tagging_loss=0.02032, over 14159.00 frames. 
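
Two cadences meet in this stretch: batch 12000 within the epoch triggers the valid_interval=3000 validation pass (note that the validation audio_tagging_loss, 0.02757, sits far above the running training value near 0.009), while global batch indices divisible by save_every_n=4000 produce rolling checkpoints such as the earlier checkpoint-504000.pt, and the end of the epoch writes epoch-42.pt. A sketch of the two triggers; the function names are illustrative, not icefall's actual API:

    # Hedged sketch of the validate/save cadence seen in these lines.
    def should_validate(batch_idx: int, valid_interval: int = 3000) -> bool:
        # within-epoch batch index: batch 12000 above triggers validation
        return batch_idx > 0 and batch_idx % valid_interval == 0

    def should_save(batch_idx_train: int, save_every_n: int = 4000) -> bool:
        # global batch index: 504000 above produced checkpoint-504000.pt
        return batch_idx_train > 0 and batch_idx_train % save_every_n == 0

    print(should_validate(12000), should_save(504000))  # True True
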
], batch size: 54, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 05:08:35,722 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-28 05:08:50,690 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.7655, 4.6521, 4.3831, 4.4207], device='cuda:0') 2023-11-28 05:09:10,073 INFO [train_asr.py:1267] (0/4) Epoch 43, validation: loss=0.05773, simple_loss=0.0506, pruned_loss=0.005225, audio_tagging_loss=0.0272, over 4681554.00 frames. 2023-11-28 05:09:10,074 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-28 05:09:36,518 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.38 vs. limit=15.0 2023-11-28 05:09:38,881 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3366806.6666666665, ans=0.125 2023-11-28 05:09:49,665 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3366873.3333333335, ans=0.125 2023-11-28 05:10:04,081 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 505050 2023-11-28 05:10:07,279 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 50, loss[loss=0.0857, simple_loss=0.1204, pruned_loss=0.01521, audio_tagging_loss=0.01028, over 16086.00 frames. ], tot_loss[loss=0.07575, simple_loss=0.09319, pruned_loss=0.01276, audio_tagging_loss=0.0164, over 692403.93 frames. ], batch size: 58, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 05:10:20,690 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3367073.3333333335, ans=0.125 2023-11-28 05:10:24,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3367073.3333333335, ans=0.125 2023-11-28 05:10:24,662 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3367073.3333333335, ans=0.0 2023-11-28 05:10:26,874 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3367073.3333333335, ans=0.0 2023-11-28 05:10:45,923 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3367206.6666666665, ans=0.2 2023-11-28 05:10:56,683 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.375e+01 9.586e+01 1.037e+02 1.129e+02 1.417e+02, threshold=2.074e+02, percent-clipped=0.0 2023-11-28 05:11:01,197 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 505100 2023-11-28 05:11:04,372 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 100, loss[loss=0.06445, simple_loss=0.08175, pruned_loss=0.007591, audio_tagging_loss=0.01598, over 15091.00 frames. ], tot_loss[loss=0.07432, simple_loss=0.09189, pruned_loss=0.01234, audio_tagging_loss=0.01603, over 1214021.82 frames. 
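
The zipformer.py line prints a per-head entropy of the self-attention weights (four values for the four heads of the first encoder stack) as a diagnostic: values near log(sequence length) indicate diffuse attention, values near zero indicate heads that have collapsed onto single positions. A sketch of such a diagnostic; the exact reduction is an assumption:

    import torch

    # Hedged sketch of an attn_weights_entropy-style diagnostic: entropy of
    # each head's attention distribution, averaged over query positions.
    def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
        # attn: (num_heads, tgt_len, src_len), rows sum to 1 after softmax
        p = attn.clamp(min=1e-20)
        return -(p * p.log()).sum(dim=-1).mean(dim=-1)

    attn = torch.softmax(torch.randn(4, 100, 100), dim=-1)
    print(attn_weights_entropy(attn))  # one value per head, cf. tensor([4.7655, ...])
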
], batch size: 56, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:11:25,279 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3367406.6666666665, ans=0.2 2023-11-28 05:11:43,978 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3367540.0, ans=0.1 2023-11-28 05:11:50,225 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3367606.6666666665, ans=0.1 2023-11-28 05:11:52,276 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3367606.6666666665, ans=0.0 2023-11-28 05:11:58,743 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 505150 2023-11-28 05:12:02,507 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 150, loss[loss=0.05714, simple_loss=0.07892, pruned_loss=0.007328, audio_tagging_loss=0.01035, over 15192.00 frames. ], tot_loss[loss=0.07224, simple_loss=0.09072, pruned_loss=0.01243, audio_tagging_loss=0.01445, over 1617126.88 frames. ], batch size: 58, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:12:18,745 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3367740.0, ans=0.1 2023-11-28 05:12:50,104 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3367940.0, ans=0.125 2023-11-28 05:12:52,804 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.165e+01 9.055e+01 9.611e+01 1.032e+02 1.243e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-28 05:12:57,263 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 505200 2023-11-28 05:13:01,134 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 200, loss[loss=0.0692, simple_loss=0.08955, pruned_loss=0.01356, audio_tagging_loss=0.01086, over 14868.00 frames. ], tot_loss[loss=0.07118, simple_loss=0.09202, pruned_loss=0.01257, audio_tagging_loss=0.0126, over 1936448.45 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:13:10,587 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.71 vs. limit=10.0 2023-11-28 05:13:18,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3368073.3333333335, ans=0.125 2023-11-28 05:13:19,944 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.95 vs. limit=22.5 2023-11-28 05:13:20,781 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3368073.3333333335, ans=0.0 2023-11-28 05:13:39,429 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3368206.6666666665, ans=0.2 2023-11-28 05:13:54,406 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 505250 2023-11-28 05:13:57,698 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 250, loss[loss=0.05598, simple_loss=0.07597, pruned_loss=0.007546, audio_tagging_loss=0.01045, over 15627.00 frames. ], tot_loss[loss=0.0698, simple_loss=0.09154, pruned_loss=0.01265, audio_tagging_loss=0.01138, over 2178262.26 frames. 
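
The lr column drops from 1.60e-03 late in epoch 42 to 1.58e-03 here, consistent with an Eden-style schedule evaluated at this step and epoch count (base_lr=0.045, lr_batches=7500, lr_epochs=3.5). A hedged check, assuming the epoch counter handed to the scheduler lags the displayed epoch by one:

    # Hedged check of an Eden-style schedule against the logged lr values.
    def eden_lr(base_lr, step, epoch, lr_batches=7500.0, lr_epochs=3.5):
        step_factor = ((step**2 + lr_batches**2) / lr_batches**2) ** -0.25
        epoch_factor = ((epoch**2 + lr_epochs**2) / lr_epochs**2) ** -0.25
        return base_lr * step_factor * epoch_factor

    print(f"{eden_lr(0.045, 504000, 41):.2e}")  # 1.60e-03, late in epoch 42
    print(f"{eden_lr(0.045, 505200, 42):.2e}")  # 1.58e-03, early in epoch 43
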
], batch size: 59, lr: 1.58e-03, grad_scale: 8.0 2023-11-28 05:13:57,853 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3368340.0, ans=0.0 2023-11-28 05:14:21,701 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3368473.3333333335, ans=0.2 2023-11-28 05:14:23,852 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3368473.3333333335, ans=0.125 2023-11-28 05:14:32,641 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3368540.0, ans=0.04949747468305833 2023-11-28 05:14:48,347 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.828e+01 9.024e+01 9.625e+01 1.027e+02 1.223e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-28 05:14:51,681 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 505300 2023-11-28 05:14:55,470 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 300, loss[loss=0.06453, simple_loss=0.08801, pruned_loss=0.01173, audio_tagging_loss=0.008796, over 13920.00 frames. ], tot_loss[loss=0.06797, simple_loss=0.0899, pruned_loss=0.01233, audio_tagging_loss=0.01069, over 2375040.20 frames. ], batch size: 55, lr: 1.58e-03, grad_scale: 8.0 2023-11-28 05:15:02,661 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3368673.3333333335, ans=0.07 2023-11-28 05:15:22,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3368806.6666666665, ans=0.125 2023-11-28 05:15:40,365 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 05:15:40,514 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.47 vs. limit=15.0 2023-11-28 05:15:45,148 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 05:15:49,203 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 505350 2023-11-28 05:15:52,969 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 350, loss[loss=0.07716, simple_loss=0.1076, pruned_loss=0.01414, audio_tagging_loss=0.009203, over 15349.00 frames. ], tot_loss[loss=0.06697, simple_loss=0.08915, pruned_loss=0.01216, audio_tagging_loss=0.01023, over 2519084.67 frames. 
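
tot_loss[...] is a frame-weighted running aggregate, not a single-batch value: each batch's statistics are added in while the running totals decay, so the accumulated frame count climbs after an epoch boundary (692403.93 frames at epoch 43 batch 50, about 2.5e6 by batch 350) and saturates near reset_interval times the typical batch size in frames (about 3.05e6 late in epoch 42, with reset_interval=200 and roughly 15k frames per batch). A sketch, assuming exactly that decay rule:

    # Hedged sketch of the tot_loss aggregate: exponential decay with rate
    # 1 - 1/reset_interval plus the current batch, applied to loss sums and
    # frame counts alike.
    def update_tot(tot_frames, batch_frames, reset_interval=200):
        return tot_frames * (1.0 - 1.0 / reset_interval) + batch_frames

    frames = 0.0
    for _ in range(10_000):      # many batches of ~15.3k frames
        frames = update_tot(frames, 15_300)
    print(f"{frames:.3e}")       # ~3.06e+06, cf. "over 3056125.28 frames"
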
], batch size: 58, lr: 1.58e-03, grad_scale: 8.0 2023-11-28 05:16:15,481 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3369140.0, ans=0.2 2023-11-28 05:16:26,802 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3369206.6666666665, ans=0.0 2023-11-28 05:16:34,880 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3369206.6666666665, ans=0.125 2023-11-28 05:16:42,846 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.468e+01 8.904e+01 9.500e+01 1.023e+02 1.547e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-28 05:16:46,293 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 505400 2023-11-28 05:16:49,838 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 400, loss[loss=0.06003, simple_loss=0.0794, pruned_loss=0.01139, audio_tagging_loss=0.008947, over 16110.00 frames. ], tot_loss[loss=0.06646, simple_loss=0.08909, pruned_loss=0.01211, audio_tagging_loss=0.00981, over 2633545.19 frames. ], batch size: 62, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:17:06,617 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.53 vs. limit=15.0 2023-11-28 05:17:10,719 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.84 vs. limit=15.0 2023-11-28 05:17:33,184 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3369540.0, ans=0.0 2023-11-28 05:17:41,394 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3369606.6666666665, ans=0.125 2023-11-28 05:17:43,286 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 505450 2023-11-28 05:17:46,380 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 450, loss[loss=0.06462, simple_loss=0.08851, pruned_loss=0.01195, audio_tagging_loss=0.008412, over 14786.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.0887, pruned_loss=0.01196, audio_tagging_loss=0.009642, over 2725978.30 frames. ], batch size: 54, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:17:52,450 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3369673.3333333335, ans=0.05 2023-11-28 05:18:06,041 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3369740.0, ans=0.1 2023-11-28 05:18:16,749 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 05:18:26,494 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.89 vs. 
limit=6.0 2023-11-28 05:18:37,194 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.908e+01 8.667e+01 9.242e+01 1.003e+02 1.378e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-28 05:18:37,387 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3369940.0, ans=0.125 2023-11-28 05:18:41,122 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 505500 2023-11-28 05:18:44,347 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 500, loss[loss=0.0705, simple_loss=0.09839, pruned_loss=0.01199, audio_tagging_loss=0.009316, over 15161.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08761, pruned_loss=0.0121, audio_tagging_loss=0.009476, over 2800826.49 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:19:11,809 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3370140.0, ans=0.2 2023-11-28 05:19:12,946 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3370140.0, ans=0.07 2023-11-28 05:19:23,887 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3370206.6666666665, ans=0.125 2023-11-28 05:19:38,575 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 505550 2023-11-28 05:19:41,716 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 550, loss[loss=0.04111, simple_loss=0.04637, pruned_loss=0.006073, audio_tagging_loss=0.01185, over 14105.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.08942, pruned_loss=0.01249, audio_tagging_loss=0.009171, over 2853246.42 frames. ], batch size: 55, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:19:52,572 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3370406.6666666665, ans=0.125 2023-11-28 05:19:53,608 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3370406.6666666665, ans=0.1 2023-11-28 05:20:19,079 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.70 vs. limit=6.0 2023-11-28 05:20:22,987 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.77 vs. 
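
Each optim.py:476 line reports five order statistics (min, 25%, median, 75%, max) of recent gradient norms, and in every entry here the threshold equals Clipping_scale times the logged median, e.g. 2.0 x 9.242e+01 = 1.848e+02 just above. A sketch of that bookkeeping under those assumptions; the function and variable names are illustrative, not icefall's actual optim.py internals:

```python
import torch

# Sketch: quartiles of a window of recent gradient norms, with the clipping
# threshold set to clipping_scale * median. Names are illustrative.
def clipping_report(recent_norms: torch.Tensor, clipping_scale: float = 2.0):
    q = torch.quantile(recent_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]                 # 2x the median norm
    pct_clipped = 100.0 * (recent_norms > threshold).float().mean()
    return q, threshold, pct_clipped

# Toy window matching the quartiles logged above:
norms = torch.tensor([79.08, 86.67, 92.42, 100.3, 137.8])
q, thr, pct = clipping_report(norms)
print(thr.item(), pct.item())  # ~1.848e+02, 0.0 (no norm exceeds 2x median)
```
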
limit=15.0 2023-11-28 05:20:28,200 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3370606.6666666665, ans=0.2 2023-11-28 05:20:31,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3370606.6666666665, ans=0.125 2023-11-28 05:20:32,176 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.439e+01 9.076e+01 9.606e+01 1.009e+02 1.464e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-28 05:20:32,388 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3370606.6666666665, ans=0.0 2023-11-28 05:20:34,156 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3370606.6666666665, ans=0.125 2023-11-28 05:20:36,133 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 505600 2023-11-28 05:20:39,636 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 600, loss[loss=0.07353, simple_loss=0.1003, pruned_loss=0.0161, audio_tagging_loss=0.00729, over 15846.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.08955, pruned_loss=0.01264, audio_tagging_loss=0.009117, over 2902035.96 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:20:43,115 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3370673.3333333335, ans=0.125 2023-11-28 05:20:55,610 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.77 vs. limit=15.0 2023-11-28 05:20:59,856 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.81 vs. limit=15.0 2023-11-28 05:21:01,775 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3370806.6666666665, ans=0.125 2023-11-28 05:21:10,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3370806.6666666665, ans=0.125 2023-11-28 05:21:19,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3370873.3333333335, ans=0.1 2023-11-28 05:21:31,036 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3370940.0, ans=0.125 2023-11-28 05:21:34,471 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 505650 2023-11-28 05:21:37,656 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 650, loss[loss=0.07097, simple_loss=0.09947, pruned_loss=0.01406, audio_tagging_loss=0.007176, over 14758.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.08967, pruned_loss=0.01258, audio_tagging_loss=0.009122, over 2934210.62 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:22:28,539 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.551e+01 8.720e+01 9.285e+01 9.863e+01 1.198e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-28 05:22:31,980 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 505700 2023-11-28 05:22:35,222 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 700, loss[loss=0.0674, simple_loss=0.08895, pruned_loss=0.01638, audio_tagging_loss=0.006546, over 14759.00 frames. 
], tot_loss[loss=0.06626, simple_loss=0.08938, pruned_loss=0.01248, audio_tagging_loss=0.009085, over 2953569.25 frames. ], batch size: 55, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:22:40,330 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3371340.0, ans=0.2 2023-11-28 05:22:46,362 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3371406.6666666665, ans=0.125 2023-11-28 05:23:30,039 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 505750 2023-11-28 05:23:32,280 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3371673.3333333335, ans=0.2 2023-11-28 05:23:33,239 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 750, loss[loss=0.0572, simple_loss=0.08122, pruned_loss=0.008021, audio_tagging_loss=0.008573, over 14126.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.0897, pruned_loss=0.01244, audio_tagging_loss=0.009053, over 2970769.52 frames. ], batch size: 53, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:23:33,466 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3371673.3333333335, ans=0.125 2023-11-28 05:23:44,200 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.44 vs. limit=22.5 2023-11-28 05:23:45,247 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.80 vs. limit=15.0 2023-11-28 05:23:53,254 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3371740.0, ans=0.09899494936611666 2023-11-28 05:23:59,046 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.85 vs. limit=10.0 2023-11-28 05:23:59,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3371806.6666666665, ans=0.125 2023-11-28 05:23:59,875 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3371806.6666666665, ans=0.1 2023-11-28 05:24:04,514 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.88 vs. 
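
The scaling.py:213 lines track ScheduledFloat hyperparameters (dropout probabilities, skip rates, bypass scale floors) whose value ans is a piecewise-linear function of the global batch_count. At 3.37M counts every schedule has long since flattened out, which is why the same ans values repeat. A sketch of the interpolation, assuming the (batch_count, value) breakpoint form used in Zipformer-style recipes; the breakpoints below are illustrative:

```python
# Sketch of a piecewise-linear schedule over batch_count, in the spirit of
# ScheduledFloat in icefall's scaling.py. Breakpoints are illustrative.
def scheduled_float(batch_count: float, points: list[tuple[float, float]]) -> float:
    (x0, y0), *rest = points
    if batch_count <= x0:
        return y0
    for x1, y1 in rest:
        if batch_count <= x1:
            # linear interpolation between (x0, y0) and (x1, y1)
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
        x0, y0 = x1, y1
    return y0  # past the last breakpoint: hold the final value

# e.g. a dropout_p schedule decaying 0.3 -> 0.1 over the first 20k counts
# would read ans=0.1 at the batch_count values logged above:
assert scheduled_float(3371340.0, [(0.0, 0.3), (20000.0, 0.1)]) == 0.1
```
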
limit=15.0 2023-11-28 05:24:06,397 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3371873.3333333335, ans=0.125 2023-11-28 05:24:15,190 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3371873.3333333335, ans=0.125 2023-11-28 05:24:24,147 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.201e+01 8.959e+01 9.414e+01 9.993e+01 1.273e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-28 05:24:25,617 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3371940.0, ans=0.2 2023-11-28 05:24:27,576 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 505800 2023-11-28 05:24:31,330 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 800, loss[loss=0.06498, simple_loss=0.08421, pruned_loss=0.01354, audio_tagging_loss=0.009347, over 14427.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.08971, pruned_loss=0.01246, audio_tagging_loss=0.00901, over 2988247.24 frames. ], batch size: 55, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 05:24:38,195 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3372006.6666666665, ans=0.125 2023-11-28 05:24:40,222 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3372006.6666666665, ans=0.125 2023-11-28 05:24:43,548 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3372073.3333333335, ans=0.125 2023-11-28 05:24:46,153 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.78 vs. limit=6.0 2023-11-28 05:24:46,845 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3372073.3333333335, ans=0.125 2023-11-28 05:24:54,061 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3372140.0, ans=0.2 2023-11-28 05:25:05,580 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3372206.6666666665, ans=0.125 2023-11-28 05:25:24,819 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 505850 2023-11-28 05:25:28,130 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 850, loss[loss=0.05232, simple_loss=0.06743, pruned_loss=0.009111, audio_tagging_loss=0.009496, over 14830.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.08933, pruned_loss=0.01224, audio_tagging_loss=0.00912, over 3004074.93 frames. 
], batch size: 57, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 05:25:36,006 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3372340.0, ans=0.125 2023-11-28 05:25:55,541 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=3372473.3333333335, ans=22.5 2023-11-28 05:25:59,001 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3372473.3333333335, ans=0.125 2023-11-28 05:26:12,058 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3372540.0, ans=0.0 2023-11-28 05:26:18,489 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.788e+01 8.784e+01 9.411e+01 9.995e+01 2.932e+02, threshold=1.882e+02, percent-clipped=1.0 2023-11-28 05:26:21,836 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 505900 2023-11-28 05:26:26,160 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 900, loss[loss=0.082, simple_loss=0.1131, pruned_loss=0.01938, audio_tagging_loss=0.006071, over 15254.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.08959, pruned_loss=0.01227, audio_tagging_loss=0.009084, over 3013960.60 frames. ], batch size: 58, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 05:26:27,449 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3372673.3333333335, ans=0.125 2023-11-28 05:26:27,903 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.20 vs. limit=12.0 2023-11-28 05:26:28,507 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3372673.3333333335, ans=0.0 2023-11-28 05:26:30,872 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3372673.3333333335, ans=0.1 2023-11-28 05:26:42,567 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 05:27:01,081 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3372873.3333333335, ans=0.1 2023-11-28 05:27:19,708 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 505950 2023-11-28 05:27:23,370 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 950, loss[loss=0.07355, simple_loss=0.1038, pruned_loss=0.01432, audio_tagging_loss=0.007349, over 14947.00 frames. ], tot_loss[loss=0.06596, simple_loss=0.08935, pruned_loss=0.01226, audio_tagging_loss=0.009027, over 3022454.81 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 05:27:23,769 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3373006.6666666665, ans=0.125 2023-11-28 05:27:32,467 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3373006.6666666665, ans=0.09899494936611666 2023-11-28 05:27:37,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3373073.3333333335, ans=0.025 2023-11-28 05:27:45,902 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.31 vs. 
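
Note that batch_count in the scaling.py lines runs ahead of the batch idx printed by model.py by a constant factor of 20/3 (about 6.67), which matches rescaling the step count by (max_duration x world_size) / ref_duration = (1000 x 4) / 600 from the run config, so the schedules advance at a rate normalized to the 600 s reference batch duration. A sketch of that adjustment; the helper name is an assumption:

```python
# Sketch: the schedules' batch_count appears to be the raw batch index scaled
# by how much audio each step covers relative to ref_duration.
def adjusted_batch_count(batch_idx: int, max_duration: float = 1000.0,
                         world_size: int = 4, ref_duration: float = 600.0) -> float:
    return batch_idx * (max_duration * world_size) / ref_duration

# Consistent with this stretch of the log: batch_count 3,373,073.33 is logged
# between batch indices 505,950 and 506,000, and 505,961 * 20/3 = 3,373,073.3.
print(adjusted_batch_count(505_961))  # 3373073.333...
```
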
limit=12.0 2023-11-28 05:27:51,762 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3373140.0, ans=0.125 2023-11-28 05:27:57,824 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 05:28:06,046 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3373206.6666666665, ans=0.125 2023-11-28 05:28:14,049 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.648e+01 8.684e+01 9.471e+01 1.027e+02 1.244e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-28 05:28:17,491 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 506000 2023-11-28 05:28:20,548 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3373340.0, ans=0.07 2023-11-28 05:28:21,353 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 1000, loss[loss=0.06269, simple_loss=0.08265, pruned_loss=0.01132, audio_tagging_loss=0.01005, over 15647.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08859, pruned_loss=0.01229, audio_tagging_loss=0.008955, over 3024334.44 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 05:28:44,512 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 05:28:47,927 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 05:28:57,499 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3373540.0, ans=0.5 2023-11-28 05:28:59,617 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3373540.0, ans=0.125 2023-11-28 05:29:14,897 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 506050 2023-11-28 05:29:18,171 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 1050, loss[loss=0.06396, simple_loss=0.08287, pruned_loss=0.01491, audio_tagging_loss=0.007613, over 14271.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.08931, pruned_loss=0.01236, audio_tagging_loss=0.008776, over 3026845.51 frames. 
], batch size: 54, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 05:29:26,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3373673.3333333335, ans=0.0 2023-11-28 05:29:26,158 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3373673.3333333335, ans=0.125 2023-11-28 05:29:36,957 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3373740.0, ans=0.125 2023-11-28 05:29:41,330 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3373806.6666666665, ans=0.125 2023-11-28 05:29:51,119 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3373806.6666666665, ans=0.2 2023-11-28 05:29:52,173 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3373873.3333333335, ans=0.0 2023-11-28 05:29:58,700 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3373873.3333333335, ans=0.125 2023-11-28 05:29:58,745 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3373873.3333333335, ans=0.125 2023-11-28 05:30:09,118 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.815e+01 8.822e+01 9.430e+01 1.008e+02 1.221e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-28 05:30:13,102 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 506100 2023-11-28 05:30:13,392 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3373940.0, ans=0.0 2023-11-28 05:30:14,791 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3373940.0, ans=0.125 2023-11-28 05:30:16,827 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 1100, loss[loss=0.04758, simple_loss=0.06057, pruned_loss=0.008577, audio_tagging_loss=0.008712, over 15252.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08934, pruned_loss=0.01236, audio_tagging_loss=0.008635, over 3028730.50 frames. ], batch size: 58, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 05:30:18,182 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3374006.6666666665, ans=0.05 2023-11-28 05:30:18,447 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.01 vs. limit=15.0 2023-11-28 05:30:21,746 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 05:30:26,546 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3374006.6666666665, ans=0.125 2023-11-28 05:30:28,778 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3374073.3333333335, ans=0.1 2023-11-28 05:30:45,663 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.50 vs. limit=15.0 2023-11-28 05:30:46,953 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.47 vs. limit=12.0 2023-11-28 05:31:00,492 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.62 vs. limit=15.0 2023-11-28 05:31:10,736 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.41 vs. limit=15.0 2023-11-28 05:31:11,340 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 506150 2023-11-28 05:31:14,624 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 1150, loss[loss=0.07919, simple_loss=0.09749, pruned_loss=0.02289, audio_tagging_loss=0.007557, over 14684.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.08957, pruned_loss=0.01236, audio_tagging_loss=0.008643, over 3029920.32 frames. ], batch size: 55, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 05:31:20,313 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3374340.0, ans=10.0 2023-11-28 05:31:24,723 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3374406.6666666665, ans=0.125 2023-11-28 05:31:35,260 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3374406.6666666665, ans=0.125 2023-11-28 05:31:40,492 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.19 vs. limit=12.0 2023-11-28 05:31:56,527 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3374540.0, ans=0.125 2023-11-28 05:32:06,086 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.453e+01 8.704e+01 9.429e+01 9.950e+01 1.461e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-28 05:32:08,340 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 506200 2023-11-28 05:32:11,948 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 1200, loss[loss=0.06763, simple_loss=0.1014, pruned_loss=0.01171, audio_tagging_loss=0.005208, over 14881.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08857, pruned_loss=0.01221, audio_tagging_loss=0.008693, over 3022006.49 frames. 
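
The WARNING just above (and its repeats through this epoch) drops AudioSet cuts that carry only the dummy placeholder transcript: a 1 s cut gives 100 feature frames, which shrink to 23 after the encoder's roughly 4x subsampling, fewer than the 24 BPE tokens, so no transducer alignment exists and the cut is excluded. A sketch of that feasibility check; the exact subsampling arithmetic is an assumption reverse-engineered from the logged 100 -> 23 reduction:

```python
# Sketch of the check implied by the warning: after subsampling, an utterance
# needs at least as many encoder frames as target tokens for the transducer
# loss to be defined. The dummy-text cuts give 23 frames vs 24 tokens.
def frames_after_subsampling(num_frames: int) -> int:
    # assumed conv front end: (T - 7) // 2, then // 2, matching 100 -> 23
    return ((num_frames - 7) // 2) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    return frames_after_subsampling(num_frames) >= num_tokens

assert frames_after_subsampling(100) == 23
assert not keep_cut(100, 24)  # hence "Exclude cut ... from training."
```
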
], batch size: 55, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 05:32:18,181 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3374673.3333333335, ans=0.04949747468305833 2023-11-28 05:32:20,488 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3374673.3333333335, ans=0.0 2023-11-28 05:32:27,607 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3374740.0, ans=0.125 2023-11-28 05:32:31,492 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3374740.0, ans=0.125 2023-11-28 05:33:05,784 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 506250 2023-11-28 05:33:09,515 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 1250, loss[loss=0.06859, simple_loss=0.09604, pruned_loss=0.01043, audio_tagging_loss=0.01014, over 15546.00 frames. ], tot_loss[loss=0.06475, simple_loss=0.08813, pruned_loss=0.01203, audio_tagging_loss=0.008649, over 3027617.59 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 05:33:22,851 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3375073.3333333335, ans=0.125 2023-11-28 05:33:23,960 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3375073.3333333335, ans=0.0 2023-11-28 05:33:29,401 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3375073.3333333335, ans=0.125 2023-11-28 05:34:02,428 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.743e+01 8.863e+01 9.431e+01 1.030e+02 1.303e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-28 05:34:04,694 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 506300 2023-11-28 05:34:07,949 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 1300, loss[loss=0.05869, simple_loss=0.07656, pruned_loss=0.008981, audio_tagging_loss=0.01143, over 15984.00 frames. ], tot_loss[loss=0.06488, simple_loss=0.08812, pruned_loss=0.01212, audio_tagging_loss=0.008695, over 3026755.71 frames. ], batch size: 59, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 05:34:12,466 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3375340.0, ans=0.0 2023-11-28 05:34:21,333 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3375406.6666666665, ans=0.0 2023-11-28 05:34:24,061 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.81 vs. 
limit=15.0 2023-11-28 05:34:39,002 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3375473.3333333335, ans=0.125 2023-11-28 05:34:47,583 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3375540.0, ans=0.125 2023-11-28 05:34:48,793 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3375540.0, ans=0.95 2023-11-28 05:34:50,915 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3375540.0, ans=0.0 2023-11-28 05:34:56,396 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3375606.6666666665, ans=0.0 2023-11-28 05:35:01,562 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 506350 2023-11-28 05:35:04,842 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 1350, loss[loss=0.05512, simple_loss=0.07195, pruned_loss=0.009672, audio_tagging_loss=0.009467, over 15702.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.08824, pruned_loss=0.0121, audio_tagging_loss=0.008734, over 3033030.38 frames. ], batch size: 59, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 05:35:09,425 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3375673.3333333335, ans=0.0 2023-11-28 05:35:16,662 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3375740.0, ans=0.125 2023-11-28 05:35:39,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3375873.3333333335, ans=0.0 2023-11-28 05:35:48,615 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 05:35:50,873 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3375940.0, ans=0.125 2023-11-28 05:35:57,891 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.288e+01 8.640e+01 9.329e+01 1.009e+02 1.189e+02, threshold=1.866e+02, percent-clipped=0.0 2023-11-28 05:35:59,120 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 506400 2023-11-28 05:36:02,706 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 1400, loss[loss=0.06792, simple_loss=0.09816, pruned_loss=0.009584, audio_tagging_loss=0.009259, over 14534.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08833, pruned_loss=0.01202, audio_tagging_loss=0.008727, over 3032602.33 frames. ], batch size: 54, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:36:06,805 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3376006.6666666665, ans=0.125 2023-11-28 05:36:14,620 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.53 vs. 
limit=15.0 2023-11-28 05:36:22,273 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3376073.3333333335, ans=0.0 2023-11-28 05:36:27,898 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3376140.0, ans=0.125 2023-11-28 05:36:36,627 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3376206.6666666665, ans=0.1 2023-11-28 05:36:54,898 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.91 vs. limit=15.0 2023-11-28 05:36:58,350 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 506450 2023-11-28 05:37:01,513 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 1450, loss[loss=0.08177, simple_loss=0.1098, pruned_loss=0.01773, audio_tagging_loss=0.009155, over 14809.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08877, pruned_loss=0.01208, audio_tagging_loss=0.008874, over 3036215.73 frames. ], batch size: 55, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:37:07,183 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 05:37:48,667 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3376606.6666666665, ans=0.1 2023-11-28 05:37:53,730 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.315e+01 8.627e+01 9.329e+01 1.021e+02 1.483e+02, threshold=1.866e+02, percent-clipped=0.0 2023-11-28 05:37:54,905 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 506500 2023-11-28 05:37:58,155 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 1500, loss[loss=0.06456, simple_loss=0.07612, pruned_loss=0.01301, audio_tagging_loss=0.01349, over 16528.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08849, pruned_loss=0.0122, audio_tagging_loss=0.008897, over 3036744.39 frames. ], batch size: 65, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:38:04,352 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.27 vs. limit=22.5 2023-11-28 05:38:26,432 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3376806.6666666665, ans=0.2 2023-11-28 05:38:33,703 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3376873.3333333335, ans=0.2 2023-11-28 05:38:45,881 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3376940.0, ans=0.125 2023-11-28 05:38:48,023 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3376940.0, ans=0.125 2023-11-28 05:38:52,910 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 506550 2023-11-28 05:38:56,169 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 1550, loss[loss=0.06068, simple_loss=0.07866, pruned_loss=0.01138, audio_tagging_loss=0.00997, over 14487.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08804, pruned_loss=0.01219, audio_tagging_loss=0.00897, over 3038688.19 frames. 
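
The scaling.py:1022 Whitening lines compare a per-module whiteness metric against a limit and only apply the auxiliary penalty when the metric exceeds it, hence the "metric=X vs. limit=Y" form. One plausible metric with the right behaviour, normalized so a perfectly isotropic feature covariance scores 1.0, is sketched below; this illustrates the idea and is not claimed to be scaling.py's exact formula:

```python
import torch

# Sketch: a whiteness metric that equals 1.0 when the feature covariance is a
# multiple of the identity and grows with the eigenvalue spread. Illustrative.
def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels) activations for one module/group
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]
    d = cov.shape[0]
    # trace(C @ C) * d / trace(C)^2 >= 1, with equality iff C is isotropic
    return float((cov @ cov).diagonal().sum() * d / cov.diagonal().sum() ** 2)

print(whitening_metric(torch.randn(10_000, 64)))  # ~1.0 for near-white input
```
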
], batch size: 58, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:39:03,768 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.85 vs. limit=6.0 2023-11-28 05:39:42,816 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.11 vs. limit=6.0 2023-11-28 05:39:49,865 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.333e+01 9.028e+01 9.506e+01 1.021e+02 1.396e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-28 05:39:51,051 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 506600 2023-11-28 05:39:54,701 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 1600, loss[loss=0.07037, simple_loss=0.08701, pruned_loss=0.01509, audio_tagging_loss=0.01178, over 15143.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08826, pruned_loss=0.0122, audio_tagging_loss=0.009004, over 3037296.51 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 05:39:55,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3377340.0, ans=0.125 2023-11-28 05:40:02,051 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3377340.0, ans=0.125 2023-11-28 05:40:49,124 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 506650 2023-11-28 05:40:49,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3377606.6666666665, ans=0.5 2023-11-28 05:40:52,366 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 1650, loss[loss=0.08653, simple_loss=0.1223, pruned_loss=0.01787, audio_tagging_loss=0.007514, over 15000.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08815, pruned_loss=0.01215, audio_tagging_loss=0.009024, over 3041490.25 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:41:01,937 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.86 vs. limit=6.0 2023-11-28 05:41:03,693 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 05:41:05,424 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3377740.0, ans=0.1 2023-11-28 05:41:10,949 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 05:41:13,837 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3377740.0, ans=0.125 2023-11-28 05:41:24,171 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3377806.6666666665, ans=0.125 2023-11-28 05:41:46,316 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.739e+01 8.798e+01 9.580e+01 1.024e+02 1.381e+02, threshold=1.916e+02, percent-clipped=0.0 2023-11-28 05:41:46,408 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 506700 2023-11-28 05:41:50,516 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 1700, loss[loss=0.05904, simple_loss=0.08029, pruned_loss=0.007703, audio_tagging_loss=0.01119, over 15510.00 frames. 
], tot_loss[loss=0.06478, simple_loss=0.08747, pruned_loss=0.01201, audio_tagging_loss=0.009034, over 3046005.15 frames. ], batch size: 58, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:41:51,987 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3378006.6666666665, ans=0.0 2023-11-28 05:42:05,961 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3378073.3333333335, ans=0.125 2023-11-28 05:42:08,212 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3378073.3333333335, ans=0.2 2023-11-28 05:42:27,549 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.76 vs. limit=15.0 2023-11-28 05:42:32,265 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3378206.6666666665, ans=0.125 2023-11-28 05:42:39,118 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.17 vs. limit=15.0 2023-11-28 05:42:44,599 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 506750 2023-11-28 05:42:47,523 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3378340.0, ans=0.125 2023-11-28 05:42:47,829 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.05 vs. limit=15.0 2023-11-28 05:42:48,370 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 1750, loss[loss=0.07188, simple_loss=0.09921, pruned_loss=0.01419, audio_tagging_loss=0.008082, over 14646.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08853, pruned_loss=0.01218, audio_tagging_loss=0.008882, over 3047713.03 frames. ], batch size: 55, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:42:50,808 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3378340.0, ans=0.1 2023-11-28 05:43:19,558 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.88 vs. limit=15.0 2023-11-28 05:43:22,692 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3378540.0, ans=0.0 2023-11-28 05:43:30,593 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.48 vs. limit=6.0 2023-11-28 05:43:41,934 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.850e+01 8.622e+01 9.287e+01 9.848e+01 1.344e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-28 05:43:42,034 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 506800 2023-11-28 05:43:45,539 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 1800, loss[loss=0.04778, simple_loss=0.05957, pruned_loss=0.009646, audio_tagging_loss=0.008348, over 14887.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.0892, pruned_loss=0.01235, audio_tagging_loss=0.008737, over 3044020.81 frames. 
], batch size: 59, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:44:03,834 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.30 vs. limit=6.0 2023-11-28 05:44:04,500 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3378740.0, ans=0.125 2023-11-28 05:44:06,407 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3378740.0, ans=0.1 2023-11-28 05:44:18,284 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3378806.6666666665, ans=0.1 2023-11-28 05:44:39,512 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 506850 2023-11-28 05:44:43,410 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 1850, loss[loss=0.05218, simple_loss=0.06777, pruned_loss=0.008141, audio_tagging_loss=0.01015, over 15290.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.09001, pruned_loss=0.01231, audio_tagging_loss=0.008652, over 3044819.31 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:44:52,917 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3379006.6666666665, ans=0.125 2023-11-28 05:44:53,871 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3379073.3333333335, ans=0.0 2023-11-28 05:45:13,406 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3379140.0, ans=0.04949747468305833 2023-11-28 05:45:37,548 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.611e+01 8.914e+01 9.536e+01 1.008e+02 1.259e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-28 05:45:37,655 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 506900 2023-11-28 05:45:41,363 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 1900, loss[loss=0.06591, simple_loss=0.08844, pruned_loss=0.009698, audio_tagging_loss=0.01199, over 15322.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.08961, pruned_loss=0.01232, audio_tagging_loss=0.008683, over 3040608.84 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:45:57,821 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3379406.6666666665, ans=0.0 2023-11-28 05:45:59,373 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.06 vs. limit=15.0 2023-11-28 05:46:23,015 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3379540.0, ans=0.125 2023-11-28 05:46:26,587 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3379606.6666666665, ans=0.125 2023-11-28 05:46:35,370 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 506950 2023-11-28 05:46:38,583 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 1950, loss[loss=0.08456, simple_loss=0.116, pruned_loss=0.0192, audio_tagging_loss=0.00737, over 14930.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08865, pruned_loss=0.01221, audio_tagging_loss=0.008715, over 3038798.29 frames. 
], batch size: 53, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:46:41,164 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3379673.3333333335, ans=0.125 2023-11-28 05:46:46,682 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3379673.3333333335, ans=0.1 2023-11-28 05:46:48,846 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.51 vs. limit=12.0 2023-11-28 05:47:33,351 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.564e+01 8.861e+01 9.415e+01 1.012e+02 1.225e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-28 05:47:33,449 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 507000 2023-11-28 05:47:33,957 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.72 vs. limit=15.0 2023-11-28 05:47:36,948 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 2000, loss[loss=0.05851, simple_loss=0.073, pruned_loss=0.01269, audio_tagging_loss=0.009322, over 14081.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08922, pruned_loss=0.01239, audio_tagging_loss=0.008675, over 3040011.53 frames. ], batch size: 54, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 05:47:45,370 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3380006.6666666665, ans=0.125 2023-11-28 05:47:56,561 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.84 vs. limit=10.0 2023-11-28 05:47:57,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3380073.3333333335, ans=0.125 2023-11-28 05:48:01,765 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.59 vs. limit=15.0 2023-11-28 05:48:31,205 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 507050 2023-11-28 05:48:33,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3380340.0, ans=0.0 2023-11-28 05:48:34,887 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 2050, loss[loss=0.08535, simple_loss=0.1131, pruned_loss=0.01827, audio_tagging_loss=0.01052, over 16617.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.08996, pruned_loss=0.01266, audio_tagging_loss=0.008615, over 3040709.57 frames. 
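
The unchanging lr: 1.58e-03 across these batches is what an Eden-style schedule gives with the configured base_lr=0.045, lr_batches=7500, lr_epochs=3.5: at roughly batch 507k, with the scheduler's epoch counter one behind the displayed epoch (i.e. 42), the two decay factors multiply out to about 1.58e-03, and this deep into training the value moves too slowly to change at three significant digits. A sketch of that computation; the formula is the Eden rule as used in Zipformer recipes, while the exact step and epoch bookkeeping here is inferred from the log:

```python
# Sketch of an Eden-style learning rate: base_lr damped by batch- and
# epoch-dependent factors. Step/epoch values are read off the log.
def eden_lr(base_lr: float, batch: float, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    batch_factor = ((batch / lr_batches) ** 2 + 1.0) ** -0.25
    epoch_factor = ((epoch / lr_epochs) ** 2 + 1.0) ** -0.25
    return base_lr * batch_factor * epoch_factor

print(eden_lr(0.045, batch=507_000, epoch=42))  # ~1.58e-03
```
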
], batch size: 61, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:48:41,298 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3380340.0, ans=0.0 2023-11-28 05:49:00,905 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3380473.3333333335, ans=0.1 2023-11-28 05:49:15,217 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3380540.0, ans=0.125 2023-11-28 05:49:29,189 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 507100 2023-11-28 05:49:30,216 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.425e+01 9.113e+01 9.631e+01 1.014e+02 1.250e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-28 05:49:32,372 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 2100, loss[loss=0.08542, simple_loss=0.1127, pruned_loss=0.02141, audio_tagging_loss=0.007681, over 15446.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.08944, pruned_loss=0.0125, audio_tagging_loss=0.008596, over 3040602.76 frames. ], batch size: 55, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:49:35,857 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3380673.3333333335, ans=0.0 2023-11-28 05:49:45,161 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.24 vs. limit=15.0 2023-11-28 05:49:59,716 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.81 vs. limit=6.0 2023-11-28 05:50:11,055 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3380873.3333333335, ans=0.05 2023-11-28 05:50:17,662 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3380940.0, ans=0.2 2023-11-28 05:50:26,444 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 507150 2023-11-28 05:50:29,589 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 2150, loss[loss=0.06224, simple_loss=0.09636, pruned_loss=0.008392, audio_tagging_loss=0.005665, over 15325.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.09063, pruned_loss=0.01265, audio_tagging_loss=0.008451, over 3037838.38 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:50:51,958 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3381073.3333333335, ans=0.0 2023-11-28 05:50:57,369 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3381140.0, ans=0.125 2023-11-28 05:51:05,425 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3381206.6666666665, ans=0.125 2023-11-28 05:51:07,329 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 05:51:18,982 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3381273.3333333335, ans=0.0 2023-11-28 05:51:25,091 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 507200 2023-11-28 05:51:26,068 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.293e+01 8.657e+01 9.306e+01 1.016e+02 1.700e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-28 05:51:28,713 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 2200, loss[loss=0.05543, simple_loss=0.06583, pruned_loss=0.01252, audio_tagging_loss=0.01, over 14569.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.09044, pruned_loss=0.01268, audio_tagging_loss=0.008545, over 3034724.04 frames. ], batch size: 54, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:51:35,375 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3381340.0, ans=0.125 2023-11-28 05:51:49,735 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3381406.6666666665, ans=0.0 2023-11-28 05:51:54,307 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3381473.3333333335, ans=0.0 2023-11-28 05:52:13,107 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3381540.0, ans=0.125 2023-11-28 05:52:23,791 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 507250 2023-11-28 05:52:23,956 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3381606.6666666665, ans=0.125 2023-11-28 05:52:27,085 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 2250, loss[loss=0.07532, simple_loss=0.1142, pruned_loss=0.01309, audio_tagging_loss=0.005115, over 16572.00 frames. ], tot_loss[loss=0.0666, simple_loss=0.09072, pruned_loss=0.01264, audio_tagging_loss=0.008599, over 3030779.89 frames. ], batch size: 59, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:52:44,134 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.76 vs. limit=15.0 2023-11-28 05:52:44,141 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.04 vs. limit=6.0 2023-11-28 05:53:21,029 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 507300 2023-11-28 05:53:22,034 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.780e+01 8.876e+01 9.357e+01 9.943e+01 1.279e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-28 05:53:24,249 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 2300, loss[loss=0.07669, simple_loss=0.1114, pruned_loss=0.01279, audio_tagging_loss=0.008195, over 14977.00 frames. ], tot_loss[loss=0.06684, simple_loss=0.09105, pruned_loss=0.01265, audio_tagging_loss=0.008667, over 3038089.62 frames. ], batch size: 55, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:53:46,711 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.80 vs. 
limit=15.0 2023-11-28 05:53:47,524 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3382140.0, ans=0.0 2023-11-28 05:53:47,667 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3382140.0, ans=0.125 2023-11-28 05:53:54,273 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3382140.0, ans=0.125 2023-11-28 05:54:04,532 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.03 vs. limit=15.0 2023-11-28 05:54:17,417 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 05:54:17,718 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3382273.3333333335, ans=0.0 2023-11-28 05:54:18,561 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 507350 2023-11-28 05:54:21,706 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 2350, loss[loss=0.07644, simple_loss=0.1128, pruned_loss=0.01416, audio_tagging_loss=0.005866, over 16087.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.09027, pruned_loss=0.0125, audio_tagging_loss=0.008778, over 3037660.91 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:54:24,091 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.14 vs. limit=15.0 2023-11-28 05:54:30,383 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3382340.0, ans=0.2 2023-11-28 05:55:01,396 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.67 vs. limit=15.0 2023-11-28 05:55:02,009 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3382540.0, ans=0.125 2023-11-28 05:55:17,773 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 507400 2023-11-28 05:55:18,814 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.197e+01 8.819e+01 9.502e+01 1.018e+02 1.349e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-28 05:55:21,461 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 2400, loss[loss=0.05406, simple_loss=0.07018, pruned_loss=0.009289, audio_tagging_loss=0.009675, over 15036.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.08989, pruned_loss=0.01257, audio_tagging_loss=0.008962, over 3035888.20 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 05:55:33,803 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3382740.0, ans=0.125 2023-11-28 05:55:37,553 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.23 vs. 
limit=15.0 2023-11-28 05:56:05,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3382940.0, ans=0.0 2023-11-28 05:56:14,616 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 507450 2023-11-28 05:56:17,870 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 2450, loss[loss=0.05988, simple_loss=0.09038, pruned_loss=0.008008, audio_tagging_loss=0.006687, over 14719.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08886, pruned_loss=0.01222, audio_tagging_loss=0.009023, over 3028975.51 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 05:56:25,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3383006.6666666665, ans=0.125 2023-11-28 05:56:36,914 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=3383073.3333333335, ans=15.0 2023-11-28 05:56:52,235 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3383206.6666666665, ans=0.0 2023-11-28 05:56:53,431 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3383206.6666666665, ans=0.1 2023-11-28 05:57:12,409 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 507500 2023-11-28 05:57:13,380 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.364e+01 8.778e+01 9.508e+01 1.025e+02 1.201e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-28 05:57:13,723 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3383273.3333333335, ans=0.0 2023-11-28 05:57:15,561 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 2500, loss[loss=0.06612, simple_loss=0.09453, pruned_loss=0.01274, audio_tagging_loss=0.006123, over 15027.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.08909, pruned_loss=0.01221, audio_tagging_loss=0.009028, over 3038648.43 frames. ], batch size: 55, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 05:57:18,556 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.73 vs. limit=6.0 2023-11-28 05:57:58,414 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3383540.0, ans=0.07 2023-11-28 05:57:59,425 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3383540.0, ans=0.2 2023-11-28 05:58:10,289 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 507550 2023-11-28 05:58:10,550 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3383606.6666666665, ans=0.2 2023-11-28 05:58:12,115 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3383606.6666666665, ans=0.0 2023-11-28 05:58:13,488 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.24 vs. limit=15.0 2023-11-28 05:58:14,144 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 2550, loss[loss=0.06791, simple_loss=0.0895, pruned_loss=0.01362, audio_tagging_loss=0.009534, over 15129.00 frames. 
], tot_loss[loss=0.06659, simple_loss=0.0903, pruned_loss=0.01258, audio_tagging_loss=0.008865, over 3041178.55 frames. ], batch size: 59, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 05:58:22,488 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.58 vs. limit=22.5 2023-11-28 05:58:28,620 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3383740.0, ans=0.0 2023-11-28 05:58:48,285 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3383873.3333333335, ans=0.125 2023-11-28 05:58:53,880 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.65 vs. limit=12.0 2023-11-28 05:58:55,492 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3383873.3333333335, ans=0.125 2023-11-28 05:59:03,892 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3383940.0, ans=0.125 2023-11-28 05:59:05,093 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3383940.0, ans=0.125 2023-11-28 05:59:08,249 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 507600 2023-11-28 05:59:09,222 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.936e+01 8.563e+01 9.261e+01 9.725e+01 1.208e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-28 05:59:11,674 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 2600, loss[loss=0.0623, simple_loss=0.07334, pruned_loss=0.01585, audio_tagging_loss=0.009779, over 13686.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.08934, pruned_loss=0.01243, audio_tagging_loss=0.00878, over 3040380.28 frames. ], batch size: 55, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 05:59:15,198 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3384006.6666666665, ans=0.2 2023-11-28 05:59:30,103 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3384073.3333333335, ans=0.125 2023-11-28 05:59:37,757 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3384140.0, ans=0.125 2023-11-28 05:59:41,004 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3384140.0, ans=0.0 2023-11-28 05:59:45,280 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=3384140.0, ans=15.0 2023-11-28 05:59:47,057 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3384206.6666666665, ans=0.125 2023-11-28 05:59:59,717 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.62 vs. 
limit=10.0 2023-11-28 06:00:01,333 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3384273.3333333335, ans=0.125 2023-11-28 06:00:05,554 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 507650 2023-11-28 06:00:09,386 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 2650, loss[loss=0.07294, simple_loss=0.1039, pruned_loss=0.01375, audio_tagging_loss=0.007264, over 15859.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.09063, pruned_loss=0.01255, audio_tagging_loss=0.008678, over 3041165.97 frames. ], batch size: 58, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 06:00:13,931 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 06:00:18,407 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3384340.0, ans=0.125 2023-11-28 06:00:24,338 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3384406.6666666665, ans=0.125 2023-11-28 06:00:30,346 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3384406.6666666665, ans=0.125 2023-11-28 06:00:32,739 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3384473.3333333335, ans=0.5 2023-11-28 06:00:57,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3384606.6666666665, ans=0.125 2023-11-28 06:01:03,490 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 507700 2023-11-28 06:01:04,925 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.331e+01 8.714e+01 9.424e+01 1.027e+02 1.447e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-28 06:01:07,161 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 2700, loss[loss=0.06043, simple_loss=0.07914, pruned_loss=0.01241, audio_tagging_loss=0.008446, over 14721.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.09018, pruned_loss=0.01257, audio_tagging_loss=0.008565, over 3044986.56 frames. ], batch size: 58, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 06:01:16,736 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3384673.3333333335, ans=0.125 2023-11-28 06:01:18,149 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.49 vs. limit=22.5 2023-11-28 06:01:18,988 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3384740.0, ans=0.0 2023-11-28 06:01:38,241 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3384806.6666666665, ans=0.0 2023-11-28 06:01:38,533 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.72 vs. limit=22.5 2023-11-28 06:01:41,676 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3384873.3333333335, ans=0.0 2023-11-28 06:01:42,117 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.80 vs. 
limit=22.5 2023-11-28 06:02:01,557 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 507750 2023-11-28 06:02:04,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3385006.6666666665, ans=0.1 2023-11-28 06:02:04,880 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 2750, loss[loss=0.05861, simple_loss=0.07024, pruned_loss=0.01011, audio_tagging_loss=0.01338, over 14443.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.09034, pruned_loss=0.01243, audio_tagging_loss=0.008597, over 3036181.45 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 06:02:06,368 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3385006.6666666665, ans=0.07 2023-11-28 06:02:26,755 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.72 vs. limit=15.0 2023-11-28 06:02:29,067 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.18 vs. limit=15.0 2023-11-28 06:02:38,653 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3385206.6666666665, ans=0.95 2023-11-28 06:02:39,735 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3385206.6666666665, ans=0.125 2023-11-28 06:02:48,200 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.95 vs. limit=12.0 2023-11-28 06:02:48,957 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3385206.6666666665, ans=0.0 2023-11-28 06:02:57,447 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 06:02:58,591 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 507800 2023-11-28 06:02:59,613 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.045e+01 8.812e+01 9.371e+01 1.002e+02 1.155e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-28 06:03:00,376 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3385273.3333333335, ans=0.0 2023-11-28 06:03:02,361 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 2800, loss[loss=0.04935, simple_loss=0.06388, pruned_loss=0.007959, audio_tagging_loss=0.009453, over 15924.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08947, pruned_loss=0.0121, audio_tagging_loss=0.008555, over 3033333.03 frames. ], batch size: 63, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 06:03:18,908 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3385406.6666666665, ans=0.0 2023-11-28 06:03:38,534 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.17 vs. 
limit=12.0 2023-11-28 06:03:56,791 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 507850 2023-11-28 06:04:00,424 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 2850, loss[loss=0.06118, simple_loss=0.08246, pruned_loss=0.01185, audio_tagging_loss=0.008105, over 15473.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.09009, pruned_loss=0.01231, audio_tagging_loss=0.008498, over 3039518.32 frames. ], batch size: 58, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 06:04:01,831 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=3385673.3333333335, ans=0.02 2023-11-28 06:04:08,383 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3385673.3333333335, ans=0.07 2023-11-28 06:04:29,258 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3385806.6666666665, ans=0.125 2023-11-28 06:04:34,516 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3385873.3333333335, ans=0.0 2023-11-28 06:04:45,483 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3385940.0, ans=0.07 2023-11-28 06:04:46,483 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3385940.0, ans=0.0 2023-11-28 06:04:46,523 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3385940.0, ans=0.1 2023-11-28 06:04:54,049 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 507900 2023-11-28 06:04:55,043 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.752e+01 8.844e+01 9.452e+01 9.978e+01 1.410e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-28 06:04:57,249 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 2900, loss[loss=0.04979, simple_loss=0.06812, pruned_loss=0.006321, audio_tagging_loss=0.009411, over 13887.00 frames. ], tot_loss[loss=0.06589, simple_loss=0.08985, pruned_loss=0.01244, audio_tagging_loss=0.008529, over 3039820.44 frames. ], batch size: 54, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 06:05:03,564 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3386006.6666666665, ans=0.0 2023-11-28 06:05:03,639 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3386006.6666666665, ans=0.0 2023-11-28 06:05:07,985 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3386073.3333333335, ans=0.09899494936611666 2023-11-28 06:05:20,871 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.93 vs. limit=15.0 2023-11-28 06:05:24,423 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3386140.0, ans=0.125 2023-11-28 06:05:25,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3386140.0, ans=0.05 2023-11-28 06:05:32,054 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.37 vs. 
limit=12.0 2023-11-28 06:05:33,063 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.21 vs. limit=15.0 2023-11-28 06:05:38,128 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3386206.6666666665, ans=0.125 2023-11-28 06:05:51,679 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 507950 2023-11-28 06:05:52,134 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.52 vs. limit=22.5 2023-11-28 06:05:54,928 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 2950, loss[loss=0.05886, simple_loss=0.07427, pruned_loss=0.01455, audio_tagging_loss=0.007178, over 14594.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.09036, pruned_loss=0.01256, audio_tagging_loss=0.008585, over 3033810.14 frames. ], batch size: 58, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 06:05:58,518 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.39 vs. limit=22.5 2023-11-28 06:06:03,400 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3386340.0, ans=0.2 2023-11-28 06:06:07,935 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3386406.6666666665, ans=0.0 2023-11-28 06:06:15,367 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3386406.6666666665, ans=0.2 2023-11-28 06:06:18,708 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3386473.3333333335, ans=0.125 2023-11-28 06:06:23,027 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3386473.3333333335, ans=0.125 2023-11-28 06:06:49,068 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 508000 2023-11-28 06:06:50,045 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.145e+01 8.945e+01 9.706e+01 1.033e+02 1.300e+02, threshold=1.941e+02, percent-clipped=0.0 2023-11-28 06:06:50,422 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-508000.pt 2023-11-28 06:06:53,014 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3386606.6666666665, ans=0.125 2023-11-28 06:06:55,277 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 3000, loss[loss=0.06459, simple_loss=0.09385, pruned_loss=0.01069, audio_tagging_loss=0.006967, over 15590.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.08999, pruned_loss=0.01242, audio_tagging_loss=0.00862, over 3037648.20 frames. 
], batch size: 60, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 06:06:55,280 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-28 06:07:17,724 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([1.9781, 2.9720, 2.8112, 2.6943, 3.4065, 3.2995, 3.1234, 3.6026], device='cuda:0') 2023-11-28 06:07:29,852 INFO [train_asr.py:1267] (0/4) Epoch 43, validation: loss=0.0576, simple_loss=0.05056, pruned_loss=0.005189, audio_tagging_loss=0.02713, over 4681554.00 frames. 2023-11-28 06:07:29,853 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-28 06:07:36,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3386673.3333333335, ans=0.125 2023-11-28 06:07:52,997 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3386806.6666666665, ans=0.015 2023-11-28 06:07:54,297 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3386806.6666666665, ans=0.125 2023-11-28 06:07:58,523 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3386806.6666666665, ans=0.125 2023-11-28 06:08:05,484 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.45 vs. limit=15.0 2023-11-28 06:08:20,127 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3386940.0, ans=0.0 2023-11-28 06:08:21,170 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3386940.0, ans=0.1 2023-11-28 06:08:24,955 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 508050 2023-11-28 06:08:28,281 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 3050, loss[loss=0.06381, simple_loss=0.09137, pruned_loss=0.01125, audio_tagging_loss=0.006874, over 15097.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.09056, pruned_loss=0.01258, audio_tagging_loss=0.00872, over 3036761.94 frames. ], batch size: 55, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 06:08:37,641 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.41 vs. limit=12.0 2023-11-28 06:08:48,198 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3387073.3333333335, ans=0.125 2023-11-28 06:09:05,151 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 06:09:22,242 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 508100 2023-11-28 06:09:23,662 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.522e+01 8.950e+01 9.666e+01 1.022e+02 1.393e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-28 06:09:26,341 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 3100, loss[loss=0.05678, simple_loss=0.07456, pruned_loss=0.01031, audio_tagging_loss=0.009187, over 15167.00 frames. ], tot_loss[loss=0.06684, simple_loss=0.09107, pruned_loss=0.01267, audio_tagging_loss=0.008635, over 3031779.88 frames. ], batch size: 58, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 06:10:15,338 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3387606.6666666665, ans=0.2 2023-11-28 06:10:19,665 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3387606.6666666665, ans=0.0 2023-11-28 06:10:20,485 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 508150 2023-11-28 06:10:23,659 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 3150, loss[loss=0.07313, simple_loss=0.1008, pruned_loss=0.01358, audio_tagging_loss=0.009169, over 15307.00 frames. ], tot_loss[loss=0.06683, simple_loss=0.09099, pruned_loss=0.0126, audio_tagging_loss=0.008733, over 3028334.89 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 06:10:28,266 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3387673.3333333335, ans=0.1 2023-11-28 06:10:49,071 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3387806.6666666665, ans=0.0 2023-11-28 06:10:57,774 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3387873.3333333335, ans=0.125 2023-11-28 06:10:57,830 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3387873.3333333335, ans=0.125 2023-11-28 06:11:07,755 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3387873.3333333335, ans=0.125 2023-11-28 06:11:08,001 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.21 vs. limit=15.0 2023-11-28 06:11:08,734 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3387940.0, ans=0.1 2023-11-28 06:11:13,746 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.15 vs. limit=22.5 2023-11-28 06:11:17,507 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 508200 2023-11-28 06:11:17,584 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3387940.0, ans=0.2 2023-11-28 06:11:18,544 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.421e+01 8.906e+01 9.448e+01 1.016e+02 1.228e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-28 06:11:22,235 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 3200, loss[loss=0.06922, simple_loss=0.09948, pruned_loss=0.01164, audio_tagging_loss=0.007837, over 16253.00 frames. 
], tot_loss[loss=0.0672, simple_loss=0.09133, pruned_loss=0.01271, audio_tagging_loss=0.00882, over 3027473.53 frames. ], batch size: 58, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 06:12:15,242 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 508250 2023-11-28 06:12:18,444 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 3250, loss[loss=0.06309, simple_loss=0.08513, pruned_loss=0.01087, audio_tagging_loss=0.009649, over 15049.00 frames. ], tot_loss[loss=0.06706, simple_loss=0.09106, pruned_loss=0.01264, audio_tagging_loss=0.008889, over 3032263.34 frames. ], batch size: 58, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 06:12:33,195 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.09 vs. limit=15.0 2023-11-28 06:12:40,710 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.85 vs. limit=15.0 2023-11-28 06:12:46,097 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.13 vs. limit=6.0 2023-11-28 06:12:51,698 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3388473.3333333335, ans=0.2 2023-11-28 06:12:52,697 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3388540.0, ans=0.0 2023-11-28 06:13:00,119 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=3388540.0, ans=0.5 2023-11-28 06:13:13,083 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 508300 2023-11-28 06:13:15,105 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.764e+01 8.835e+01 9.450e+01 1.028e+02 1.248e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-28 06:13:16,201 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 3300, loss[loss=0.06212, simple_loss=0.09062, pruned_loss=0.008373, audio_tagging_loss=0.008438, over 15202.00 frames. ], tot_loss[loss=0.06702, simple_loss=0.09072, pruned_loss=0.01273, audio_tagging_loss=0.008931, over 3036427.22 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 06:13:19,765 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3388673.3333333335, ans=0.0 2023-11-28 06:13:27,471 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3388740.0, ans=0.0 2023-11-28 06:13:29,663 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3388740.0, ans=0.0 2023-11-28 06:14:08,310 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.37 vs. limit=15.0 2023-11-28 06:14:08,958 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3388940.0, ans=0.2 2023-11-28 06:14:09,949 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 508350 2023-11-28 06:14:13,207 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 3350, loss[loss=0.06602, simple_loss=0.09729, pruned_loss=0.01117, audio_tagging_loss=0.006203, over 14898.00 frames. 
], tot_loss[loss=0.06628, simple_loss=0.08977, pruned_loss=0.01249, audio_tagging_loss=0.008901, over 3040405.77 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 06:14:49,708 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3389206.6666666665, ans=0.0 2023-11-28 06:15:00,644 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.00 vs. limit=15.0 2023-11-28 06:15:08,396 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 508400 2023-11-28 06:15:10,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3389273.3333333335, ans=0.1 2023-11-28 06:15:10,832 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.101e+01 8.815e+01 9.381e+01 1.020e+02 1.211e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-28 06:15:11,910 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 3400, loss[loss=0.05462, simple_loss=0.08477, pruned_loss=0.006418, audio_tagging_loss=0.005814, over 15776.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.09063, pruned_loss=0.01254, audio_tagging_loss=0.008734, over 3052188.41 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:15:15,802 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3389340.0, ans=0.0 2023-11-28 06:15:17,775 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.55 vs. limit=12.0 2023-11-28 06:15:26,371 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3389406.6666666665, ans=0.125 2023-11-28 06:15:39,732 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.86 vs. limit=15.0 2023-11-28 06:15:55,015 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3389540.0, ans=0.125 2023-11-28 06:15:56,069 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3389540.0, ans=0.125 2023-11-28 06:15:57,877 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.55 vs. limit=5.0 2023-11-28 06:16:06,780 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 508450 2023-11-28 06:16:09,961 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 3450, loss[loss=0.05665, simple_loss=0.07959, pruned_loss=0.00923, audio_tagging_loss=0.007628, over 15853.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.09007, pruned_loss=0.0124, audio_tagging_loss=0.008737, over 3046847.68 frames. 
], batch size: 59, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 06:16:35,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3389806.6666666665, ans=0.125 2023-11-28 06:17:03,755 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 508500 2023-11-28 06:17:04,946 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3389940.0, ans=0.125 2023-11-28 06:17:06,975 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.535e+01 8.735e+01 9.510e+01 1.030e+02 1.229e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-28 06:17:07,005 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 3500, loss[loss=0.07089, simple_loss=0.09806, pruned_loss=0.01446, audio_tagging_loss=0.007399, over 15361.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.08976, pruned_loss=0.01237, audio_tagging_loss=0.008744, over 3044550.05 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 06:17:08,390 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3390006.6666666665, ans=0.0 2023-11-28 06:17:13,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3390006.6666666665, ans=0.1 2023-11-28 06:17:34,979 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3390140.0, ans=0.0 2023-11-28 06:17:38,419 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.60 vs. limit=15.0 2023-11-28 06:17:41,212 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 06:17:52,804 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.02 vs. limit=15.0 2023-11-28 06:18:01,681 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 508550 2023-11-28 06:18:04,915 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 3550, loss[loss=0.06913, simple_loss=0.08818, pruned_loss=0.01429, audio_tagging_loss=0.01075, over 14031.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08904, pruned_loss=0.01233, audio_tagging_loss=0.008686, over 3045186.42 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 06:18:27,964 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.58 vs. limit=15.0 2023-11-28 06:19:00,592 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 508600 2023-11-28 06:19:04,050 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.301e+01 8.807e+01 9.210e+01 1.006e+02 1.301e+02, threshold=1.842e+02, percent-clipped=0.0 2023-11-28 06:19:04,076 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 3600, loss[loss=0.0683, simple_loss=0.09237, pruned_loss=0.01351, audio_tagging_loss=0.008608, over 13959.00 frames. 
], tot_loss[loss=0.06531, simple_loss=0.08864, pruned_loss=0.01229, audio_tagging_loss=0.008698, over 3047083.29 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:19:06,492 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3390673.3333333335, ans=0.125 2023-11-28 06:19:14,457 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.20 vs. limit=22.5 2023-11-28 06:19:25,824 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3390806.6666666665, ans=0.5 2023-11-28 06:19:37,483 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3390873.3333333335, ans=0.0 2023-11-28 06:19:51,610 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.35 vs. limit=22.5 2023-11-28 06:19:57,681 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 508650 2023-11-28 06:20:00,995 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 3650, loss[loss=0.05992, simple_loss=0.08528, pruned_loss=0.008154, audio_tagging_loss=0.009129, over 16245.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08945, pruned_loss=0.01227, audio_tagging_loss=0.008586, over 3048992.87 frames. ], batch size: 60, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:20:06,746 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3391006.6666666665, ans=0.125 2023-11-28 06:20:14,059 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3391073.3333333335, ans=0.125 2023-11-28 06:20:23,297 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3391140.0, ans=0.05 2023-11-28 06:20:24,871 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3391140.0, ans=0.125 2023-11-28 06:20:26,977 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3391140.0, ans=0.0 2023-11-28 06:20:42,423 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3391206.6666666665, ans=0.125 2023-11-28 06:20:48,028 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3391273.3333333335, ans=0.1 2023-11-28 06:20:54,150 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3391273.3333333335, ans=0.5 2023-11-28 06:20:55,005 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 508700 2023-11-28 06:20:57,479 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3391340.0, ans=0.125 2023-11-28 06:20:58,169 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.227e+01 8.782e+01 9.556e+01 1.009e+02 1.270e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-28 06:20:58,195 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 3700, loss[loss=0.07153, simple_loss=0.09706, pruned_loss=0.01311, audio_tagging_loss=0.009888, over 14718.00 
frames. ], tot_loss[loss=0.06548, simple_loss=0.08925, pruned_loss=0.01223, audio_tagging_loss=0.008632, over 3052138.53 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:21:11,091 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 06:21:11,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3391406.6666666665, ans=0.125 2023-11-28 06:21:16,489 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3391406.6666666665, ans=0.125 2023-11-28 06:21:21,990 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3391473.3333333335, ans=0.07 2023-11-28 06:21:44,420 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3391606.6666666665, ans=0.0 2023-11-28 06:21:53,602 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 508750 2023-11-28 06:21:56,810 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 3750, loss[loss=0.05748, simple_loss=0.07575, pruned_loss=0.009637, audio_tagging_loss=0.00997, over 15852.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.09018, pruned_loss=0.0124, audio_tagging_loss=0.008652, over 3061897.47 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:22:07,937 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3391740.0, ans=0.1 2023-11-28 06:22:13,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3391740.0, ans=0.0 2023-11-28 06:22:20,556 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3391806.6666666665, ans=0.0 2023-11-28 06:22:27,032 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3391806.6666666665, ans=0.125 2023-11-28 06:22:30,263 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3391873.3333333335, ans=0.125 2023-11-28 06:22:40,355 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 06:22:42,109 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.51 vs. 
limit=6.0 2023-11-28 06:22:47,130 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3391940.0, ans=0.125 2023-11-28 06:22:48,194 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3391940.0, ans=0.125 2023-11-28 06:22:48,294 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3391940.0, ans=0.125 2023-11-28 06:22:50,287 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 508800 2023-11-28 06:22:53,889 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 3800, loss[loss=0.06262, simple_loss=0.08555, pruned_loss=0.01272, audio_tagging_loss=0.007124, over 15562.00 frames. ], tot_loss[loss=0.06719, simple_loss=0.09177, pruned_loss=0.01264, audio_tagging_loss=0.008673, over 3058582.21 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 06:22:54,976 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.276e+01 8.959e+01 9.739e+01 1.027e+02 1.673e+02, threshold=1.948e+02, percent-clipped=0.0 2023-11-28 06:23:48,353 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 508850 2023-11-28 06:23:51,670 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 3850, loss[loss=0.05155, simple_loss=0.07335, pruned_loss=0.006393, audio_tagging_loss=0.008485, over 14395.00 frames. ], tot_loss[loss=0.06742, simple_loss=0.09215, pruned_loss=0.01266, audio_tagging_loss=0.00869, over 3054274.43 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 06:23:51,921 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3392340.0, ans=0.125 2023-11-28 06:24:01,637 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.whiten.whitening_limit, batch_count=3392340.0, ans=12.0 2023-11-28 06:24:12,310 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3392406.6666666665, ans=0.1 2023-11-28 06:24:46,347 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 508900 2023-11-28 06:24:50,032 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 3900, loss[loss=0.06889, simple_loss=0.08708, pruned_loss=0.01596, audio_tagging_loss=0.009392, over 15296.00 frames. ], tot_loss[loss=0.06742, simple_loss=0.09191, pruned_loss=0.01277, audio_tagging_loss=0.008686, over 3052030.35 frames. 
], batch size: 58, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 06:24:50,282 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3392673.3333333335, ans=0.2 2023-11-28 06:24:51,108 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.497e+01 8.618e+01 9.454e+01 1.005e+02 2.661e+02, threshold=1.891e+02, percent-clipped=1.0 2023-11-28 06:25:11,724 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 06:25:35,391 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3392940.0, ans=10.0 2023-11-28 06:25:41,607 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3392940.0, ans=0.2 2023-11-28 06:25:43,738 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3392940.0, ans=0.07 2023-11-28 06:25:44,682 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 508950 2023-11-28 06:25:48,022 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 3950, loss[loss=0.05681, simple_loss=0.07826, pruned_loss=0.0077, audio_tagging_loss=0.009979, over 15874.00 frames. ], tot_loss[loss=0.06722, simple_loss=0.09126, pruned_loss=0.01275, audio_tagging_loss=0.008836, over 3051676.88 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 06:26:14,999 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3393140.0, ans=0.125 2023-11-28 06:26:17,632 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3393140.0, ans=0.125 2023-11-28 06:26:19,916 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.93 vs. limit=15.0 2023-11-28 06:26:36,454 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.78 vs. limit=15.0 2023-11-28 06:26:41,610 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3393273.3333333335, ans=0.125 2023-11-28 06:26:42,381 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 509000 2023-11-28 06:26:43,916 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3393273.3333333335, ans=0.125 2023-11-28 06:26:46,291 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 4000, loss[loss=0.08279, simple_loss=0.1064, pruned_loss=0.02122, audio_tagging_loss=0.008388, over 15514.00 frames. ], tot_loss[loss=0.06709, simple_loss=0.09094, pruned_loss=0.01273, audio_tagging_loss=0.008882, over 3046545.65 frames. 
], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:26:47,326 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.538e+01 8.879e+01 9.661e+01 1.044e+02 1.748e+02, threshold=1.932e+02, percent-clipped=0.0 2023-11-28 06:26:50,923 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3393340.0, ans=0.0 2023-11-28 06:26:59,153 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3393406.6666666665, ans=0.0 2023-11-28 06:27:07,079 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3393406.6666666665, ans=0.015 2023-11-28 06:27:07,247 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3393406.6666666665, ans=0.125 2023-11-28 06:27:07,635 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.71 vs. limit=15.0 2023-11-28 06:27:15,539 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3393473.3333333335, ans=0.125 2023-11-28 06:27:19,938 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3393540.0, ans=0.0 2023-11-28 06:27:27,004 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.73 vs. limit=22.5 2023-11-28 06:27:30,769 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.90 vs. limit=15.0 2023-11-28 06:27:40,006 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 509050 2023-11-28 06:27:44,189 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 4050, loss[loss=0.06935, simple_loss=0.09114, pruned_loss=0.01384, audio_tagging_loss=0.009938, over 14496.00 frames. ], tot_loss[loss=0.06767, simple_loss=0.09186, pruned_loss=0.01289, audio_tagging_loss=0.008845, over 3044231.98 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 06:27:50,680 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 06:27:58,001 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3393740.0, ans=0.125 2023-11-28 06:28:20,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3393873.3333333335, ans=0.04949747468305833 2023-11-28 06:28:37,611 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 509100 2023-11-28 06:28:40,843 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 4100, loss[loss=0.09, simple_loss=0.1195, pruned_loss=0.02258, audio_tagging_loss=0.00766, over 15482.00 frames. ], tot_loss[loss=0.06802, simple_loss=0.0922, pruned_loss=0.01302, audio_tagging_loss=0.008907, over 3040752.44 frames. 
], batch size: 55, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 06:28:43,584 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.882e+01 9.020e+01 9.493e+01 1.014e+02 1.731e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-28 06:28:49,172 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3394006.6666666665, ans=0.125 2023-11-28 06:28:58,634 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3394073.3333333335, ans=0.0 2023-11-28 06:29:17,306 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3394206.6666666665, ans=0.0 2023-11-28 06:29:35,297 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 509150 2023-11-28 06:29:38,594 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 4150, loss[loss=0.06678, simple_loss=0.09959, pruned_loss=0.009538, audio_tagging_loss=0.007443, over 16161.00 frames. ], tot_loss[loss=0.06719, simple_loss=0.09125, pruned_loss=0.01273, audio_tagging_loss=0.008832, over 3041019.84 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 06:29:40,616 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3394340.0, ans=0.1 2023-11-28 06:29:44,206 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.57 vs. limit=15.0 2023-11-28 06:29:48,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3394340.0, ans=0.125 2023-11-28 06:29:49,479 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3394406.6666666665, ans=0.1 2023-11-28 06:29:50,463 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3394406.6666666665, ans=0.125 2023-11-28 06:30:25,695 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 06:30:31,887 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.92 vs. limit=22.5 2023-11-28 06:30:33,295 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 509200 2023-11-28 06:30:36,807 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 4200, loss[loss=0.05555, simple_loss=0.06929, pruned_loss=0.00977, audio_tagging_loss=0.01113, over 15506.00 frames. ], tot_loss[loss=0.06689, simple_loss=0.09111, pruned_loss=0.01262, audio_tagging_loss=0.008719, over 3048944.40 frames. 
], batch size: 59, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 06:30:39,980 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.529e+01 8.720e+01 9.332e+01 1.017e+02 1.296e+02, threshold=1.866e+02, percent-clipped=0.0 2023-11-28 06:30:42,901 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.01 vs. limit=15.0 2023-11-28 06:30:44,628 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=3394673.3333333335, ans=0.95 2023-11-28 06:30:55,895 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 06:30:59,781 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3394806.6666666665, ans=0.125 2023-11-28 06:31:00,941 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3394806.6666666665, ans=0.125 2023-11-28 06:31:07,538 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 06:31:32,136 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 509250 2023-11-28 06:31:35,363 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 4250, loss[loss=0.07251, simple_loss=0.1069, pruned_loss=0.01304, audio_tagging_loss=0.006009, over 16352.00 frames. ], tot_loss[loss=0.06682, simple_loss=0.09108, pruned_loss=0.01266, audio_tagging_loss=0.008623, over 3055648.37 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 06:31:37,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3395006.6666666665, ans=0.0 2023-11-28 06:31:38,910 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3395006.6666666665, ans=0.0 2023-11-28 06:31:39,331 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.09 vs. limit=12.0 2023-11-28 06:31:45,108 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3395006.6666666665, ans=0.0 2023-11-28 06:31:45,172 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3395006.6666666665, ans=0.1 2023-11-28 06:31:49,543 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3395073.3333333335, ans=0.0 2023-11-28 06:32:01,511 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.81 vs. limit=15.0 2023-11-28 06:32:12,150 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=5.50 vs. limit=15.0 2023-11-28 06:32:21,922 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.72 vs. 
limit=15.0 2023-11-28 06:32:22,756 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3395273.3333333335, ans=0.125 2023-11-28 06:32:28,279 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3395273.3333333335, ans=0.125 2023-11-28 06:32:29,186 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 509300 2023-11-28 06:32:33,203 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 4300, loss[loss=0.06789, simple_loss=0.08623, pruned_loss=0.01238, audio_tagging_loss=0.01241, over 15621.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.09067, pruned_loss=0.01242, audio_tagging_loss=0.008608, over 3050984.20 frames. ], batch size: 60, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 06:32:35,398 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.011e+01 8.843e+01 9.507e+01 1.023e+02 2.128e+02, threshold=1.901e+02, percent-clipped=1.0 2023-11-28 06:32:42,842 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3395340.0, ans=0.1 2023-11-28 06:32:48,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3395406.6666666665, ans=0.0 2023-11-28 06:33:06,934 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3395540.0, ans=0.07 2023-11-28 06:33:11,468 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.86 vs. limit=22.5 2023-11-28 06:33:13,469 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3395540.0, ans=0.125 2023-11-28 06:33:27,968 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 509350 2023-11-28 06:33:29,778 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.59 vs. limit=15.0 2023-11-28 06:33:31,220 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 4350, loss[loss=0.08653, simple_loss=0.1167, pruned_loss=0.01867, audio_tagging_loss=0.009482, over 15285.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.09097, pruned_loss=0.01238, audio_tagging_loss=0.008612, over 3042141.18 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 06:33:33,581 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3395673.3333333335, ans=0.125 2023-11-28 06:34:02,085 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.08 vs. limit=15.0 2023-11-28 06:34:03,661 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3395806.6666666665, ans=10.0 2023-11-28 06:34:14,638 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.75 vs. 
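limit=15.0

Note: the frequent [scaling.py:213] ScheduledFloat records report module hyperparameters (dropout probabilities, skip rates, balancer probabilities) whose current value `ans` depends on `batch_count`, while the [scaling.py:1022] Whitening records compare a per-module whitening metric against its limit, with a corrective penalty applied only while the metric exceeds the limit. A hedged sketch of a piecewise-linear schedule of the ScheduledFloat kind; the breakpoints and names are illustrative, not taken from scaling.py:

```python
# Piecewise-linear schedule over batch count: the value is interpolated
# between (batch_count, value) breakpoints and held constant past the last.
def scheduled_float(batch_count: float, points: list[tuple[float, float]]) -> float:
    x0, y0 = points[0]
    if batch_count <= x0:
        return y0
    for x1, y1 in points[1:]:
        if batch_count <= x1:
            t = (batch_count - x0) / (x1 - x0)  # interpolate between breakpoints
            return y0 + t * (y1 - y0)
        x0, y0 = x1, y1
    return y0  # past the last breakpoint the value is held constant

# e.g. a skip rate decaying from 0.5 to 0.0 over the first 20k batches:
# scheduled_float(3395873.0, [(0.0, 0.5), (20000.0, 0.0)]) == 0.0
```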
2023-11-28 06:34:15,604 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3395873.3333333335, ans=0.0 2023-11-28 06:34:26,171 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 509400 2023-11-28 06:34:29,646 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 4400, loss[loss=0.06, simple_loss=0.08168, pruned_loss=0.01304, audio_tagging_loss=0.006121, over 15714.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.09126, pruned_loss=0.01238, audio_tagging_loss=0.008569, over 3047718.81 frames. ], batch size: 60, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:34:31,828 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.603e+01 8.958e+01 9.354e+01 1.037e+02 1.325e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-28 06:34:35,656 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.71 vs. limit=22.5 2023-11-28 06:34:56,630 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3396140.0, ans=0.125 2023-11-28 06:35:06,234 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3396206.6666666665, ans=0.125 2023-11-28 06:35:23,652 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 509450 2023-11-28 06:35:26,825 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 4450, loss[loss=0.06182, simple_loss=0.08082, pruned_loss=0.0124, audio_tagging_loss=0.009015, over 14843.00 frames. ], tot_loss[loss=0.0663, simple_loss=0.09087, pruned_loss=0.01231, audio_tagging_loss=0.008558, over 3047733.75 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:35:36,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3396340.0, ans=0.1 2023-11-28 06:35:40,964 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3396406.6666666665, ans=0.125 2023-11-28 06:35:44,338 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3396406.6666666665, ans=0.2 2023-11-28 06:35:50,714 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 06:35:52,913 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3396473.3333333335, ans=0.125 2023-11-28 06:36:21,658 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 509500 2023-11-28 06:36:22,879 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3396606.6666666665, ans=0.125 2023-11-28 06:36:24,888 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 4500, loss[loss=0.05374, simple_loss=0.07753, pruned_loss=0.008442, audio_tagging_loss=0.006535, over 14726.00 frames. ], tot_loss[loss=0.06684, simple_loss=0.09187, pruned_loss=0.01245, audio_tagging_loss=0.008459, over 3048431.75 frames.
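], batch size: 55, lr: 1.57e-03, grad_scale: 16.0

Note: each [train_asr.py:1235] record prints a per-batch loss and a running tot_loss, both decomposed into simple_loss (the simple joiner used to prune the lattice), pruned_loss (the pruned RNN-T loss) and audio_tagging_loss (the audio-tagging distillation term). The printed totals are consistent with a 0.5 weight on the simple loss and unit weights on the other two. A hedged reconstruction, with the weights inferred from the logged numbers rather than read from the code:

```python
def combined_loss(simple_loss: float, pruned_loss: float,
                  audio_tagging_loss: float, simple_scale: float = 0.5) -> float:
    # Weighted sum matching the logged totals; simple_scale is inferred.
    return simple_scale * simple_loss + pruned_loss + audio_tagging_loss

# batch 4400 above: 0.5 * 0.08168 + 0.01304 + 0.006121 = 0.060001 ~= 0.06
assert abs(combined_loss(0.08168, 0.01304, 0.006121) - 0.06) < 1e-4
```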
2023-11-28 06:36:27,125 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.667e+01 8.759e+01 9.220e+01 9.806e+01 1.292e+02, threshold=1.844e+02, percent-clipped=0.0 2023-11-28 06:36:31,346 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3396673.3333333335, ans=0.1 2023-11-28 06:36:43,771 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3396740.0, ans=0.1 2023-11-28 06:36:56,821 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3396806.6666666665, ans=0.0 2023-11-28 06:37:16,996 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.54 vs. limit=22.5 2023-11-28 06:37:18,906 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3396940.0, ans=0.0 2023-11-28 06:37:19,909 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 509550 2023-11-28 06:37:23,171 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 4550, loss[loss=0.07898, simple_loss=0.1074, pruned_loss=0.01714, audio_tagging_loss=0.008118, over 15619.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.09135, pruned_loss=0.01236, audio_tagging_loss=0.008493, over 3047216.16 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:37:31,472 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.06 vs. limit=15.0 2023-11-28 06:37:51,412 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.47 vs. limit=10.0 2023-11-28 06:37:56,337 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3397206.6666666665, ans=0.04949747468305833 2023-11-28 06:38:10,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3397273.3333333335, ans=0.035 2023-11-28 06:38:11,280 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 06:38:12,639 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3397273.3333333335, ans=0.125 2023-11-28 06:38:12,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3397273.3333333335, ans=0.1 2023-11-28 06:38:16,802 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 509600 2023-11-28 06:38:20,304 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 4600, loss[loss=0.05905, simple_loss=0.075, pruned_loss=0.01203, audio_tagging_loss=0.009519, over 14533.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.09082, pruned_loss=0.01225, audio_tagging_loss=0.00859, over 3051950.65 frames.
], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:38:22,438 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.061e+01 8.724e+01 9.423e+01 1.019e+02 1.398e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-28 06:38:40,770 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.25 vs. limit=15.0 2023-11-28 06:38:54,514 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3397540.0, ans=0.2 2023-11-28 06:38:55,688 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3397540.0, ans=0.125 2023-11-28 06:38:56,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3397540.0, ans=0.07 2023-11-28 06:39:14,825 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 509650 2023-11-28 06:39:15,030 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff3.min_abs, batch_count=3397606.6666666665, ans=0.2 2023-11-28 06:39:17,580 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.11 vs. limit=15.0 2023-11-28 06:39:18,108 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 4650, loss[loss=0.08403, simple_loss=0.1159, pruned_loss=0.01595, audio_tagging_loss=0.0101, over 15408.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.09041, pruned_loss=0.01225, audio_tagging_loss=0.008662, over 3049886.38 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:39:29,823 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3397740.0, ans=0.1 2023-11-28 06:39:34,016 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3397740.0, ans=0.125 2023-11-28 06:39:45,023 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3397806.6666666665, ans=0.125 2023-11-28 06:39:59,402 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.53 vs. limit=6.0 2023-11-28 06:40:13,675 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 509700 2023-11-28 06:40:16,812 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 4700, loss[loss=0.05382, simple_loss=0.07402, pruned_loss=0.008128, audio_tagging_loss=0.008678, over 14546.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.08955, pruned_loss=0.0123, audio_tagging_loss=0.008796, over 3047544.74 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:40:18,961 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.949e+01 8.857e+01 9.480e+01 1.024e+02 1.425e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-28 06:41:10,398 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 509750 2023-11-28 06:41:13,607 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 4750, loss[loss=0.05157, simple_loss=0.07035, pruned_loss=0.007598, audio_tagging_loss=0.008801, over 14710.00 frames. ], tot_loss[loss=0.06589, simple_loss=0.08962, pruned_loss=0.01225, audio_tagging_loss=0.008823, over 3043778.20 frames. 
], batch size: 53, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:41:28,025 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.75 vs. limit=15.0 2023-11-28 06:41:42,705 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.18 vs. limit=22.5 2023-11-28 06:41:51,507 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.whiten.whitening_limit, batch_count=3398540.0, ans=12.0 2023-11-28 06:41:58,638 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3398606.6666666665, ans=0.125 2023-11-28 06:42:04,291 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3398606.6666666665, ans=0.0 2023-11-28 06:42:07,311 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 509800 2023-11-28 06:42:11,433 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 4800, loss[loss=0.06988, simple_loss=0.08992, pruned_loss=0.01473, audio_tagging_loss=0.01018, over 14885.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.08968, pruned_loss=0.01228, audio_tagging_loss=0.008981, over 3043286.07 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 06:42:13,640 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.517e+01 8.791e+01 9.387e+01 1.001e+02 1.346e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-28 06:42:50,637 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3398873.3333333335, ans=0.125 2023-11-28 06:42:50,660 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3398873.3333333335, ans=0.2 2023-11-28 06:43:05,734 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 509850 2023-11-28 06:43:09,534 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 4850, loss[loss=0.0563, simple_loss=0.06667, pruned_loss=0.01142, audio_tagging_loss=0.01154, over 15875.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.08962, pruned_loss=0.01232, audio_tagging_loss=0.009114, over 3044370.84 frames. ], batch size: 64, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 06:43:15,247 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3399006.6666666665, ans=0.125 2023-11-28 06:43:36,650 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3399140.0, ans=0.125 2023-11-28 06:43:49,368 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.75 vs. 
limit=15.0 2023-11-28 06:44:02,649 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 509900 2023-11-28 06:44:02,782 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3399273.3333333335, ans=0.125 2023-11-28 06:44:02,812 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3399273.3333333335, ans=0.1 2023-11-28 06:44:05,795 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 4900, loss[loss=0.04926, simple_loss=0.06065, pruned_loss=0.01201, audio_tagging_loss=0.006921, over 13959.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.08946, pruned_loss=0.0123, audio_tagging_loss=0.009039, over 3036728.50 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 06:44:07,982 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.069e+01 8.789e+01 9.268e+01 1.027e+02 1.406e+02, threshold=1.854e+02, percent-clipped=0.0 2023-11-28 06:44:16,958 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3399406.6666666665, ans=0.125 2023-11-28 06:44:41,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3399540.0, ans=0.04949747468305833 2023-11-28 06:44:46,923 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.73 vs. limit=10.0 2023-11-28 06:44:59,553 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 509950 2023-11-28 06:45:03,472 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 4950, loss[loss=0.05245, simple_loss=0.06899, pruned_loss=0.008454, audio_tagging_loss=0.0095, over 15489.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08916, pruned_loss=0.01213, audio_tagging_loss=0.008897, over 3044230.86 frames. 
], batch size: 58, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:45:03,804 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3399673.3333333335, ans=0.0 2023-11-28 06:45:14,660 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 06:45:26,525 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3399806.6666666665, ans=0.125 2023-11-28 06:45:27,691 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3399806.6666666665, ans=0.1 2023-11-28 06:45:28,632 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3399806.6666666665, ans=0.125 2023-11-28 06:45:29,697 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3399806.6666666665, ans=0.2 2023-11-28 06:45:37,329 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3399873.3333333335, ans=0.015 2023-11-28 06:45:53,742 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3399940.0, ans=0.125 2023-11-28 06:45:57,788 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 510000 2023-11-28 06:45:59,487 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3399940.0, ans=0.1 2023-11-28 06:46:01,570 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 5000, loss[loss=0.07145, simple_loss=0.09715, pruned_loss=0.0137, audio_tagging_loss=0.009177, over 14183.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08939, pruned_loss=0.01212, audio_tagging_loss=0.008686, over 3038922.26 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:46:05,258 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.678e+01 8.682e+01 9.362e+01 1.003e+02 1.327e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-28 06:46:10,359 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3400006.6666666665, ans=0.1 2023-11-28 06:46:14,822 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3400073.3333333335, ans=0.0 2023-11-28 06:46:26,230 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.40 vs. limit=22.5 2023-11-28 06:46:37,625 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3400206.6666666665, ans=0.1 2023-11-28 06:46:40,786 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3400206.6666666665, ans=0.1 2023-11-28 06:46:45,574 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3400206.6666666665, ans=0.125 2023-11-28 06:46:46,322 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.35 vs. 
limit=6.0 2023-11-28 06:46:56,489 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 510050 2023-11-28 06:46:59,736 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 5050, loss[loss=0.06192, simple_loss=0.08289, pruned_loss=0.01279, audio_tagging_loss=0.007688, over 15384.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.08938, pruned_loss=0.01218, audio_tagging_loss=0.008551, over 3041458.45 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:47:14,230 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3400406.6666666665, ans=0.2 2023-11-28 06:47:18,960 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3400406.6666666665, ans=0.125 2023-11-28 06:47:40,370 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3400540.0, ans=0.09899494936611666 2023-11-28 06:47:53,301 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 510100 2023-11-28 06:47:56,495 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 5100, loss[loss=0.06542, simple_loss=0.08633, pruned_loss=0.0132, audio_tagging_loss=0.009056, over 15309.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08939, pruned_loss=0.01224, audio_tagging_loss=0.008602, over 3043996.81 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:48:00,394 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.602e+01 8.708e+01 9.390e+01 1.019e+02 1.146e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-28 06:48:26,297 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3400806.6666666665, ans=0.0 2023-11-28 06:48:30,004 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.62 vs. limit=12.0 2023-11-28 06:48:40,664 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3400873.3333333335, ans=0.125 2023-11-28 06:48:46,164 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.41 vs. limit=22.5 2023-11-28 06:48:50,997 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 510150 2023-11-28 06:48:54,194 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 5150, loss[loss=0.06426, simple_loss=0.08719, pruned_loss=0.01208, audio_tagging_loss=0.008587, over 15355.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.0898, pruned_loss=0.01223, audio_tagging_loss=0.008597, over 3046246.74 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:49:01,949 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3401006.6666666665, ans=0.0 2023-11-28 06:49:09,267 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3401073.3333333335, ans=0.125 2023-11-28 06:49:12,332 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3401073.3333333335, ans=0.125 2023-11-28 06:49:20,456 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.20 vs. 
limit=15.0 2023-11-28 06:49:23,372 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3401140.0, ans=0.125 2023-11-28 06:49:24,863 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.98 vs. limit=22.5 2023-11-28 06:49:43,833 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3401273.3333333335, ans=0.0 2023-11-28 06:49:48,078 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3401273.3333333335, ans=0.125 2023-11-28 06:49:49,087 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 510200 2023-11-28 06:49:52,689 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 5200, loss[loss=0.08017, simple_loss=0.111, pruned_loss=0.01723, audio_tagging_loss=0.007451, over 15354.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.09004, pruned_loss=0.01234, audio_tagging_loss=0.008665, over 3049908.69 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 06:49:56,620 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.435e+01 8.684e+01 9.283e+01 1.002e+02 1.274e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-28 06:50:02,227 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3401340.0, ans=0.2 2023-11-28 06:50:15,853 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3401473.3333333335, ans=0.0 2023-11-28 06:50:15,978 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3401473.3333333335, ans=0.125 2023-11-28 06:50:41,172 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.78 vs. limit=15.0 2023-11-28 06:50:47,142 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 510250 2023-11-28 06:50:50,399 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 5250, loss[loss=0.06422, simple_loss=0.08645, pruned_loss=0.01035, audio_tagging_loss=0.01065, over 15665.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.09019, pruned_loss=0.0125, audio_tagging_loss=0.008616, over 3055745.50 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 06:51:11,108 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.87 vs. limit=15.0 2023-11-28 06:51:32,179 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3401873.3333333335, ans=0.0 2023-11-28 06:51:42,437 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3401940.0, ans=0.1 2023-11-28 06:51:44,526 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 510300 2023-11-28 06:51:47,664 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 5300, loss[loss=0.05472, simple_loss=0.06807, pruned_loss=0.007115, audio_tagging_loss=0.01356, over 16542.00 frames. ], tot_loss[loss=0.06644, simple_loss=0.09065, pruned_loss=0.01252, audio_tagging_loss=0.008596, over 3060789.93 frames. 
], batch size: 62, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 06:51:50,949 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.521e+01 8.883e+01 9.472e+01 1.016e+02 1.198e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-28 06:51:51,153 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3402006.6666666665, ans=0.1 2023-11-28 06:52:06,327 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3402073.3333333335, ans=0.125 2023-11-28 06:52:42,467 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 510350 2023-11-28 06:52:45,141 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.17 vs. limit=15.0 2023-11-28 06:52:45,619 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 5350, loss[loss=0.04853, simple_loss=0.05995, pruned_loss=0.008274, audio_tagging_loss=0.01028, over 15372.00 frames. ], tot_loss[loss=0.066, simple_loss=0.08995, pruned_loss=0.01235, audio_tagging_loss=0.00867, over 3056551.22 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:53:32,832 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3402606.6666666665, ans=0.125 2023-11-28 06:53:39,722 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 510400 2023-11-28 06:53:43,133 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 5400, loss[loss=0.05923, simple_loss=0.08291, pruned_loss=0.01077, audio_tagging_loss=0.007004, over 15505.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08951, pruned_loss=0.01233, audio_tagging_loss=0.008644, over 3051976.24 frames. 
], batch size: 59, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:53:47,400 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.415e+01 8.616e+01 9.187e+01 1.017e+02 1.243e+02, threshold=1.837e+02, percent-clipped=0.0 2023-11-28 06:53:49,803 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3402673.3333333335, ans=0.125 2023-11-28 06:53:51,630 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3402673.3333333335, ans=0.1 2023-11-28 06:53:58,132 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3402740.0, ans=0.1 2023-11-28 06:54:08,774 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3402806.6666666665, ans=0.0 2023-11-28 06:54:15,317 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3402806.6666666665, ans=0.125 2023-11-28 06:54:16,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3402873.3333333335, ans=0.0 2023-11-28 06:54:37,165 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 510450 2023-11-28 06:54:39,597 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3403006.6666666665, ans=0.1 2023-11-28 06:54:40,382 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 5450, loss[loss=0.06613, simple_loss=0.0902, pruned_loss=0.01345, audio_tagging_loss=0.007579, over 14762.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.09041, pruned_loss=0.01256, audio_tagging_loss=0.008646, over 3051258.53 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:54:49,880 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.11 vs. limit=10.0 2023-11-28 06:54:58,107 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3403073.3333333335, ans=0.125 2023-11-28 06:55:16,124 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff2.min_abs, batch_count=3403206.6666666665, ans=0.1 2023-11-28 06:55:20,010 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3403206.6666666665, ans=0.1 2023-11-28 06:55:34,804 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 510500 2023-11-28 06:55:38,012 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 5500, loss[loss=0.06318, simple_loss=0.08768, pruned_loss=0.01242, audio_tagging_loss=0.00692, over 15641.00 frames. ], tot_loss[loss=0.06732, simple_loss=0.09152, pruned_loss=0.0129, audio_tagging_loss=0.008656, over 3052058.06 frames. 
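], batch size: 57, lr: 1.57e-03, grad_scale: 16.0

Note: the grad_scale printed with each loss record (8.0 -> 16.0 -> 32.0 -> 16.0 -> ... through this stretch of the log) is the dynamic loss-scaling factor used for mixed-precision (fp16) training: a gradient scaler periodically doubles the scale and halves it when inf/nan gradients are detected, which matches the values bouncing between 8.0 and 32.0 here. A minimal sketch of the standard PyTorch pattern, not the exact training loop of this run:

```python
import torch

scaler = torch.cuda.amp.GradScaler()  # maintains the dynamic grad_scale

def train_step(model, optimizer, inputs, targets, criterion):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():  # run forward/loss in fp16 where safe
        loss = criterion(model(inputs), targets)
    scaler.scale(loss).backward()    # scale the loss to avoid fp16 underflow
    scaler.step(optimizer)           # unscales grads; skips the step on inf/nan
    scaler.update()                  # grows the scale periodically, halves on overflow
    return loss.detach(), scaler.get_scale()  # get_scale() is the logged grad_scale
```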
2023-11-28 06:55:41,682 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3403340.0, ans=0.0 2023-11-28 06:55:42,354 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.114e+01 8.928e+01 9.472e+01 1.024e+02 1.464e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-28 06:55:48,158 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3403406.6666666665, ans=0.1 2023-11-28 06:55:59,182 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.95 vs. limit=22.5 2023-11-28 06:56:11,426 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3403540.0, ans=0.2 2023-11-28 06:56:31,804 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 510550 2023-11-28 06:56:32,078 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3403606.6666666665, ans=0.125 2023-11-28 06:56:33,039 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3403606.6666666665, ans=0.1 2023-11-28 06:56:34,969 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 5550, loss[loss=0.06213, simple_loss=0.08396, pruned_loss=0.01061, audio_tagging_loss=0.009538, over 14730.00 frames. ], tot_loss[loss=0.06751, simple_loss=0.09158, pruned_loss=0.01304, audio_tagging_loss=0.008683, over 3053343.90 frames. ], batch size: 53, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:56:39,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3403673.3333333335, ans=0.0 2023-11-28 06:56:39,475 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3403673.3333333335, ans=0.125 2023-11-28 06:56:42,306 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3403673.3333333335, ans=0.125 2023-11-28 06:57:03,574 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3403806.6666666665, ans=0.1 2023-11-28 06:57:03,589 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3403806.6666666665, ans=0.1 2023-11-28 06:57:16,670 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3403873.3333333335, ans=0.125 2023-11-28 06:57:16,704 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3403873.3333333335, ans=0.125 2023-11-28 06:57:19,216 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.21 vs.
limit=15.0 2023-11-28 06:57:23,318 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3403940.0, ans=0.2 2023-11-28 06:57:27,949 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3403940.0, ans=0.125 2023-11-28 06:57:29,866 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 510600 2023-11-28 06:57:30,045 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3403940.0, ans=0.125 2023-11-28 06:57:32,590 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3404006.6666666665, ans=0.0 2023-11-28 06:57:33,179 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.72 vs. limit=5.0 2023-11-28 06:57:33,363 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 5600, loss[loss=0.05306, simple_loss=0.05947, pruned_loss=0.00885, audio_tagging_loss=0.01448, over 14876.00 frames. ], tot_loss[loss=0.0672, simple_loss=0.09089, pruned_loss=0.01287, audio_tagging_loss=0.008894, over 3050744.73 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 06:57:37,612 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.389e+01 9.053e+01 9.702e+01 1.068e+02 1.336e+02, threshold=1.940e+02, percent-clipped=0.0 2023-11-28 06:57:41,207 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3404006.6666666665, ans=0.09899494936611666 2023-11-28 06:57:50,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3404073.3333333335, ans=0.125 2023-11-28 06:57:51,349 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.74 vs. limit=22.5 2023-11-28 06:58:03,943 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3404140.0, ans=0.125 2023-11-28 06:58:10,185 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3404206.6666666665, ans=0.0 2023-11-28 06:58:17,465 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 06:58:27,132 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 510650 2023-11-28 06:58:28,382 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3404273.3333333335, ans=0.125 2023-11-28 06:58:30,288 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 5650, loss[loss=0.05374, simple_loss=0.06669, pruned_loss=0.009511, audio_tagging_loss=0.01089, over 14920.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.09054, pruned_loss=0.01268, audio_tagging_loss=0.008905, over 3055029.74 frames. 
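], batch size: 59, lr: 1.57e-03, grad_scale: 16.0

Note: the [train_asr.py:1481] WARNING above (and its twins elsewhere in the log) drops AudioSet cuts whose placeholder transcript is longer than the acoustic sequence: 100 input frames shrink to 23 frames after the convolutional front-end, and a transducer cannot emit 24 tokens from 23 encoder frames. A hedged sketch of such a filter; the subsampling expression is an assumption chosen to reproduce the logged 100 -> 23, not the actual front-end code:

```python
def frames_after_subsampling(num_frames: int) -> int:
    # Assumed conv front-end arithmetic; reproduces the logged 100 -> 23.
    return ((num_frames - 7) // 2 + 1) // 2

def is_trainable(num_frames: int, num_tokens: int) -> bool:
    # A transducer cannot emit more tokens than it has encoder frames.
    return num_tokens < frames_after_subsampling(num_frames)

assert frames_after_subsampling(100) == 23
assert not is_trainable(100, 24)  # the excluded cut: 24 tokens vs. 23 frames
```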
2023-11-28 06:58:31,444 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3404340.0, ans=0.125 2023-11-28 06:58:31,587 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3404340.0, ans=0.0 2023-11-28 06:58:44,825 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3404406.6666666665, ans=0.125 2023-11-28 06:58:48,088 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3404406.6666666665, ans=0.2 2023-11-28 06:59:02,670 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3404473.3333333335, ans=0.09899494936611666 2023-11-28 06:59:11,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3404540.0, ans=0.0 2023-11-28 06:59:12,410 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3404540.0, ans=0.125 2023-11-28 06:59:24,119 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 510700 2023-11-28 06:59:27,445 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 5700, loss[loss=0.06457, simple_loss=0.08309, pruned_loss=0.01526, audio_tagging_loss=0.007771, over 13767.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.08914, pruned_loss=0.01233, audio_tagging_loss=0.008849, over 3053466.70 frames. ], batch size: 52, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:59:32,862 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.542e+01 8.734e+01 9.325e+01 1.007e+02 1.153e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-28 06:59:33,204 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3404673.3333333335, ans=0.5 2023-11-28 06:59:33,593 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.34 vs. limit=12.0 2023-11-28 06:59:36,941 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.43 vs. limit=15.0 2023-11-28 06:59:51,846 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3404806.6666666665, ans=0.0 2023-11-28 07:00:05,830 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3404873.3333333335, ans=0.125 2023-11-28 07:00:20,489 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.16 vs. limit=6.0 2023-11-28 07:00:21,536 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 510750 2023-11-28 07:00:24,728 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 5750, loss[loss=0.05387, simple_loss=0.06539, pruned_loss=0.01037, audio_tagging_loss=0.0108, over 15132.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.08984, pruned_loss=0.01257, audio_tagging_loss=0.008845, over 3055094.13 frames.
], batch size: 58, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 07:00:49,727 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.62 vs. limit=15.0 2023-11-28 07:00:52,802 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3405140.0, ans=0.2 2023-11-28 07:00:54,881 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3405140.0, ans=0.0 2023-11-28 07:01:18,585 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 510800 2023-11-28 07:01:22,747 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 5800, loss[loss=0.06793, simple_loss=0.09576, pruned_loss=0.009816, audio_tagging_loss=0.01023, over 15491.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.08957, pruned_loss=0.01243, audio_tagging_loss=0.00883, over 3056046.01 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 07:01:28,132 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.310e+01 8.777e+01 9.348e+01 1.032e+02 1.624e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-28 07:01:36,475 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.39 vs. limit=15.0 2023-11-28 07:01:37,253 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3405406.6666666665, ans=0.09899494936611666 2023-11-28 07:01:37,729 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.94 vs. limit=22.5 2023-11-28 07:01:42,540 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3405406.6666666665, ans=0.1 2023-11-28 07:01:45,402 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.99 vs. limit=15.0 2023-11-28 07:01:45,924 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3405473.3333333335, ans=0.0 2023-11-28 07:01:56,969 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3405540.0, ans=0.1 2023-11-28 07:02:05,931 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.96 vs. limit=15.0 2023-11-28 07:02:08,349 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.54 vs. limit=22.5 2023-11-28 07:02:16,404 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 510850 2023-11-28 07:02:19,677 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 5850, loss[loss=0.09206, simple_loss=0.1287, pruned_loss=0.01996, audio_tagging_loss=0.007766, over 15700.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.08897, pruned_loss=0.0123, audio_tagging_loss=0.008793, over 3058306.76 frames. 
], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 07:02:25,521 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3405673.3333333335, ans=0.0 2023-11-28 07:02:26,498 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3405673.3333333335, ans=0.0 2023-11-28 07:02:29,787 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3405740.0, ans=0.125 2023-11-28 07:02:30,189 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.97 vs. limit=22.5 2023-11-28 07:02:32,197 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.61 vs. limit=15.0 2023-11-28 07:02:50,450 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3405806.6666666665, ans=0.125 2023-11-28 07:02:54,407 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.64 vs. limit=15.0 2023-11-28 07:02:55,200 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.21 vs. limit=15.0 2023-11-28 07:03:13,188 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 510900 2023-11-28 07:03:15,522 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3406006.6666666665, ans=0.0 2023-11-28 07:03:16,958 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 5900, loss[loss=0.06302, simple_loss=0.08613, pruned_loss=0.009974, audio_tagging_loss=0.009981, over 14796.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08939, pruned_loss=0.01227, audio_tagging_loss=0.008695, over 3055660.09 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 07:03:22,376 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.089e+01 8.808e+01 9.419e+01 9.961e+01 1.259e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-28 07:03:29,799 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3406073.3333333335, ans=0.0 2023-11-28 07:03:34,067 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3406073.3333333335, ans=0.125 2023-11-28 07:03:37,196 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3406073.3333333335, ans=0.0 2023-11-28 07:03:41,751 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3406140.0, ans=0.0 2023-11-28 07:04:11,287 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 510950 2023-11-28 07:04:14,899 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 5950, loss[loss=0.06975, simple_loss=0.08855, pruned_loss=0.01596, audio_tagging_loss=0.009514, over 15107.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08872, pruned_loss=0.01216, audio_tagging_loss=0.008825, over 3058031.45 frames. 
], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 07:04:17,692 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3406340.0, ans=0.125 2023-11-28 07:04:24,333 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3406340.0, ans=0.1 2023-11-28 07:04:28,705 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3406406.6666666665, ans=0.0 2023-11-28 07:04:32,861 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3406406.6666666665, ans=0.0 2023-11-28 07:05:09,089 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 511000 2023-11-28 07:05:12,622 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 6000, loss[loss=0.04153, simple_loss=0.05005, pruned_loss=0.005165, audio_tagging_loss=0.01134, over 14352.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.08951, pruned_loss=0.01227, audio_tagging_loss=0.008696, over 3056357.77 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 07:05:12,625 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-28 07:05:32,351 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([6.0321, 5.8990, 5.6907, 5.6364], device='cuda:0') 2023-11-28 07:05:36,676 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.2081, 3.9718, 3.7288, 3.2146], device='cuda:0') 2023-11-28 07:05:47,617 INFO [train_asr.py:1267] (0/4) Epoch 43, validation: loss=0.0577, simple_loss=0.05058, pruned_loss=0.005244, audio_tagging_loss=0.02717, over 4681554.00 frames. 2023-11-28 07:05:47,618 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-28 07:05:53,015 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.416e+01 8.750e+01 9.275e+01 1.001e+02 1.273e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-28 07:05:54,267 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3406673.3333333335, ans=0.125 2023-11-28 07:06:04,582 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3406740.0, ans=0.0 2023-11-28 07:06:05,624 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3406740.0, ans=0.125 2023-11-28 07:06:19,534 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.57 vs. limit=10.0 2023-11-28 07:06:21,204 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3406873.3333333335, ans=0.125 2023-11-28 07:06:28,775 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 07:06:32,435 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
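Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24

Note: during the batch-6000 validation pass above, the [zipformer.py:1877] records print one attention-entropy value per head for selected self-attention modules; low entropy means a head attends almost deterministically, while uniform attention over n keys would give log(n). A hedged sketch of that diagnostic, with shapes and names illustrative rather than taken from zipformer.py:

```python
import torch

def attn_weights_entropy(attn_weights: torch.Tensor) -> torch.Tensor:
    """Mean entropy of each head's attention distribution.

    attn_weights: (num_heads, query_len, key_len) with rows summing to 1.
    Returns one value per head, as in the logged tensors.
    """
    eps = 1.0e-20
    entropy = -(attn_weights * (attn_weights + eps).log()).sum(dim=-1)
    return entropy.mean(dim=-1)  # average over query positions

# Uniform attention over 512 keys gives log(512) ~= 6.24 per head, close to
# the ~6.0 values logged for encoder.encoders.0.layers.0 above.
```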
2023-11-28 07:06:33,847 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3406940.0, ans=0.125 2023-11-28 07:06:37,255 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.89 vs. limit=10.0 2023-11-28 07:06:40,325 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3406940.0, ans=0.125 2023-11-28 07:06:41,939 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 511050 2023-11-28 07:06:45,676 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 6050, loss[loss=0.05684, simple_loss=0.07868, pruned_loss=0.006645, audio_tagging_loss=0.01086, over 15370.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.09, pruned_loss=0.01235, audio_tagging_loss=0.008709, over 3057095.46 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 07:06:51,901 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.21 vs. limit=10.0 2023-11-28 07:06:54,128 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.75 vs. limit=22.5 2023-11-28 07:07:00,317 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3407073.3333333335, ans=0.125 2023-11-28 07:07:14,987 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3407140.0, ans=0.0 2023-11-28 07:07:16,026 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3407140.0, ans=0.125 2023-11-28 07:07:39,111 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 511100 2023-11-28 07:07:42,407 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 6100, loss[loss=0.07268, simple_loss=0.1071, pruned_loss=0.01262, audio_tagging_loss=0.006488, over 15175.00 frames. ], tot_loss[loss=0.06646, simple_loss=0.09048, pruned_loss=0.01249, audio_tagging_loss=0.00873, over 3056310.33 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 07:07:47,823 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.529e+01 8.837e+01 9.364e+01 1.005e+02 1.238e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-28 07:08:36,574 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 511150 2023-11-28 07:08:39,906 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 6150, loss[loss=0.06242, simple_loss=0.07879, pruned_loss=0.01473, audio_tagging_loss=0.00829, over 15292.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.09046, pruned_loss=0.01262, audio_tagging_loss=0.008685, over 3051024.82 frames.
], batch size: 58, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 07:08:41,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3407673.3333333335, ans=0.125 2023-11-28 07:08:45,181 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3407673.3333333335, ans=0.125 2023-11-28 07:08:47,221 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3407673.3333333335, ans=0.125 2023-11-28 07:08:55,157 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.13 vs. limit=15.0 2023-11-28 07:09:08,250 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3407806.6666666665, ans=0.1 2023-11-28 07:09:23,421 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3407873.3333333335, ans=0.0 2023-11-28 07:09:24,990 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3407940.0, ans=0.125 2023-11-28 07:09:33,558 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 511200 2023-11-28 07:09:37,600 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 6200, loss[loss=0.08369, simple_loss=0.116, pruned_loss=0.01894, audio_tagging_loss=0.006745, over 15019.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.08996, pruned_loss=0.01259, audio_tagging_loss=0.008787, over 3050594.52 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 07:09:43,633 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.472e+01 8.632e+01 9.387e+01 1.018e+02 1.235e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-28 07:10:00,372 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3408140.0, ans=0.0 2023-11-28 07:10:15,319 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3408206.6666666665, ans=0.2 2023-11-28 07:10:26,370 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3408273.3333333335, ans=0.2 2023-11-28 07:10:31,638 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 511250 2023-11-28 07:10:31,712 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3408273.3333333335, ans=0.125 2023-11-28 07:10:34,785 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 6250, loss[loss=0.05138, simple_loss=0.06315, pruned_loss=0.008658, audio_tagging_loss=0.01114, over 15607.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.09021, pruned_loss=0.01252, audio_tagging_loss=0.008823, over 3048777.47 frames. 
], batch size: 61, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 07:10:38,409 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3408340.0, ans=0.0 2023-11-28 07:10:48,305 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3408406.6666666665, ans=0.0 2023-11-28 07:11:11,543 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3408540.0, ans=0.125 2023-11-28 07:11:28,907 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 511300 2023-11-28 07:11:32,054 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 6300, loss[loss=0.05543, simple_loss=0.06365, pruned_loss=0.01364, audio_tagging_loss=0.009962, over 15391.00 frames. ], tot_loss[loss=0.06693, simple_loss=0.09081, pruned_loss=0.01269, audio_tagging_loss=0.008841, over 3043820.43 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 07:11:38,159 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.358e+01 8.880e+01 9.504e+01 1.024e+02 1.327e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-28 07:11:38,501 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3408673.3333333335, ans=0.125 2023-11-28 07:11:41,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3408673.3333333335, ans=0.0 2023-11-28 07:11:52,385 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3408740.0, ans=0.125 2023-11-28 07:11:57,416 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3408806.6666666665, ans=0.0 2023-11-28 07:12:19,939 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3408940.0, ans=0.035 2023-11-28 07:12:23,676 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.14 vs. limit=22.5 2023-11-28 07:12:26,527 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 511350 2023-11-28 07:12:29,722 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 6350, loss[loss=0.08017, simple_loss=0.1059, pruned_loss=0.0181, audio_tagging_loss=0.009103, over 15270.00 frames. ], tot_loss[loss=0.06696, simple_loss=0.09059, pruned_loss=0.01273, audio_tagging_loss=0.008934, over 3046114.20 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 07:12:35,593 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.62 vs. limit=22.5 2023-11-28 07:13:02,789 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.53 vs. 
limit=10.0 2023-11-28 07:13:05,070 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3409206.6666666665, ans=0.125 2023-11-28 07:13:13,906 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3409206.6666666665, ans=0.125 2023-11-28 07:13:24,557 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 511400 2023-11-28 07:13:28,601 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 6400, loss[loss=0.07143, simple_loss=0.09705, pruned_loss=0.01389, audio_tagging_loss=0.009014, over 15577.00 frames. ], tot_loss[loss=0.06699, simple_loss=0.09087, pruned_loss=0.01265, audio_tagging_loss=0.008906, over 3045296.50 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 07:13:31,037 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3409340.0, ans=0.125 2023-11-28 07:13:35,202 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.281e+01 8.831e+01 9.327e+01 9.903e+01 1.480e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-28 07:13:40,972 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3409406.6666666665, ans=0.125 2023-11-28 07:13:44,519 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.85 vs. limit=22.5 2023-11-28 07:13:57,272 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.34 vs. limit=15.0 2023-11-28 07:14:05,976 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3409540.0, ans=0.0 2023-11-28 07:14:21,595 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 511450 2023-11-28 07:14:24,498 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.56 vs. limit=12.0 2023-11-28 07:14:24,823 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 6450, loss[loss=0.0599, simple_loss=0.07734, pruned_loss=0.008064, audio_tagging_loss=0.01317, over 15104.00 frames. ], tot_loss[loss=0.06676, simple_loss=0.09028, pruned_loss=0.01261, audio_tagging_loss=0.009011, over 3043601.27 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 07:15:18,595 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 511500 2023-11-28 07:15:21,726 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 6500, loss[loss=0.06836, simple_loss=0.08959, pruned_loss=0.01394, audio_tagging_loss=0.009626, over 14179.00 frames. ], tot_loss[loss=0.06649, simple_loss=0.08989, pruned_loss=0.01255, audio_tagging_loss=0.008995, over 3042162.85 frames. 
], batch size: 54, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 07:15:28,781 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.593e+01 8.791e+01 9.611e+01 1.014e+02 1.471e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-28 07:15:32,335 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 07:15:33,416 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3410073.3333333335, ans=0.0 2023-11-28 07:15:52,682 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3410140.0, ans=0.0 2023-11-28 07:15:53,783 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3410140.0, ans=0.125 2023-11-28 07:15:54,314 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.79 vs. limit=22.5 2023-11-28 07:16:06,937 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3410273.3333333335, ans=0.0 2023-11-28 07:16:08,165 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3410273.3333333335, ans=0.04949747468305833 2023-11-28 07:16:15,971 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 511550 2023-11-28 07:16:16,211 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3410273.3333333335, ans=0.125 2023-11-28 07:16:16,638 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.32 vs. limit=6.0 2023-11-28 07:16:17,328 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3410273.3333333335, ans=0.04949747468305833 2023-11-28 07:16:19,186 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 6550, loss[loss=0.06214, simple_loss=0.08089, pruned_loss=0.0124, audio_tagging_loss=0.009293, over 14764.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.08967, pruned_loss=0.0125, audio_tagging_loss=0.008815, over 3044824.31 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 07:16:26,907 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.86 vs. 
limit=15.0 2023-11-28 07:16:29,976 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 07:16:43,742 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3410473.3333333335, ans=10.0 2023-11-28 07:16:55,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3410540.0, ans=0.0 2023-11-28 07:16:56,138 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3410540.0, ans=0.2 2023-11-28 07:16:56,148 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3410540.0, ans=0.2 2023-11-28 07:17:12,744 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 511600 2023-11-28 07:17:16,242 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 6600, loss[loss=0.04948, simple_loss=0.06547, pruned_loss=0.008894, audio_tagging_loss=0.007854, over 14803.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08908, pruned_loss=0.01222, audio_tagging_loss=0.008731, over 3044928.41 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 07:17:16,510 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3410673.3333333335, ans=0.125 2023-11-28 07:17:20,191 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.01 vs. limit=22.5 2023-11-28 07:17:24,291 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.209e+01 8.683e+01 9.479e+01 1.018e+02 1.462e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-28 07:17:26,691 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3410740.0, ans=0.04949747468305833 2023-11-28 07:17:27,061 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.72 vs. limit=15.0 2023-11-28 07:17:29,214 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.23 vs. limit=10.0 2023-11-28 07:17:33,286 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3410740.0, ans=0.2 2023-11-28 07:17:47,037 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3410806.6666666665, ans=0.125 2023-11-28 07:18:02,765 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3410940.0, ans=0.0 2023-11-28 07:18:02,917 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3410940.0, ans=0.125 2023-11-28 07:18:09,978 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 511650 2023-11-28 07:18:12,194 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3411006.6666666665, ans=0.0 2023-11-28 07:18:13,105 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 6650, loss[loss=0.09223, simple_loss=0.1352, pruned_loss=0.01944, audio_tagging_loss=0.005178, over 15163.00 frames. 
], tot_loss[loss=0.06527, simple_loss=0.08865, pruned_loss=0.01223, audio_tagging_loss=0.00872, over 3044828.22 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 07:18:18,797 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3411006.6666666665, ans=0.1 2023-11-28 07:18:37,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3411140.0, ans=0.1 2023-11-28 07:18:49,955 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3411206.6666666665, ans=0.125 2023-11-28 07:19:00,246 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3411273.3333333335, ans=0.1 2023-11-28 07:19:00,280 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3411273.3333333335, ans=0.1 2023-11-28 07:19:03,666 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3411273.3333333335, ans=0.1 2023-11-28 07:19:07,131 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 511700 2023-11-28 07:19:10,401 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 6700, loss[loss=0.09023, simple_loss=0.1274, pruned_loss=0.01813, audio_tagging_loss=0.008407, over 14910.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08924, pruned_loss=0.01233, audio_tagging_loss=0.008692, over 3033546.37 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 07:19:17,921 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.071e+01 9.075e+01 9.531e+01 1.012e+02 1.694e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-28 07:19:21,908 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.68 vs. limit=10.0 2023-11-28 07:19:27,631 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3411406.6666666665, ans=0.125 2023-11-28 07:19:34,051 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3411473.3333333335, ans=0.125 2023-11-28 07:19:41,622 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.99 vs. limit=15.0 2023-11-28 07:19:56,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3411606.6666666665, ans=0.125 2023-11-28 07:20:03,893 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 511750 2023-11-28 07:20:07,125 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 6750, loss[loss=0.07632, simple_loss=0.1038, pruned_loss=0.0136, audio_tagging_loss=0.01083, over 15575.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.0894, pruned_loss=0.01226, audio_tagging_loss=0.008714, over 3030828.09 frames. 
], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 07:20:13,377 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3411673.3333333335, ans=0.0 2023-11-28 07:20:19,609 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3411740.0, ans=0.125 2023-11-28 07:20:24,005 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3411740.0, ans=0.125 2023-11-28 07:20:37,529 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3411806.6666666665, ans=0.125 2023-11-28 07:20:46,196 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 07:20:48,429 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3411873.3333333335, ans=0.125 2023-11-28 07:20:55,518 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3411940.0, ans=0.0 2023-11-28 07:20:57,786 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3411940.0, ans=0.125 2023-11-28 07:21:00,803 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 511800 2023-11-28 07:21:04,736 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 6800, loss[loss=0.05203, simple_loss=0.06733, pruned_loss=0.007896, audio_tagging_loss=0.01047, over 14846.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.08976, pruned_loss=0.01248, audio_tagging_loss=0.008695, over 3031199.50 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 07:21:08,202 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3412006.6666666665, ans=0.125 2023-11-28 07:21:12,434 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.573e+01 8.904e+01 9.309e+01 9.890e+01 1.281e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-28 07:21:24,924 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.85 vs. limit=15.0 2023-11-28 07:21:36,948 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.65 vs. limit=12.0 2023-11-28 07:21:37,792 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3412206.6666666665, ans=0.125 2023-11-28 07:21:58,882 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 511850 2023-11-28 07:22:02,592 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 6850, loss[loss=0.06135, simple_loss=0.08787, pruned_loss=0.009129, audio_tagging_loss=0.008286, over 14974.00 frames. ], tot_loss[loss=0.06674, simple_loss=0.09109, pruned_loss=0.01263, audio_tagging_loss=0.008562, over 3045274.52 frames. 
], batch size: 57, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 07:22:06,182 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3412340.0, ans=0.0 2023-11-28 07:22:13,708 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3412406.6666666665, ans=0.125 2023-11-28 07:22:20,339 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3412406.6666666665, ans=0.125 2023-11-28 07:22:49,274 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.95 vs. limit=15.0 2023-11-28 07:22:56,311 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 511900 2023-11-28 07:22:56,804 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.65 vs. limit=15.0 2023-11-28 07:22:59,473 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 6900, loss[loss=0.06609, simple_loss=0.099, pruned_loss=0.009073, audio_tagging_loss=0.007518, over 16040.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.09017, pruned_loss=0.01242, audio_tagging_loss=0.008667, over 3047894.33 frames. ], batch size: 60, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 07:23:03,080 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3412673.3333333335, ans=0.1 2023-11-28 07:23:05,236 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3412673.3333333335, ans=0.125 2023-11-28 07:23:07,194 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.647e+01 8.771e+01 9.385e+01 1.023e+02 1.493e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-28 07:23:08,552 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3412673.3333333335, ans=0.125 2023-11-28 07:23:41,410 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3412873.3333333335, ans=0.125 2023-11-28 07:23:48,860 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 07:23:53,300 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 511950 2023-11-28 07:23:54,536 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff3.min_abs, batch_count=3412940.0, ans=0.2 2023-11-28 07:23:57,077 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 6950, loss[loss=0.06136, simple_loss=0.08224, pruned_loss=0.01295, audio_tagging_loss=0.007291, over 14913.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.08932, pruned_loss=0.01217, audio_tagging_loss=0.008717, over 3043734.70 frames. 
], batch size: 57, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 07:24:05,701 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3413006.6666666665, ans=0.0 2023-11-28 07:24:05,880 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.77 vs. limit=15.0 2023-11-28 07:24:08,927 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3413073.3333333335, ans=0.0 2023-11-28 07:24:12,957 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3413073.3333333335, ans=0.125 2023-11-28 07:24:17,332 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3413073.3333333335, ans=0.025 2023-11-28 07:24:18,287 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3413073.3333333335, ans=0.125 2023-11-28 07:24:22,731 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.552e-03 2023-11-28 07:24:24,830 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3413140.0, ans=0.125 2023-11-28 07:24:36,115 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.46 vs. limit=15.0 2023-11-28 07:24:39,010 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3413206.6666666665, ans=0.2 2023-11-28 07:24:51,246 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 512000 2023-11-28 07:24:52,597 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-512000.pt 2023-11-28 07:24:57,487 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 7000, loss[loss=0.08635, simple_loss=0.1212, pruned_loss=0.01667, audio_tagging_loss=0.009063, over 14777.00 frames. ], tot_loss[loss=0.06596, simple_loss=0.08976, pruned_loss=0.01235, audio_tagging_loss=0.008726, over 3042445.10 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 07:25:06,162 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.426e+01 8.579e+01 9.421e+01 1.029e+02 1.258e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-28 07:25:10,886 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3413406.6666666665, ans=0.1 2023-11-28 07:25:12,330 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.84 vs. limit=10.0 2023-11-28 07:25:14,214 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3413406.6666666665, ans=0.0 2023-11-28 07:25:14,735 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.10 vs. 
limit=6.0 2023-11-28 07:25:28,865 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3413473.3333333335, ans=0.125 2023-11-28 07:25:35,452 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.24 vs. limit=15.0 2023-11-28 07:25:41,019 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3413540.0, ans=0.125 2023-11-28 07:25:45,873 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.30 vs. limit=22.5 2023-11-28 07:25:50,605 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 512050 2023-11-28 07:25:53,850 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 7050, loss[loss=0.04977, simple_loss=0.06768, pruned_loss=0.007617, audio_tagging_loss=0.008311, over 14893.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.09007, pruned_loss=0.01248, audio_tagging_loss=0.008762, over 3042738.88 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 07:26:10,040 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3413740.0, ans=0.125 2023-11-28 07:26:21,338 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3413806.6666666665, ans=0.0 2023-11-28 07:26:30,834 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3413873.3333333335, ans=0.125 2023-11-28 07:26:34,092 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3413873.3333333335, ans=0.125 2023-11-28 07:26:43,830 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3413940.0, ans=0.125 2023-11-28 07:26:44,891 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3413940.0, ans=0.125 2023-11-28 07:26:45,538 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.39 vs. limit=22.5 2023-11-28 07:26:46,000 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3413940.0, ans=0.2 2023-11-28 07:26:46,979 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 512100 2023-11-28 07:26:50,151 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 7100, loss[loss=0.0774, simple_loss=0.1066, pruned_loss=0.01536, audio_tagging_loss=0.008752, over 15277.00 frames. ], tot_loss[loss=0.06649, simple_loss=0.09035, pruned_loss=0.01256, audio_tagging_loss=0.008762, over 3036064.21 frames. 
], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 07:27:01,321 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.269e+01 9.094e+01 9.538e+01 1.011e+02 1.389e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-28 07:27:06,096 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3414073.3333333335, ans=0.0 2023-11-28 07:27:44,902 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 512150 2023-11-28 07:27:45,173 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3414273.3333333335, ans=0.125 2023-11-28 07:27:48,106 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 7150, loss[loss=0.0687, simple_loss=0.09197, pruned_loss=0.0124, audio_tagging_loss=0.01032, over 14121.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.09039, pruned_loss=0.01248, audio_tagging_loss=0.008869, over 3045118.00 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 4.0 2023-11-28 07:28:07,167 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3414406.6666666665, ans=0.2 2023-11-28 07:28:22,198 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3414540.0, ans=0.125 2023-11-28 07:28:38,899 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3414606.6666666665, ans=0.125 2023-11-28 07:28:41,976 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 512200 2023-11-28 07:28:43,537 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3414606.6666666665, ans=0.0 2023-11-28 07:28:45,577 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 7200, loss[loss=0.0522, simple_loss=0.07076, pruned_loss=0.005778, audio_tagging_loss=0.01104, over 15192.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.08952, pruned_loss=0.01235, audio_tagging_loss=0.008926, over 3042459.90 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 07:28:51,769 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.84 vs. limit=22.5 2023-11-28 07:28:56,392 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.923e+01 8.861e+01 9.668e+01 1.042e+02 2.032e+02, threshold=1.934e+02, percent-clipped=1.0 2023-11-28 07:29:05,964 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3414740.0, ans=0.1 2023-11-28 07:29:11,020 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3414806.6666666665, ans=0.0 2023-11-28 07:29:16,356 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3414806.6666666665, ans=0.0 2023-11-28 07:29:16,867 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.86 vs. 
limit=22.5 2023-11-28 07:29:23,756 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3414873.3333333335, ans=0.1 2023-11-28 07:29:26,993 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 07:29:38,931 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 512250 2023-11-28 07:29:41,585 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.70 vs. limit=15.0 2023-11-28 07:29:42,138 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 7250, loss[loss=0.07857, simple_loss=0.108, pruned_loss=0.01551, audio_tagging_loss=0.009066, over 16189.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.09005, pruned_loss=0.01225, audio_tagging_loss=0.00892, over 3047419.96 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 07:29:52,794 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3415073.3333333335, ans=0.0 2023-11-28 07:29:54,032 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3415073.3333333335, ans=0.1 2023-11-28 07:30:09,562 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.31 vs. limit=15.0 2023-11-28 07:30:11,811 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.73 vs. limit=15.0 2023-11-28 07:30:22,459 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=3415206.6666666665, ans=0.1 2023-11-28 07:30:26,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3415273.3333333335, ans=0.07 2023-11-28 07:30:36,114 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 512300 2023-11-28 07:30:37,493 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3415273.3333333335, ans=0.125 2023-11-28 07:30:38,462 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3415340.0, ans=0.0 2023-11-28 07:30:39,340 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 7300, loss[loss=0.06177, simple_loss=0.07783, pruned_loss=0.01278, audio_tagging_loss=0.01007, over 14911.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.09036, pruned_loss=0.01242, audio_tagging_loss=0.008831, over 3043746.20 frames. 
], batch size: 57, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 07:30:51,223 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.554e+01 8.861e+01 9.411e+01 1.033e+02 1.259e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-28 07:31:02,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3415473.3333333335, ans=0.0 2023-11-28 07:31:03,528 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3415473.3333333335, ans=0.125 2023-11-28 07:31:27,032 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3415606.6666666665, ans=0.125 2023-11-28 07:31:33,724 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 512350 2023-11-28 07:31:36,173 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3415673.3333333335, ans=0.0 2023-11-28 07:31:37,005 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 7350, loss[loss=0.05438, simple_loss=0.07043, pruned_loss=0.0103, audio_tagging_loss=0.008867, over 14835.00 frames. ], tot_loss[loss=0.06695, simple_loss=0.0913, pruned_loss=0.01261, audio_tagging_loss=0.008686, over 3045957.49 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 07:31:43,101 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.37 vs. limit=15.0 2023-11-28 07:32:03,128 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3415806.6666666665, ans=0.125 2023-11-28 07:32:06,275 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3415806.6666666665, ans=0.0 2023-11-28 07:32:12,774 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3415873.3333333335, ans=0.0 2023-11-28 07:32:18,696 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.36 vs. limit=15.0 2023-11-28 07:32:29,956 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 512400 2023-11-28 07:32:33,364 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 7400, loss[loss=0.05631, simple_loss=0.07589, pruned_loss=0.01029, audio_tagging_loss=0.008072, over 15216.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.09027, pruned_loss=0.01243, audio_tagging_loss=0.008586, over 3047083.98 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 07:32:44,952 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.500e+01 8.822e+01 9.327e+01 1.016e+02 1.231e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-28 07:32:55,888 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.27 vs. limit=15.0 2023-11-28 07:32:57,782 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3416140.0, ans=0.0 2023-11-28 07:33:27,692 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 512450 2023-11-28 07:33:30,877 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 7450, loss[loss=0.06333, simple_loss=0.0894, pruned_loss=0.01007, audio_tagging_loss=0.00856, over 14827.00 frames. 
], tot_loss[loss=0.06592, simple_loss=0.09003, pruned_loss=0.01235, audio_tagging_loss=0.00856, over 3050954.17 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 07:33:40,292 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3416340.0, ans=0.0 2023-11-28 07:34:00,943 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3416473.3333333335, ans=0.125 2023-11-28 07:34:22,047 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3416606.6666666665, ans=0.0 2023-11-28 07:34:26,036 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 512500 2023-11-28 07:34:26,291 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3416606.6666666665, ans=0.0 2023-11-28 07:34:26,487 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.94 vs. limit=15.0 2023-11-28 07:34:29,284 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 7500, loss[loss=0.06601, simple_loss=0.09992, pruned_loss=0.01081, audio_tagging_loss=0.005238, over 16777.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.09016, pruned_loss=0.01228, audio_tagging_loss=0.008605, over 3058055.31 frames. ], batch size: 61, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 07:34:37,282 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3416673.3333333335, ans=0.2 2023-11-28 07:34:40,254 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.348e+01 8.775e+01 9.275e+01 9.988e+01 1.436e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-28 07:35:02,243 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.88 vs. limit=15.0 2023-11-28 07:35:22,862 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 512550 2023-11-28 07:35:26,339 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 7550, loss[loss=0.07788, simple_loss=0.1081, pruned_loss=0.01681, audio_tagging_loss=0.007013, over 14483.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.09025, pruned_loss=0.01226, audio_tagging_loss=0.008547, over 3061310.72 frames. ], batch size: 53, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 07:35:31,303 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.48 vs. limit=22.5 2023-11-28 07:35:48,846 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3417140.0, ans=0.125 2023-11-28 07:35:52,438 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3417140.0, ans=0.0 2023-11-28 07:36:00,944 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3417206.6666666665, ans=0.0 2023-11-28 07:36:20,881 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 512600 2023-11-28 07:36:25,123 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 7600, loss[loss=0.04954, simple_loss=0.06746, pruned_loss=0.007589, audio_tagging_loss=0.00822, over 15462.00 frames. 
], tot_loss[loss=0.06569, simple_loss=0.0898, pruned_loss=0.01225, audio_tagging_loss=0.008534, over 3058913.01 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 07:36:36,985 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.414e+01 8.736e+01 9.227e+01 9.964e+01 1.331e+02, threshold=1.845e+02, percent-clipped=0.0 2023-11-28 07:36:46,841 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3417406.6666666665, ans=0.1 2023-11-28 07:37:19,566 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.38 vs. limit=12.0 2023-11-28 07:37:20,208 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 512650 2023-11-28 07:37:22,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3417673.3333333335, ans=0.125 2023-11-28 07:37:23,483 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 7650, loss[loss=0.05619, simple_loss=0.07767, pruned_loss=0.009939, audio_tagging_loss=0.007417, over 15174.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.08897, pruned_loss=0.0121, audio_tagging_loss=0.008545, over 3056319.85 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 07:37:31,224 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3417673.3333333335, ans=0.2 2023-11-28 07:37:38,527 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.00 vs. limit=15.0 2023-11-28 07:37:46,780 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3417806.6666666665, ans=0.2 2023-11-28 07:37:47,827 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3417806.6666666665, ans=0.125 2023-11-28 07:37:49,266 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.48 vs. limit=15.0 2023-11-28 07:38:07,296 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3417873.3333333335, ans=0.125 2023-11-28 07:38:11,741 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3417940.0, ans=0.0 2023-11-28 07:38:18,878 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 512700 2023-11-28 07:38:22,138 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 7700, loss[loss=0.05156, simple_loss=0.07395, pruned_loss=0.004364, audio_tagging_loss=0.01022, over 14415.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08895, pruned_loss=0.01216, audio_tagging_loss=0.008576, over 3047071.00 frames. 
], batch size: 55, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 07:38:31,278 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3418006.6666666665, ans=0.125 2023-11-28 07:38:31,311 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3418006.6666666665, ans=0.2 2023-11-28 07:38:34,286 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.482e+01 8.901e+01 9.400e+01 1.006e+02 1.251e+02, threshold=1.880e+02, percent-clipped=0.0 2023-11-28 07:38:34,974 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.80 vs. limit=15.0 2023-11-28 07:39:52,529 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 512750 2023-11-28 07:40:00,663 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3418340.0, ans=0.2 2023-11-28 07:40:09,631 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 7750, loss[loss=0.05072, simple_loss=0.06187, pruned_loss=0.009444, audio_tagging_loss=0.01034, over 14714.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08864, pruned_loss=0.01219, audio_tagging_loss=0.008702, over 3042533.02 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 07:40:40,758 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3418340.0, ans=0.5 2023-11-28 07:41:14,751 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.88 vs. limit=15.0 2023-11-28 07:41:41,269 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 07:42:42,922 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3418540.0, ans=0.125 2023-11-28 07:43:33,598 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 512800 2023-11-28 07:43:34,289 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3418606.6666666665, ans=0.0 2023-11-28 07:43:46,635 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 7800, loss[loss=0.06637, simple_loss=0.08926, pruned_loss=0.01325, audio_tagging_loss=0.008491, over 14953.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.08909, pruned_loss=0.0125, audio_tagging_loss=0.008703, over 3045459.89 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 07:44:24,595 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3418740.0, ans=0.0 2023-11-28 07:44:31,138 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.682e+01 8.859e+01 9.420e+01 1.032e+02 1.560e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-28 07:45:40,212 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.97 vs. 
limit=15.0 2023-11-28 07:45:47,838 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3418873.3333333335, ans=0.2 2023-11-28 07:45:55,171 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3418873.3333333335, ans=0.2 2023-11-28 07:46:49,112 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 512850 2023-11-28 07:47:06,838 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 7850, loss[loss=0.0684, simple_loss=0.0877, pruned_loss=0.01538, audio_tagging_loss=0.009163, over 15276.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.08932, pruned_loss=0.01265, audio_tagging_loss=0.008796, over 3045210.77 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 07:47:37,812 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3419006.6666666665, ans=0.0 2023-11-28 07:50:15,573 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3419273.3333333335, ans=0.0 2023-11-28 07:50:33,370 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 512900 2023-11-28 07:50:44,787 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 7900, loss[loss=0.04965, simple_loss=0.06682, pruned_loss=0.006775, audio_tagging_loss=0.009469, over 14457.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.09005, pruned_loss=0.01268, audio_tagging_loss=0.008748, over 3050097.62 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 07:50:45,311 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3419340.0, ans=0.0 2023-11-28 07:51:03,069 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3419340.0, ans=0.125 2023-11-28 07:51:03,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3419340.0, ans=0.1 2023-11-28 07:51:25,724 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.969e+01 8.861e+01 9.655e+01 1.039e+02 1.530e+02, threshold=1.931e+02, percent-clipped=0.0 2023-11-28 07:52:01,397 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3419473.3333333335, ans=0.125 2023-11-28 07:52:06,115 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3419473.3333333335, ans=0.0 2023-11-28 07:52:49,525 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.77 vs. limit=15.0 2023-11-28 07:52:57,716 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3419540.0, ans=0.0 2023-11-28 07:53:03,381 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.53 vs. 
limit=15.0 2023-11-28 07:53:06,089 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3419540.0, ans=0.0 2023-11-28 07:53:07,516 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1.whitening_limit, batch_count=3419540.0, ans=10.0 2023-11-28 07:53:42,911 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 512950 2023-11-28 07:53:53,384 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 7950, loss[loss=0.05484, simple_loss=0.07368, pruned_loss=0.008771, audio_tagging_loss=0.009224, over 14499.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.09018, pruned_loss=0.01261, audio_tagging_loss=0.00878, over 3055410.88 frames. ], batch size: 53, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 07:55:03,290 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 07:55:25,916 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3419806.6666666665, ans=0.2 2023-11-28 07:55:26,290 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3419806.6666666665, ans=0.125 2023-11-28 07:57:25,326 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 513000 2023-11-28 07:57:41,063 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 8000, loss[loss=0.06969, simple_loss=0.09678, pruned_loss=0.01236, audio_tagging_loss=0.008941, over 14128.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.08984, pruned_loss=0.01257, audio_tagging_loss=0.008908, over 3047926.38 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 07:58:17,629 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3420006.6666666665, ans=0.125 2023-11-28 07:58:26,769 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3420073.3333333335, ans=0.95 2023-11-28 07:58:27,804 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.238e+01 8.711e+01 9.409e+01 1.028e+02 1.220e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-28 07:59:18,294 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.96 vs. limit=6.0 2023-11-28 08:00:43,410 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=3420273.3333333335, ans=15.0 2023-11-28 08:00:46,531 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3420273.3333333335, ans=0.95 2023-11-28 08:01:08,194 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 513050 2023-11-28 08:01:19,890 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 8050, loss[loss=0.08592, simple_loss=0.12, pruned_loss=0.01883, audio_tagging_loss=0.007102, over 14628.00 frames. 
], tot_loss[loss=0.06586, simple_loss=0.08867, pruned_loss=0.01243, audio_tagging_loss=0.009102, over 3044658.24 frames. ], batch size: 53, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:03:01,339 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3420540.0, ans=0.2 2023-11-28 08:03:48,637 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.10 vs. limit=12.0 2023-11-28 08:04:02,203 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 513100 2023-11-28 08:04:11,739 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 8100, loss[loss=0.09483, simple_loss=0.1326, pruned_loss=0.02098, audio_tagging_loss=0.007562, over 16554.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.08972, pruned_loss=0.01257, audio_tagging_loss=0.00905, over 3043147.79 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:04:51,353 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.352e+01 8.964e+01 9.574e+01 1.024e+02 1.325e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-28 08:05:00,131 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3420740.0, ans=0.2 2023-11-28 08:06:54,846 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3420940.0, ans=0.125 2023-11-28 08:07:02,709 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 513150 2023-11-28 08:07:11,901 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 8150, loss[loss=0.06018, simple_loss=0.07829, pruned_loss=0.0117, audio_tagging_loss=0.009326, over 14248.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.09006, pruned_loss=0.01268, audio_tagging_loss=0.008824, over 3039935.47 frames. ], batch size: 53, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:08:10,790 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3421073.3333333335, ans=0.2 2023-11-28 08:08:19,605 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3421140.0, ans=0.125 2023-11-28 08:10:00,211 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 513200 2023-11-28 08:10:04,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3421273.3333333335, ans=0.025 2023-11-28 08:10:09,391 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 8200, loss[loss=0.05529, simple_loss=0.06692, pruned_loss=0.0135, audio_tagging_loss=0.008326, over 14600.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.08981, pruned_loss=0.01259, audio_tagging_loss=0.008739, over 3043051.69 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:10:21,024 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 08:10:48,269 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.547e+01 8.671e+01 9.315e+01 1.033e+02 1.596e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-28 08:10:59,144 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3421406.6666666665, ans=0.2 2023-11-28 08:11:53,657 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3421540.0, ans=0.125 2023-11-28 08:12:39,446 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3421606.6666666665, ans=0.5 2023-11-28 08:12:53,573 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 513250 2023-11-28 08:13:04,989 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 8250, loss[loss=0.07852, simple_loss=0.1121, pruned_loss=0.01537, audio_tagging_loss=0.007108, over 15802.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.08953, pruned_loss=0.01235, audio_tagging_loss=0.008639, over 3044838.07 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:13:11,492 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.17 vs. limit=10.0 2023-11-28 08:14:03,906 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.21 vs. limit=10.0 2023-11-28 08:14:38,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3421873.3333333335, ans=0.05 2023-11-28 08:15:32,364 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.07 vs. limit=15.0 2023-11-28 08:15:55,222 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 513300 2023-11-28 08:16:08,862 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 8300, loss[loss=0.06543, simple_loss=0.09386, pruned_loss=0.01133, audio_tagging_loss=0.00717, over 15380.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08889, pruned_loss=0.01234, audio_tagging_loss=0.008563, over 3047574.04 frames. 
], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:16:49,487 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.431e+01 8.832e+01 9.492e+01 1.019e+02 1.242e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-28 08:17:36,361 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3422140.0, ans=0.2 2023-11-28 08:17:56,258 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3422140.0, ans=0.0 2023-11-28 08:17:56,435 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3422140.0, ans=0.125 2023-11-28 08:18:03,545 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3422206.6666666665, ans=0.125 2023-11-28 08:18:54,838 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3422273.3333333335, ans=0.125 2023-11-28 08:19:10,797 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 513350 2023-11-28 08:19:21,113 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 8350, loss[loss=0.07635, simple_loss=0.104, pruned_loss=0.01465, audio_tagging_loss=0.009716, over 16496.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.08959, pruned_loss=0.01243, audio_tagging_loss=0.008554, over 3058441.65 frames. ], batch size: 61, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:19:31,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3422340.0, ans=0.2 2023-11-28 08:19:48,610 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.19 vs. limit=15.0 2023-11-28 08:20:02,934 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3422406.6666666665, ans=10.0 2023-11-28 08:20:21,334 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3422406.6666666665, ans=0.0 2023-11-28 08:20:35,589 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3422473.3333333335, ans=0.0 2023-11-28 08:21:56,066 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 513400 2023-11-28 08:21:56,524 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3422606.6666666665, ans=0.1 2023-11-28 08:22:05,668 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 8400, loss[loss=0.07963, simple_loss=0.1184, pruned_loss=0.01315, audio_tagging_loss=0.007257, over 14899.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.09006, pruned_loss=0.01257, audio_tagging_loss=0.008526, over 3058178.41 frames. 
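The ScheduledFloat records report regularization hyperparameters (skip rates, balancer probabilities, bypass scale floors) that are piecewise-linear functions of batch_count rather than constants; by batch_count ~3.4e6 they have long since reached their final values, which is why the logged ans values sit at constants like 0.0, 0.125 and 0.2. A minimal sketch of that scheduling idea, with made-up breakpoints:

    # Piecewise-linear schedule over batch_count, in the spirit of the
    # ScheduledFloat values above. The breakpoints here are invented.
    def scheduled_float(batch_count: float,
                        schedule: list[tuple[float, float]]) -> float:
        """schedule: sorted (batch_count, value) pairs; linear in between,
        clamped to the end values outside the covered range."""
        if batch_count <= schedule[0][0]:
            return schedule[0][1]
        for (x0, y0), (x1, y1) in zip(schedule, schedule[1:]):
            if batch_count <= x1:
                return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)
        return schedule[-1][1]

    # E.g. a skip rate that decays from 0.1 to 0.0 over the first 20k batches:
    ff3_skip_rate = [(0.0, 0.1), (20000.0, 0.0)]
    print(scheduled_float(3422140.0, ff3_skip_rate))  # 0.0, as logged above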
], batch size: 56, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 08:22:09,520 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-28 08:22:22,794 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3422673.3333333335, ans=0.0 2023-11-28 08:22:22,943 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3422673.3333333335, ans=0.0 2023-11-28 08:22:35,725 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.336e+01 8.870e+01 9.331e+01 1.011e+02 1.281e+02, threshold=1.866e+02, percent-clipped=0.0 2023-11-28 08:22:43,445 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3422740.0, ans=0.125 2023-11-28 08:23:00,504 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3422806.6666666665, ans=0.125 2023-11-28 08:23:00,698 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3422806.6666666665, ans=0.0 2023-11-28 08:23:01,576 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.16 vs. limit=10.0 2023-11-28 08:23:32,628 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.70 vs. limit=15.0 2023-11-28 08:24:19,641 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 513450 2023-11-28 08:24:26,997 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 8450, loss[loss=0.07999, simple_loss=0.1024, pruned_loss=0.01682, audio_tagging_loss=0.01194, over 15163.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.09013, pruned_loss=0.01247, audio_tagging_loss=0.008604, over 3059083.74 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 08:24:31,913 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3423006.6666666665, ans=0.125 2023-11-28 08:24:50,848 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.72 vs. limit=22.5 2023-11-28 08:24:52,303 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3423073.3333333335, ans=0.0 2023-11-28 08:25:14,266 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3423140.0, ans=0.125 2023-11-28 08:25:17,537 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3423140.0, ans=0.1 2023-11-28 08:26:11,850 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3423273.3333333335, ans=0.1 2023-11-28 08:26:29,924 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 513500 2023-11-28 08:26:35,555 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 8500, loss[loss=0.05523, simple_loss=0.07627, pruned_loss=0.007852, audio_tagging_loss=0.009241, over 14280.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.09048, pruned_loss=0.01241, audio_tagging_loss=0.008676, over 3058194.95 frames. 
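The Whitening records compare a statistic of each module's activations against a limit (for example metric=12.70 vs. limit=15.0 above) and only penalize the module when the limit is exceeded. The metric measures how far the feature covariance is from white: it is 1.0 when variance is spread evenly across directions and grows as it concentrates in a few. The sketch below computes one such ratio, mean squared eigenvalue over squared mean eigenvalue; it mirrors the idea behind scaling.py's logging, not its exact code.

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        """mean(eig^2) / mean(eig)^2 of the feature covariance: 1.0 for
        perfectly white features, large when a few directions dominate."""
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.T @ x) / x.shape[0]            # (C, C) covariance
        d = cov.shape[0]
        mean_eig = torch.diagonal(cov).mean()   # trace(cov) / d
        mean_eig_sq = (cov * cov).sum() / d     # trace(cov @ cov) / d
        return mean_eig_sq / (mean_eig ** 2 + 1e-20)

    white = torch.randn(10000, 384)
    collapsed = white * torch.tensor([10.0] + [0.1] * 383)
    print(whitening_metric(white))      # ~1.0, well under a limit like 15.0
    print(whitening_metric(collapsed))  # large: variance sits in one channel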
], batch size: 52, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 08:27:06,579 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.889e+01 8.890e+01 9.437e+01 1.019e+02 2.913e+02, threshold=1.887e+02, percent-clipped=1.0 2023-11-28 08:27:08,109 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.72 vs. limit=22.5 2023-11-28 08:27:10,330 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3423406.6666666665, ans=0.0 2023-11-28 08:27:19,576 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3423406.6666666665, ans=0.125 2023-11-28 08:27:31,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3423473.3333333335, ans=0.0 2023-11-28 08:28:21,074 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3423606.6666666665, ans=0.125 2023-11-28 08:28:35,615 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.28 vs. limit=10.0 2023-11-28 08:28:36,492 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 513550 2023-11-28 08:28:37,239 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3423606.6666666665, ans=0.04949747468305833 2023-11-28 08:28:44,907 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 8550, loss[loss=0.0642, simple_loss=0.08518, pruned_loss=0.01405, audio_tagging_loss=0.007563, over 14821.00 frames. ], tot_loss[loss=0.0663, simple_loss=0.09051, pruned_loss=0.01236, audio_tagging_loss=0.008676, over 3046795.20 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 08:29:32,666 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3423806.6666666665, ans=0.125 2023-11-28 08:30:04,432 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3423873.3333333335, ans=0.125 2023-11-28 08:30:11,048 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.75 vs. limit=22.5 2023-11-28 08:30:30,682 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 513600 2023-11-28 08:30:36,085 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 08:30:37,847 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 8600, loss[loss=0.07057, simple_loss=0.09976, pruned_loss=0.01359, audio_tagging_loss=0.007105, over 15773.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.08963, pruned_loss=0.01235, audio_tagging_loss=0.008694, over 3038520.64 frames. ], batch size: 60, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 08:30:57,304 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.252e+01 8.892e+01 9.588e+01 1.028e+02 1.351e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-28 08:31:26,688 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.17 vs. 
limit=22.5 2023-11-28 08:32:09,724 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 513650 2023-11-28 08:32:14,161 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 8650, loss[loss=0.07038, simple_loss=0.09777, pruned_loss=0.01191, audio_tagging_loss=0.009589, over 16266.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.09, pruned_loss=0.01236, audio_tagging_loss=0.008717, over 3040588.70 frames. ], batch size: 60, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 08:32:25,249 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.70 vs. limit=15.0 2023-11-28 08:32:56,813 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3424473.3333333335, ans=0.0 2023-11-28 08:33:12,079 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3424540.0, ans=0.125 2023-11-28 08:33:28,972 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3424540.0, ans=0.0 2023-11-28 08:33:45,951 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 513700 2023-11-28 08:33:50,735 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 8700, loss[loss=0.06179, simple_loss=0.08707, pruned_loss=0.01257, audio_tagging_loss=0.005691, over 15114.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08912, pruned_loss=0.01231, audio_tagging_loss=0.008795, over 3040132.58 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:33:56,316 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3424673.3333333335, ans=0.125 2023-11-28 08:34:13,094 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.578e+01 8.836e+01 9.429e+01 1.013e+02 1.223e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-28 08:34:20,701 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3424740.0, ans=0.1 2023-11-28 08:34:54,305 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3424873.3333333335, ans=0.125 2023-11-28 08:35:10,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3424940.0, ans=0.1 2023-11-28 08:35:15,408 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 513750 2023-11-28 08:35:20,388 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 8750, loss[loss=0.08826, simple_loss=0.1212, pruned_loss=0.01948, audio_tagging_loss=0.008178, over 14818.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.09013, pruned_loss=0.01248, audio_tagging_loss=0.008814, over 3043682.33 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:35:47,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3425073.3333333335, ans=0.0 2023-11-28 08:35:50,357 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3425073.3333333335, ans=0.2 2023-11-28 08:36:11,450 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.81 vs. 
limit=15.0 2023-11-28 08:36:39,571 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 08:36:48,752 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 513800 2023-11-28 08:36:54,501 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 8800, loss[loss=0.05678, simple_loss=0.07906, pruned_loss=0.0084, audio_tagging_loss=0.008851, over 15373.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.09064, pruned_loss=0.01238, audio_tagging_loss=0.008753, over 3047950.24 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 08:37:13,360 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.671e+01 8.831e+01 9.235e+01 9.998e+01 1.254e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-28 08:37:23,215 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3425406.6666666665, ans=0.05 2023-11-28 08:37:29,157 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3425473.3333333335, ans=0.2 2023-11-28 08:37:30,601 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3425473.3333333335, ans=0.0 2023-11-28 08:37:35,755 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3425473.3333333335, ans=0.0 2023-11-28 08:37:37,481 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3425473.3333333335, ans=0.125 2023-11-28 08:37:56,418 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3425540.0, ans=0.035 2023-11-28 08:37:56,798 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3425540.0, ans=0.125 2023-11-28 08:38:10,658 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 513850 2023-11-28 08:38:15,023 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 8850, loss[loss=0.06756, simple_loss=0.09753, pruned_loss=0.01059, audio_tagging_loss=0.008203, over 14746.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.0901, pruned_loss=0.01234, audio_tagging_loss=0.008786, over 3046133.24 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:38:36,632 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 08:38:56,337 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=3425806.6666666665, ans=22.5 2023-11-28 08:39:00,342 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3425806.6666666665, ans=0.125 2023-11-28 08:39:05,114 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.36 vs. 
limit=22.5 2023-11-28 08:39:31,376 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 513900 2023-11-28 08:39:36,064 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 8900, loss[loss=0.0452, simple_loss=0.05661, pruned_loss=0.008411, audio_tagging_loss=0.008483, over 15008.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.09018, pruned_loss=0.01221, audio_tagging_loss=0.008756, over 3053407.96 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:39:42,711 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3426006.6666666665, ans=0.1 2023-11-28 08:39:45,701 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3426006.6666666665, ans=0.125 2023-11-28 08:39:57,374 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.695e+01 8.722e+01 9.445e+01 1.012e+02 1.187e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-28 08:40:43,258 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3426273.3333333335, ans=0.05 2023-11-28 08:40:46,786 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 513950 2023-11-28 08:40:50,600 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 8950, loss[loss=0.07708, simple_loss=0.1065, pruned_loss=0.01802, audio_tagging_loss=0.005823, over 15161.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.09034, pruned_loss=0.01216, audio_tagging_loss=0.008588, over 3047807.19 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 08:41:01,288 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3426340.0, ans=0.1 2023-11-28 08:41:08,549 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.20 vs. limit=15.0 2023-11-28 08:41:20,360 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.44 vs. limit=15.0 2023-11-28 08:41:29,792 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3426540.0, ans=0.125 2023-11-28 08:41:32,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3426540.0, ans=0.125 2023-11-28 08:41:52,970 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 514000 2023-11-28 08:41:57,096 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 9000, loss[loss=0.06585, simple_loss=0.08668, pruned_loss=0.0128, audio_tagging_loss=0.009706, over 14837.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.09087, pruned_loss=0.01231, audio_tagging_loss=0.00848, over 3051218.36 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 08:41:57,099 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-28 08:42:29,432 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([1.8481, 3.0076, 2.8168, 2.7466, 3.3456, 3.3636, 3.2039, 3.6203], device='cuda:0') 2023-11-28 08:42:35,462 INFO [train_asr.py:1267] (0/4) Epoch 43, validation: loss=0.05867, simple_loss=0.05056, pruned_loss=0.005241, audio_tagging_loss=0.02815, over 4681554.00 frames. 
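Each loss record in this log decomposes the objective into a simple (linear-joiner) transducer term, a pruned transducer term, and an audio-tagging distillation term. The components combine as loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss; the 0.5 weight is an assumption inferred from the run configuration, but it is consistent with every total in this section, as this check against the validation record just above shows:

    # How the logged components combine into `loss`. The 0.5 weight on
    # simple_loss is an assumption consistent with every record here;
    # numbers are from the Epoch 43 validation record above.
    simple_loss, pruned_loss, audio_tagging_loss = 0.05056, 0.005241, 0.02815
    loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss
    print(round(loss, 5))  # 0.05867 -- matches the logged validation loss

Note that the validation audio-tagging term (0.02815) is roughly three times its running training value (~0.009), while the transducer terms come out smaller than in training.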
2023-11-28 08:42:35,463 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-28 08:42:43,951 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.08 vs. limit=22.5 2023-11-28 08:42:50,610 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3426740.0, ans=0.125 2023-11-28 08:42:53,916 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.358e+01 8.891e+01 9.730e+01 1.046e+02 2.169e+02, threshold=1.946e+02, percent-clipped=1.0 2023-11-28 08:43:07,891 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3426806.6666666665, ans=0.125 2023-11-28 08:43:26,710 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3426940.0, ans=0.2 2023-11-28 08:43:36,144 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 514050 2023-11-28 08:43:37,603 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3426940.0, ans=0.0 2023-11-28 08:43:40,624 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 9050, loss[loss=0.07375, simple_loss=0.1006, pruned_loss=0.01356, audio_tagging_loss=0.009868, over 15583.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.09096, pruned_loss=0.01243, audio_tagging_loss=0.008471, over 3046293.90 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 08:43:50,697 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3427006.6666666665, ans=0.0 2023-11-28 08:44:07,523 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.85 vs. limit=22.5 2023-11-28 08:44:22,784 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.37 vs. limit=6.0 2023-11-28 08:44:29,474 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.20 vs. limit=15.0 2023-11-28 08:44:34,367 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.55 vs. limit=22.5 2023-11-28 08:44:39,507 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 514100 2023-11-28 08:44:43,137 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 9100, loss[loss=0.05569, simple_loss=0.08631, pruned_loss=0.00855, audio_tagging_loss=0.003987, over 15091.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.09028, pruned_loss=0.0123, audio_tagging_loss=0.008404, over 3044644.97 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 08:44:51,883 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.26 vs. 
limit=15.0 2023-11-28 08:44:58,969 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3427406.6666666665, ans=0.125 2023-11-28 08:45:01,265 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.615e+01 9.021e+01 9.381e+01 1.003e+02 1.228e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-28 08:45:03,793 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3427406.6666666665, ans=0.2 2023-11-28 08:45:07,319 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3427473.3333333335, ans=0.1 2023-11-28 08:45:25,629 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3427540.0, ans=0.125 2023-11-28 08:45:32,798 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.88 vs. limit=15.0 2023-11-28 08:45:40,621 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 514150 2023-11-28 08:45:44,482 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 9150, loss[loss=0.06041, simple_loss=0.07708, pruned_loss=0.008459, audio_tagging_loss=0.01341, over 14971.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.08956, pruned_loss=0.01238, audio_tagging_loss=0.008533, over 3042120.73 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 08:45:51,123 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=3427673.3333333335, ans=15.0 2023-11-28 08:45:51,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3427673.3333333335, ans=0.0 2023-11-28 08:46:04,493 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3427740.0, ans=0.125 2023-11-28 08:46:24,126 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3427873.3333333335, ans=0.0 2023-11-28 08:46:25,210 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3427873.3333333335, ans=0.125 2023-11-28 08:46:33,129 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.43 vs. limit=12.0 2023-11-28 08:46:39,299 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 514200 2023-11-28 08:46:42,866 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 9200, loss[loss=0.07877, simple_loss=0.1095, pruned_loss=0.01413, audio_tagging_loss=0.009879, over 15266.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08936, pruned_loss=0.01238, audio_tagging_loss=0.008613, over 3045800.33 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:46:51,153 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 08:46:55,914 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.52 vs. 
limit=15.0 2023-11-28 08:46:58,826 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.290e+01 8.605e+01 9.339e+01 9.879e+01 1.258e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-28 08:47:00,280 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3428073.3333333335, ans=0.2 2023-11-28 08:47:37,018 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 514250 2023-11-28 08:47:40,307 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 9250, loss[loss=0.0903, simple_loss=0.1358, pruned_loss=0.0157, audio_tagging_loss=0.006693, over 15839.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.08852, pruned_loss=0.01226, audio_tagging_loss=0.008701, over 3045631.37 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:48:08,349 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3428473.3333333335, ans=0.125 2023-11-28 08:48:28,865 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3428606.6666666665, ans=0.125 2023-11-28 08:48:34,865 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 514300 2023-11-28 08:48:38,059 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 9300, loss[loss=0.08867, simple_loss=0.1268, pruned_loss=0.01899, audio_tagging_loss=0.006296, over 15111.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08909, pruned_loss=0.01236, audio_tagging_loss=0.008715, over 3048342.85 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:48:54,142 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.511e+01 8.614e+01 9.246e+01 9.788e+01 1.593e+02, threshold=1.849e+02, percent-clipped=0.0 2023-11-28 08:49:28,144 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 08:49:32,186 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 514350 2023-11-28 08:49:32,724 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.62 vs. limit=15.0 2023-11-28 08:49:35,352 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 9350, loss[loss=0.07737, simple_loss=0.1026, pruned_loss=0.0183, audio_tagging_loss=0.007762, over 14499.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.08937, pruned_loss=0.01243, audio_tagging_loss=0.008748, over 3046043.27 frames. 
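The optim.py records report quartiles of recent gradient norms together with a clipping threshold and the fraction of batches actually clipped. In every record here the threshold is twice the logged median (for example 1.868e+02 against a median of 9.339e+01, with Clipping_scale=2.0), i.e. gradients are clipped relative to their own recent history rather than to a fixed norm. A hedged sketch of that behaviour; the class name, buffer length and bookkeeping are illustrative, not the optimizer's verbatim implementation.

    import collections
    import statistics
    import torch

    class AdaptiveClipper:
        """Clip at clipping_scale * median of recent gradient norms,
        mirroring the threshold ~ 2 x median seen in the records above."""

        def __init__(self, clipping_scale: float = 2.0, history: int = 1000):
            self.scale = clipping_scale
            self.norms = collections.deque(maxlen=history)

        def clip_(self, params) -> bool:
            grads = [p.grad for p in params if p.grad is not None]
            norm = torch.norm(torch.stack([g.norm() for g in grads]))
            self.norms.append(float(norm))
            threshold = self.scale * statistics.median(self.norms)
            if norm > threshold:
                for g in grads:
                    g.mul_(threshold / norm)
                return True   # counted into the "percent-clipped" statistic
            return False

    lin = torch.nn.Linear(8, 8)
    lin(torch.randn(4, 8)).sum().backward()
    AdaptiveClipper().clip_(list(lin.parameters()))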
], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:49:35,645 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3429006.6666666665, ans=0.125 2023-11-28 08:49:36,588 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3429006.6666666665, ans=0.0 2023-11-28 08:49:41,091 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3429006.6666666665, ans=0.125 2023-11-28 08:49:47,494 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3429073.3333333335, ans=0.04949747468305833 2023-11-28 08:49:54,772 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 08:50:00,029 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3429140.0, ans=0.125 2023-11-28 08:50:00,966 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3429140.0, ans=0.04949747468305833 2023-11-28 08:50:04,203 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3429140.0, ans=0.125 2023-11-28 08:50:12,740 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3429206.6666666665, ans=0.125 2023-11-28 08:50:21,592 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3429273.3333333335, ans=0.125 2023-11-28 08:50:23,251 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.48 vs. limit=15.0 2023-11-28 08:50:28,896 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 514400 2023-11-28 08:50:32,405 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 9400, loss[loss=0.08415, simple_loss=0.1198, pruned_loss=0.01594, audio_tagging_loss=0.008315, over 16028.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.08908, pruned_loss=0.01234, audio_tagging_loss=0.008824, over 3043517.87 frames. ], batch size: 61, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:50:47,753 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3429406.6666666665, ans=0.125 2023-11-28 08:50:48,715 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.388e+01 8.921e+01 9.569e+01 1.013e+02 1.910e+02, threshold=1.914e+02, percent-clipped=1.0 2023-11-28 08:50:49,013 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3429406.6666666665, ans=0.125 2023-11-28 08:50:53,598 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.50 vs. limit=22.5 2023-11-28 08:50:53,632 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.91 vs. 
limit=15.0 2023-11-28 08:50:54,528 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3429406.6666666665, ans=0.5 2023-11-28 08:51:11,904 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3429540.0, ans=0.1 2023-11-28 08:51:21,128 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3429606.6666666665, ans=0.2 2023-11-28 08:51:27,044 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 514450 2023-11-28 08:51:30,082 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 9450, loss[loss=0.06548, simple_loss=0.08666, pruned_loss=0.0126, audio_tagging_loss=0.009548, over 14371.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.08961, pruned_loss=0.0123, audio_tagging_loss=0.008842, over 3045592.91 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:51:32,365 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 08:51:36,700 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3429673.3333333335, ans=0.0 2023-11-28 08:51:50,718 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3429740.0, ans=0.0 2023-11-28 08:51:51,961 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3429806.6666666665, ans=0.0 2023-11-28 08:52:10,030 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3429873.3333333335, ans=0.0 2023-11-28 08:52:23,966 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 514500 2023-11-28 08:52:27,124 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 9500, loss[loss=0.0656, simple_loss=0.09461, pruned_loss=0.008691, audio_tagging_loss=0.009607, over 14658.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.08977, pruned_loss=0.01235, audio_tagging_loss=0.008947, over 3039740.03 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:52:27,313 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3430006.6666666665, ans=0.125 2023-11-28 08:52:41,581 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.59 vs. 
limit=15.0 2023-11-28 08:52:42,224 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.327e+01 9.109e+01 9.581e+01 1.028e+02 2.016e+02, threshold=1.916e+02, percent-clipped=1.0 2023-11-28 08:52:53,370 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3430140.0, ans=0.0 2023-11-28 08:53:20,691 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 514550 2023-11-28 08:53:23,771 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 9550, loss[loss=0.05485, simple_loss=0.07856, pruned_loss=0.00694, audio_tagging_loss=0.008635, over 15102.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08873, pruned_loss=0.0121, audio_tagging_loss=0.008997, over 3038659.10 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:53:26,181 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3430340.0, ans=0.125 2023-11-28 08:53:29,669 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.73 vs. limit=15.0 2023-11-28 08:53:30,626 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3430340.0, ans=0.04949747468305833 2023-11-28 08:53:32,792 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3430340.0, ans=0.1 2023-11-28 08:53:57,825 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3430540.0, ans=0.125 2023-11-28 08:54:11,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3430606.6666666665, ans=0.125 2023-11-28 08:54:12,793 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3430606.6666666665, ans=0.125 2023-11-28 08:54:17,654 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 514600 2023-11-28 08:54:21,034 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.09 vs. limit=10.0 2023-11-28 08:54:21,549 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 9600, loss[loss=0.09684, simple_loss=0.13, pruned_loss=0.02363, audio_tagging_loss=0.008239, over 14191.00 frames. ], tot_loss[loss=0.066, simple_loss=0.08929, pruned_loss=0.01233, audio_tagging_loss=0.009018, over 3041807.75 frames. ], batch size: 53, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 08:54:37,741 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.954e+01 8.927e+01 9.333e+01 1.014e+02 1.212e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-28 08:55:02,460 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3430873.3333333335, ans=0.125 2023-11-28 08:55:10,139 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.13 vs. limit=15.0 2023-11-28 08:55:16,152 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 514650 2023-11-28 08:55:19,454 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 9650, loss[loss=0.05745, simple_loss=0.07834, pruned_loss=0.009113, audio_tagging_loss=0.009169, over 15615.00 frames. 
], tot_loss[loss=0.06601, simple_loss=0.08908, pruned_loss=0.01245, audio_tagging_loss=0.009029, over 3037602.49 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 08:55:21,915 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3431006.6666666665, ans=0.1 2023-11-28 08:56:02,190 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3431206.6666666665, ans=0.0 2023-11-28 08:56:13,092 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 514700 2023-11-28 08:56:16,246 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 9700, loss[loss=0.07714, simple_loss=0.1104, pruned_loss=0.01409, audio_tagging_loss=0.007868, over 14207.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.08929, pruned_loss=0.01245, audio_tagging_loss=0.008856, over 3032792.85 frames. ], batch size: 52, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 08:56:16,382 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3431340.0, ans=0.125 2023-11-28 08:56:22,041 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3431340.0, ans=0.1 2023-11-28 08:56:32,358 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.192e+01 8.860e+01 9.541e+01 1.023e+02 1.192e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-28 08:56:35,758 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3431406.6666666665, ans=0.125 2023-11-28 08:56:35,801 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3431406.6666666665, ans=0.0 2023-11-28 08:56:46,779 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3431473.3333333335, ans=0.0 2023-11-28 08:56:57,918 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3431540.0, ans=0.125 2023-11-28 08:57:02,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3431606.6666666665, ans=0.125 2023-11-28 08:57:03,685 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.81 vs. limit=12.0 2023-11-28 08:57:09,649 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 514750 2023-11-28 08:57:12,681 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3431673.3333333335, ans=0.0 2023-11-28 08:57:13,585 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 9750, loss[loss=0.04776, simple_loss=0.06614, pruned_loss=0.00581, audio_tagging_loss=0.008875, over 13796.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.08948, pruned_loss=0.01233, audio_tagging_loss=0.008653, over 3038226.26 frames. 
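The grad_scale field in the loss records steps among 8.0, 16.0 and 32.0 through this section. That is the dynamic loss scale of fp16 mixed-precision training: the scaler doubles the scale after a long run of overflow-free steps and halves it whenever an inf/nan gradient is detected. A sketch with PyTorch's GradScaler (runnable on a CUDA machine); the growth parameters shown are torch defaults, and the training script's actual settings are not visible in this log.

    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=8.0,        # matches the smallest grad_scale logged here
        growth_factor=2.0,     # 8 -> 16 -> 32 after overflow-free stretches
        backoff_factor=0.5,    # 32 -> 16 when an inf/nan gradient appears
        growth_interval=2000,  # torch default; the run's value is unknown
    )
    model = torch.nn.Linear(80, 500).cuda()
    opt = torch.optim.SGD(model.parameters(), lr=1e-3)
    with torch.cuda.amp.autocast():
        loss = model(torch.randn(4, 80, device="cuda")).square().mean()
    scaler.scale(loss).backward()  # backward on loss * grad_scale
    scaler.step(opt)               # unscales grads; skips the step on overflow
    scaler.update()                # grows or backs off grad_scale
    print(scaler.get_scale())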
], batch size: 53, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 08:57:18,075 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3431673.3333333335, ans=0.1 2023-11-28 08:57:19,204 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3431673.3333333335, ans=0.125 2023-11-28 08:57:28,320 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3431740.0, ans=0.125 2023-11-28 08:57:50,234 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3431873.3333333335, ans=0.125 2023-11-28 08:57:52,414 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 08:58:04,074 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys.whitening_limit, batch_count=3431940.0, ans=6.0 2023-11-28 08:58:07,899 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 514800 2023-11-28 08:58:08,053 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3431940.0, ans=0.2 2023-11-28 08:58:11,283 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 9800, loss[loss=0.09275, simple_loss=0.1246, pruned_loss=0.02286, audio_tagging_loss=0.007601, over 14756.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.08992, pruned_loss=0.01225, audio_tagging_loss=0.008645, over 3036413.99 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:58:19,065 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3432006.6666666665, ans=0.0 2023-11-28 08:58:22,837 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.65 vs. limit=15.0 2023-11-28 08:58:25,811 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3432073.3333333335, ans=0.2 2023-11-28 08:58:26,881 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3432073.3333333335, ans=0.025 2023-11-28 08:58:27,618 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.077e+01 8.900e+01 9.501e+01 1.026e+02 1.176e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-28 08:58:46,354 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 08:59:05,110 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 514850 2023-11-28 08:59:05,274 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 08:59:05,295 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3432273.3333333335, ans=0.1 2023-11-28 08:59:06,139 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 08:59:08,269 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 9850, loss[loss=0.06348, simple_loss=0.09343, pruned_loss=0.009255, audio_tagging_loss=0.007511, over 16720.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.09095, pruned_loss=0.01237, audio_tagging_loss=0.008515, over 3040903.98 frames. ], batch size: 61, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:59:21,585 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3432406.6666666665, ans=0.0 2023-11-28 08:59:23,426 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3432406.6666666665, ans=0.125 2023-11-28 08:59:27,162 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.12 vs. limit=15.0 2023-11-28 08:59:31,005 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3432473.3333333335, ans=0.0 2023-11-28 08:59:38,941 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.98 vs. limit=15.0 2023-11-28 08:59:40,824 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3432473.3333333335, ans=0.125 2023-11-28 08:59:47,329 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3432540.0, ans=0.0 2023-11-28 08:59:48,508 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3432540.0, ans=0.0 2023-11-28 08:59:53,325 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.67 vs. limit=15.0 2023-11-28 08:59:54,117 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3432606.6666666665, ans=0.125 2023-11-28 08:59:55,412 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.82 vs. limit=15.0 2023-11-28 09:00:01,451 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 514900 2023-11-28 09:00:03,786 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3432673.3333333335, ans=0.1 2023-11-28 09:00:04,607 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 9900, loss[loss=0.07376, simple_loss=0.1008, pruned_loss=0.0172, audio_tagging_loss=0.006138, over 14912.00 frames. ], tot_loss[loss=0.0663, simple_loss=0.09081, pruned_loss=0.0124, audio_tagging_loss=0.008492, over 3042481.43 frames. 
], batch size: 54, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 09:00:04,766 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3432673.3333333335, ans=0.0 2023-11-28 09:00:23,100 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.508e+01 8.866e+01 9.531e+01 1.026e+02 1.362e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-28 09:00:53,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3432940.0, ans=0.125 2023-11-28 09:00:59,152 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 514950 2023-11-28 09:00:59,219 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3432940.0, ans=0.0 2023-11-28 09:01:03,288 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 9950, loss[loss=0.06134, simple_loss=0.08704, pruned_loss=0.009266, audio_tagging_loss=0.008556, over 14751.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.09071, pruned_loss=0.01233, audio_tagging_loss=0.008501, over 3040465.71 frames. ], batch size: 57, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:01:22,553 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=3433073.3333333335, ans=0.2 2023-11-28 09:01:27,963 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3433140.0, ans=0.0 2023-11-28 09:01:31,164 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=3433140.0, ans=0.02 2023-11-28 09:01:52,379 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.81 vs. limit=15.0 2023-11-28 09:01:57,345 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 515000 2023-11-28 09:02:00,832 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 10000, loss[loss=0.06908, simple_loss=0.08734, pruned_loss=0.01854, audio_tagging_loss=0.006864, over 14405.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.08988, pruned_loss=0.01225, audio_tagging_loss=0.008522, over 3044269.60 frames. 
], batch size: 56, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:02:04,338 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3433340.0, ans=0.125 2023-11-28 09:02:13,161 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3433406.6666666665, ans=0.1 2023-11-28 09:02:15,390 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3433406.6666666665, ans=0.2 2023-11-28 09:02:18,950 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.546e+01 8.838e+01 9.507e+01 1.055e+02 1.169e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-28 09:02:38,047 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3433540.0, ans=0.125 2023-11-28 09:02:44,584 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3433540.0, ans=0.0 2023-11-28 09:02:52,371 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3433606.6666666665, ans=0.125 2023-11-28 09:02:53,759 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.01 vs. limit=15.0 2023-11-28 09:02:54,302 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 515050 2023-11-28 09:02:54,793 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.67 vs. limit=15.0 2023-11-28 09:02:57,630 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 10050, loss[loss=0.08194, simple_loss=0.1219, pruned_loss=0.01483, audio_tagging_loss=0.006166, over 15912.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.09011, pruned_loss=0.01228, audio_tagging_loss=0.008536, over 3043130.35 frames. ], batch size: 55, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:03:18,335 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.98 vs. limit=15.0 2023-11-28 09:03:37,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3433873.3333333335, ans=0.125 2023-11-28 09:03:51,613 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 515100 2023-11-28 09:03:55,267 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 10100, loss[loss=0.05831, simple_loss=0.08021, pruned_loss=0.009276, audio_tagging_loss=0.008932, over 14551.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08956, pruned_loss=0.01216, audio_tagging_loss=0.008597, over 3051414.56 frames. 
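The displayed learning rate drifts from 1.57e-03 to 1.56e-03 across these batches, consistent with icefall's Eden schedule, which decays polynomially in both the batch and epoch counters. The sketch below reproduces both logged values under the assumptions that base_lr=0.045, lr_batches=7500 and lr_epochs=3.5 (the run's configured values) and that the epoch counter holds 42 completed epochs during epoch 43:

    # Eden learning-rate schedule, checked against the lr values logged
    # above. Hyperparameters and the epoch counter are assumptions noted
    # in the text; `batch` is the global batch index.
    def eden_lr(batch: float, epoch: float, base_lr: float = 0.045,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        return (base_lr
                * ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
                * ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25)

    print(f"{eden_lr(513000, 42):.2e}")  # 1.57e-03, as around batch idx 513000
    print(f"{eden_lr(515000, 42):.2e}")  # 1.56e-03, as around batch idx 515000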
], batch size: 56, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:04:05,587 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3434073.3333333335, ans=0.125 2023-11-28 09:04:13,640 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.316e+01 8.780e+01 9.411e+01 9.939e+01 1.267e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-28 09:04:17,240 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3434140.0, ans=0.0 2023-11-28 09:04:22,823 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 09:04:36,244 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3434206.6666666665, ans=0.125 2023-11-28 09:04:36,380 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 09:04:43,938 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3434273.3333333335, ans=0.0 2023-11-28 09:04:45,963 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 09:04:46,231 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3434273.3333333335, ans=0.125 2023-11-28 09:04:49,328 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 515150 2023-11-28 09:04:53,046 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 10150, loss[loss=0.05209, simple_loss=0.06616, pruned_loss=0.008485, audio_tagging_loss=0.01052, over 15255.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.08982, pruned_loss=0.01216, audio_tagging_loss=0.008637, over 3049790.98 frames. ], batch size: 58, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:04:57,641 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff2.min_abs, batch_count=3434340.0, ans=0.1 2023-11-28 09:04:57,921 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.83 vs. limit=15.0 2023-11-28 09:04:59,612 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3434340.0, ans=0.0 2023-11-28 09:05:23,669 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 09:05:28,103 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3434540.0, ans=0.0 2023-11-28 09:05:45,855 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 515200 2023-11-28 09:05:48,381 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3434673.3333333335, ans=0.125 2023-11-28 09:05:49,223 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 10200, loss[loss=0.07402, simple_loss=0.1022, pruned_loss=0.01493, audio_tagging_loss=0.007993, over 16233.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.08903, pruned_loss=0.01209, audio_tagging_loss=0.008851, over 3040935.60 frames. ], batch size: 59, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:05:53,177 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.28 vs. limit=12.0 2023-11-28 09:05:59,899 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3434740.0, ans=0.125 2023-11-28 09:06:05,236 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.88 vs. limit=10.0 2023-11-28 09:06:08,200 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.343e+01 8.883e+01 9.493e+01 1.013e+02 1.248e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-28 09:06:14,798 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 09:06:14,948 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3434806.6666666665, ans=0.04949747468305833 2023-11-28 09:06:15,074 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3434806.6666666665, ans=0.125 2023-11-28 09:06:28,631 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3434873.3333333335, ans=0.2 2023-11-28 09:06:37,155 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3434940.0, ans=0.125 2023-11-28 09:06:41,135 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3434940.0, ans=0.025 2023-11-28 09:06:43,210 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 515250 2023-11-28 09:06:46,373 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 10250, loss[loss=0.06504, simple_loss=0.09417, pruned_loss=0.007663, audio_tagging_loss=0.01029, over 15332.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.08896, pruned_loss=0.01202, audio_tagging_loss=0.008913, over 3046173.23 frames. 
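
The WARNING lines from train_asr.py:1481 show a sanity filter on training cuts: these one-second AudioSet clips carry a fixed 24-token placeholder transcript, but after the encoder frontend's subsampling only 23 frames remain, and the pruned transducer loss cannot align more tokens than it has frames. A sketch of the check, assuming the usual icefall subsampling arithmetic (which reproduces the 100 -> 23 figure logged here):

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # Frames surviving the convolutional frontend; ((100 - 7) // 2 + 1) // 2
    # evaluates to 23, matching the warnings above.
    t = ((num_frames - 7) // 2 + 1) // 2
    # The transducer needs at least as many output frames as target tokens.
    return t >= num_tokens

keep_cut(100, 24)  # False: the cut is excluded from training
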
], batch size: 55, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:06:55,112 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 09:06:55,287 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3435006.6666666665, ans=0.0 2023-11-28 09:07:36,616 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3435273.3333333335, ans=0.125 2023-11-28 09:07:38,877 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 09:07:40,788 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 515300 2023-11-28 09:07:44,048 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 10300, loss[loss=0.05869, simple_loss=0.08293, pruned_loss=0.008768, audio_tagging_loss=0.008456, over 15098.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08953, pruned_loss=0.0121, audio_tagging_loss=0.008908, over 3047318.72 frames. ], batch size: 57, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:07:46,560 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3435340.0, ans=0.1 2023-11-28 09:07:51,762 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.20 vs. limit=15.0 2023-11-28 09:08:01,878 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.744e+01 9.048e+01 9.599e+01 1.061e+02 1.681e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-28 09:08:11,539 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.44 vs. limit=22.5 2023-11-28 09:08:18,512 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3435540.0, ans=0.125 2023-11-28 09:08:37,141 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 515350 2023-11-28 09:08:40,331 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 10350, loss[loss=0.05291, simple_loss=0.06452, pruned_loss=0.01019, audio_tagging_loss=0.01046, over 13514.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.0908, pruned_loss=0.01238, audio_tagging_loss=0.008863, over 3046059.15 frames. ], batch size: 53, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:08:55,408 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3435740.0, ans=0.1 2023-11-28 09:09:02,379 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3435806.6666666665, ans=0.125 2023-11-28 09:09:04,628 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3435806.6666666665, ans=0.125 2023-11-28 09:09:14,842 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.48 vs. limit=12.0 2023-11-28 09:09:21,713 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.99 vs. 
limit=15.0 2023-11-28 09:09:25,566 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3435940.0, ans=0.125 2023-11-28 09:09:33,523 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 515400 2023-11-28 09:09:36,952 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 10400, loss[loss=0.07444, simple_loss=0.09601, pruned_loss=0.01411, audio_tagging_loss=0.01233, over 15836.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.09044, pruned_loss=0.01229, audio_tagging_loss=0.009002, over 3056175.39 frames. ], batch size: 58, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:09:54,531 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.351e+01 8.993e+01 9.634e+01 1.025e+02 1.288e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-28 09:10:03,114 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3436140.0, ans=0.125 2023-11-28 09:10:23,912 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3436273.3333333335, ans=0.125 2023-11-28 09:10:24,824 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3436273.3333333335, ans=0.0 2023-11-28 09:10:30,048 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 515450 2023-11-28 09:10:33,227 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 10450, loss[loss=0.05722, simple_loss=0.07507, pruned_loss=0.0113, audio_tagging_loss=0.008382, over 16122.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.08932, pruned_loss=0.01214, audio_tagging_loss=0.009004, over 3054535.46 frames. ], batch size: 61, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:10:47,540 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3436406.6666666665, ans=0.0 2023-11-28 09:10:52,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3436406.6666666665, ans=0.5 2023-11-28 09:11:26,960 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 515500 2023-11-28 09:11:30,124 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 10500, loss[loss=0.05209, simple_loss=0.07065, pruned_loss=0.008268, audio_tagging_loss=0.008498, over 16151.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.08851, pruned_loss=0.012, audio_tagging_loss=0.008878, over 3046960.62 frames. 
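
Each [optim.py:476] line reports the quartiles (min, 25%, median, 75%, max) of recently observed gradient norms next to the active clipping threshold, and the threshold tracks Clipping_scale times the median: on the line above, 2.0 * 9.634e+01 = 1.927e+02 exactly. percent-clipped is the percentage of recent batches whose norm exceeded the threshold. A minimal sketch of median-relative clipping, assuming that relationship (the recipe's optimizer applies it internally):

import torch

def clip_relative_to_median(params, recent_norms, clipping_scale=2.0):
    # Assumed behaviour: the threshold is clipping_scale times the median
    # of a rolling buffer of gradient norms, as the logged quartiles suggest.
    threshold = clipping_scale * torch.tensor(recent_norms).median().item()
    torch.nn.utils.clip_grad_norm_(params, max_norm=threshold)
    return threshold
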
], batch size: 61, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:11:35,274 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3436673.3333333335, ans=0.2 2023-11-28 09:11:40,410 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 09:11:48,944 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.406e+01 8.855e+01 9.374e+01 1.019e+02 1.300e+02, threshold=1.875e+02, percent-clipped=0.0 2023-11-28 09:11:55,212 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3436806.6666666665, ans=0.125 2023-11-28 09:12:25,157 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 515550 2023-11-28 09:12:28,319 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 10550, loss[loss=0.05477, simple_loss=0.07377, pruned_loss=0.007328, audio_tagging_loss=0.01056, over 14825.00 frames. ], tot_loss[loss=0.06466, simple_loss=0.08794, pruned_loss=0.01187, audio_tagging_loss=0.008814, over 3048929.70 frames. ], batch size: 55, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:12:38,853 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3437073.3333333335, ans=0.2 2023-11-28 09:13:13,097 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.19 vs. limit=12.0 2023-11-28 09:13:15,769 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3437273.3333333335, ans=0.125 2023-11-28 09:13:21,720 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 515600 2023-11-28 09:13:23,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3437273.3333333335, ans=0.125 2023-11-28 09:13:25,232 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 10600, loss[loss=0.05782, simple_loss=0.078, pruned_loss=0.0109, audio_tagging_loss=0.007927, over 16727.00 frames. ], tot_loss[loss=0.06457, simple_loss=0.08781, pruned_loss=0.01195, audio_tagging_loss=0.008712, over 3044649.26 frames. ], batch size: 64, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:13:29,088 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.63 vs. limit=15.0 2023-11-28 09:13:40,047 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.51 vs. limit=10.0 2023-11-28 09:13:42,820 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.765e+01 9.109e+01 9.906e+01 1.072e+02 1.462e+02, threshold=1.981e+02, percent-clipped=0.0 2023-11-28 09:14:07,229 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3437540.0, ans=0.0 2023-11-28 09:14:07,274 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3437540.0, ans=0.125 2023-11-28 09:14:17,931 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 515650 2023-11-28 09:14:21,253 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 10650, loss[loss=0.04581, simple_loss=0.05877, pruned_loss=0.007503, audio_tagging_loss=0.00892, over 13661.00 frames. 
], tot_loss[loss=0.06494, simple_loss=0.08826, pruned_loss=0.01215, audio_tagging_loss=0.008667, over 3043426.51 frames. ], batch size: 55, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:14:21,543 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3437673.3333333335, ans=0.125 2023-11-28 09:14:28,802 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3437673.3333333335, ans=0.0 2023-11-28 09:14:28,894 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3437673.3333333335, ans=0.0 2023-11-28 09:14:34,248 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3437740.0, ans=0.0 2023-11-28 09:14:39,116 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.08 vs. limit=8.0 2023-11-28 09:14:45,881 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.99 vs. limit=15.0 2023-11-28 09:14:53,068 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3437806.6666666665, ans=0.125 2023-11-28 09:15:07,224 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3437940.0, ans=0.125 2023-11-28 09:15:10,663 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3437940.0, ans=0.125 2023-11-28 09:15:13,642 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 515700 2023-11-28 09:15:17,395 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 10700, loss[loss=0.05197, simple_loss=0.07315, pruned_loss=0.005843, audio_tagging_loss=0.009553, over 14386.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.08904, pruned_loss=0.01224, audio_tagging_loss=0.008651, over 3041898.01 frames. ], batch size: 56, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:15:32,044 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.41 vs. limit=22.5 2023-11-28 09:15:34,667 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3438073.3333333335, ans=0.125 2023-11-28 09:15:35,887 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3438073.3333333335, ans=0.125 2023-11-28 09:15:36,689 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.621e+01 8.910e+01 9.467e+01 1.013e+02 1.295e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-28 09:15:41,163 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3438140.0, ans=0.0 2023-11-28 09:16:10,843 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 515750 2023-11-28 09:16:13,976 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 10750, loss[loss=0.06575, simple_loss=0.09011, pruned_loss=0.01305, audio_tagging_loss=0.007642, over 15904.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08883, pruned_loss=0.01218, audio_tagging_loss=0.008671, over 3041645.50 frames. 
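
The very frequent [scaling.py:213] lines trace ScheduledFloat values: regularisation hyper-parameters (dropout probabilities, skip rates, balancer probabilities and bounds) that are piecewise-linear functions of the global batch count rather than constants, with ans giving the value currently in force. A sketch of that evaluation; the breakpoints below are illustrative only, since the real schedules are set where each module is constructed:

def scheduled_float(batch_count, points, default=0.0):
    # points: sorted (batch, value) breakpoints; linear interpolation in
    # between, clamped to the end values outside the range.
    if not points:
        return default
    if batch_count <= points[0][0]:
        return points[0][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if batch_count <= x1:
            return y0 + (batch_count - x0) * (y1 - y0) / (x1 - x0)
    return points[-1][1]

# A skip rate decaying 0.5 -> 0.0 over the first 4000 batches has long
# since reached its final value at batch_count ~ 3.4e6:
scheduled_float(3437673.0, [(0.0, 0.5), (4000.0, 0.0)])  # -> 0.0
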
], batch size: 59, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:16:28,834 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.61 vs. limit=22.5 2023-11-28 09:16:39,050 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3438473.3333333335, ans=0.5 2023-11-28 09:16:40,759 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3438473.3333333335, ans=0.0 2023-11-28 09:16:41,788 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3438473.3333333335, ans=0.125 2023-11-28 09:16:42,872 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3438473.3333333335, ans=0.125 2023-11-28 09:16:53,647 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3438540.0, ans=0.2 2023-11-28 09:17:05,761 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3438606.6666666665, ans=0.5 2023-11-28 09:17:06,588 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 515800 2023-11-28 09:17:10,050 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 10800, loss[loss=0.05929, simple_loss=0.07466, pruned_loss=0.01081, audio_tagging_loss=0.01115, over 17010.00 frames. ], tot_loss[loss=0.065, simple_loss=0.0884, pruned_loss=0.01209, audio_tagging_loss=0.008718, over 3043175.01 frames. ], batch size: 66, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:17:16,835 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3438673.3333333335, ans=0.1 2023-11-28 09:17:18,982 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3438673.3333333335, ans=0.125 2023-11-28 09:17:21,124 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3438740.0, ans=0.0 2023-11-28 09:17:28,800 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.36 vs. limit=15.0 2023-11-28 09:17:29,072 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.283e+01 8.659e+01 9.192e+01 9.823e+01 1.353e+02, threshold=1.838e+02, percent-clipped=0.0 2023-11-28 09:17:47,833 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.77 vs. limit=6.0 2023-11-28 09:17:55,466 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.66 vs. limit=6.0 2023-11-28 09:18:02,525 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 515850 2023-11-28 09:18:06,536 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 10850, loss[loss=0.05682, simple_loss=0.07972, pruned_loss=0.009034, audio_tagging_loss=0.00792, over 15540.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08881, pruned_loss=0.01224, audio_tagging_loss=0.008702, over 3046934.51 frames. 
], batch size: 56, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:18:07,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3439006.6666666665, ans=0.2 2023-11-28 09:18:08,926 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3439006.6666666665, ans=0.0 2023-11-28 09:18:16,901 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3439073.3333333335, ans=0.125 2023-11-28 09:18:59,991 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 515900 2023-11-28 09:19:03,240 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 10900, loss[loss=0.06982, simple_loss=0.09879, pruned_loss=0.01245, audio_tagging_loss=0.007976, over 15682.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.08912, pruned_loss=0.01232, audio_tagging_loss=0.008626, over 3049245.33 frames. ], batch size: 58, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:19:03,255 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 09:19:04,693 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3439340.0, ans=0.04949747468305833 2023-11-28 09:19:21,912 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.849e+01 9.090e+01 9.658e+01 1.040e+02 1.317e+02, threshold=1.932e+02, percent-clipped=0.0 2023-11-28 09:19:22,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3439406.6666666665, ans=0.125 2023-11-28 09:19:46,578 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3439540.0, ans=0.1 2023-11-28 09:19:47,596 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3439606.6666666665, ans=0.1 2023-11-28 09:19:56,332 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 515950 2023-11-28 09:19:59,471 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 10950, loss[loss=0.09038, simple_loss=0.122, pruned_loss=0.0212, audio_tagging_loss=0.008184, over 15174.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.08911, pruned_loss=0.01227, audio_tagging_loss=0.008738, over 3049198.19 frames. ], batch size: 55, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:20:04,165 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.30 vs. 
limit=15.0 2023-11-28 09:20:13,640 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3439740.0, ans=0.05 2023-11-28 09:20:17,523 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3439740.0, ans=0.125 2023-11-28 09:20:26,722 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3439806.6666666665, ans=0.125 2023-11-28 09:20:34,819 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3439873.3333333335, ans=0.1 2023-11-28 09:20:36,940 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3439873.3333333335, ans=0.0 2023-11-28 09:20:51,152 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3439940.0, ans=0.125 2023-11-28 09:20:52,063 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 516000 2023-11-28 09:20:53,434 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-516000.pt 2023-11-28 09:20:57,607 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 11000, loss[loss=0.07627, simple_loss=0.1013, pruned_loss=0.01766, audio_tagging_loss=0.007992, over 15788.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08876, pruned_loss=0.01225, audio_tagging_loss=0.008894, over 3052162.33 frames. ], batch size: 56, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:21:10,790 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 09:21:17,808 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.160e+01 8.606e+01 9.397e+01 9.983e+01 1.237e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-28 09:21:30,544 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.17 vs. limit=15.0 2023-11-28 09:21:38,818 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3440206.6666666665, ans=0.125 2023-11-28 09:21:44,946 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3440273.3333333335, ans=0.1 2023-11-28 09:21:51,204 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 516050 2023-11-28 09:21:54,932 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 11050, loss[loss=0.06925, simple_loss=0.101, pruned_loss=0.01137, audio_tagging_loss=0.00736, over 16048.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.08982, pruned_loss=0.01264, audio_tagging_loss=0.008872, over 3049234.98 frames. 
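
The [scaling.py:1022] Whitening lines come from modules that regularise activations toward a white (isotropic) covariance: each line compares a measured anisotropy metric against the module's limit, and a corrective gradient is applied only when the metric exceeds that limit, so "metric=9.17 vs. limit=15.0" means the constraint was satisfied. One metric of this flavour, assumed here for illustration rather than taken from scaling.py, is n * tr(C^2) / tr(C)^2, which is 1.0 exactly when the covariance C is a multiple of the identity and grows as the spectrum spreads:

import torch

def whiteness_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (frames, channels). By Cauchy-Schwarz this ratio is >= 1.0, with
    # equality iff all eigenvalues of the covariance are equal (white input).
    x = x - x.mean(dim=0, keepdim=True)
    c = (x.t() @ x) / x.shape[0]
    n = c.shape[0]
    return n * torch.trace(c @ c) / torch.trace(c).pow(2)

The [scaling.py:1118] WithLoss lines appear to be the analogous report for an auxiliary penalty attached to attention weights; loss-sum=0.000e+00 presumably means that constraint was inactive on those batches.
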
], batch size: 60, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:22:35,802 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3440540.0, ans=0.0 2023-11-28 09:22:48,694 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 516100 2023-11-28 09:22:52,004 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 11100, loss[loss=0.05849, simple_loss=0.07505, pruned_loss=0.01047, audio_tagging_loss=0.01049, over 15265.00 frames. ], tot_loss[loss=0.0669, simple_loss=0.09037, pruned_loss=0.01272, audio_tagging_loss=0.008992, over 3047448.50 frames. ], batch size: 58, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:22:54,400 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3440673.3333333335, ans=0.2 2023-11-28 09:22:56,762 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3440673.3333333335, ans=0.2 2023-11-28 09:23:03,683 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.01 vs. limit=15.0 2023-11-28 09:23:10,625 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.60 vs. limit=22.5 2023-11-28 09:23:12,317 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.493e+01 8.852e+01 9.435e+01 1.052e+02 1.493e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-28 09:23:17,922 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3440806.6666666665, ans=0.0 2023-11-28 09:23:19,553 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3440806.6666666665, ans=0.1 2023-11-28 09:23:33,635 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3440873.3333333335, ans=0.125 2023-11-28 09:23:34,800 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3440873.3333333335, ans=0.0 2023-11-28 09:23:39,421 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3440940.0, ans=0.125 2023-11-28 09:23:43,305 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=3440940.0, ans=15.0 2023-11-28 09:23:43,306 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.57 vs. limit=15.0 2023-11-28 09:23:45,839 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 516150 2023-11-28 09:23:49,020 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 11150, loss[loss=0.07095, simple_loss=0.1006, pruned_loss=0.01023, audio_tagging_loss=0.0104, over 14421.00 frames. ], tot_loss[loss=0.06704, simple_loss=0.09065, pruned_loss=0.01268, audio_tagging_loss=0.009035, over 3057516.10 frames. 
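
The bypass.scale_min and bypass_mid.scale_min entries (ans=0.2 above) belong to Zipformer's bypass connections, which blend each sub-module's input with its output through a learned per-channel scale whose lower clamp is itself scheduled. A sketch under that reading; the clamp bounds and the exact blending are assumptions, not code lifted from the model:

import torch

def bypass(x: torch.Tensor, y: torch.Tensor,
           scale: torch.Tensor, scale_min: float = 0.2) -> torch.Tensor:
    # x: sub-module input, y: its output, scale: learned per-channel blend.
    # scale_min=0.2 matches the ans=0.2 values logged above.
    s = scale.clamp(min=scale_min, max=1.0)
    return x + s * (y - x)
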
], batch size: 54, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:24:07,216 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3441073.3333333335, ans=0.125 2023-11-28 09:24:08,381 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3441073.3333333335, ans=0.2 2023-11-28 09:24:17,496 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3441140.0, ans=0.125 2023-11-28 09:24:28,548 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 09:24:30,816 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3441206.6666666665, ans=0.125 2023-11-28 09:24:43,305 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 516200 2023-11-28 09:24:47,399 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 11200, loss[loss=0.04791, simple_loss=0.06812, pruned_loss=0.005127, audio_tagging_loss=0.008722, over 16333.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.09003, pruned_loss=0.01254, audio_tagging_loss=0.009136, over 3054059.76 frames. ], batch size: 63, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:24:52,830 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.81 vs. limit=15.0 2023-11-28 09:25:01,716 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3441406.6666666665, ans=0.0 2023-11-28 09:25:07,967 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.180e+01 8.684e+01 9.493e+01 1.049e+02 1.376e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-28 09:25:19,208 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3441473.3333333335, ans=0.025 2023-11-28 09:25:20,325 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3441540.0, ans=0.1 2023-11-28 09:25:27,541 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3441540.0, ans=0.0 2023-11-28 09:25:29,732 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3441540.0, ans=0.1 2023-11-28 09:25:38,591 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.00 vs. limit=15.0 2023-11-28 09:25:41,294 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 516250 2023-11-28 09:25:44,079 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3441673.3333333335, ans=0.035 2023-11-28 09:25:45,014 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 11250, loss[loss=0.05925, simple_loss=0.0834, pruned_loss=0.00816, audio_tagging_loss=0.009389, over 15673.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.08937, pruned_loss=0.01238, audio_tagging_loss=0.009116, over 3048916.66 frames. 
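
Many of the scheduled values above belong to Balancer modules (balancer1.prob, min_positive, min_abs, max_abs, ...): a Balancer watches simple per-channel statistics of an activation and, with probability prob, injects small corrective gradients into channels that violate its configured bounds. A much-simplified sketch of the statistics being checked, with bounds like the min_positive=0.025 logged above; the real module modifies gradients rather than returning masks:

import torch

def balancer_violations(x: torch.Tensor,
                        min_positive: float = 0.025,
                        max_abs: float = 10.0):
    # x: (frames, channels). Per-channel masks for the two kinds of
    # violation a Balancer would correct.
    frac_positive = (x > 0).float().mean(dim=0)
    mean_abs = x.abs().mean(dim=0)
    return frac_positive < min_positive, mean_abs > max_abs
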
], batch size: 57, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:25:47,273 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=3441673.3333333335, ans=0.05 2023-11-28 09:26:06,889 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3441806.6666666665, ans=0.0 2023-11-28 09:26:08,016 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3441806.6666666665, ans=0.0 2023-11-28 09:26:33,524 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3441940.0, ans=0.125 2023-11-28 09:26:33,923 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.35 vs. limit=15.0 2023-11-28 09:26:38,718 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 516300 2023-11-28 09:26:41,908 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 11300, loss[loss=0.06625, simple_loss=0.08952, pruned_loss=0.01233, audio_tagging_loss=0.009157, over 16334.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.08904, pruned_loss=0.01224, audio_tagging_loss=0.008949, over 3049452.07 frames. ], batch size: 63, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:26:53,424 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3442073.3333333335, ans=0.1 2023-11-28 09:27:00,945 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3442073.3333333335, ans=0.125 2023-11-28 09:27:02,754 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.774e+01 8.901e+01 9.622e+01 1.003e+02 2.071e+02, threshold=1.924e+02, percent-clipped=1.0 2023-11-28 09:27:15,489 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3442206.6666666665, ans=0.125 2023-11-28 09:27:17,713 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3442206.6666666665, ans=0.125 2023-11-28 09:27:35,489 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 516350 2023-11-28 09:27:38,733 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 11350, loss[loss=0.06128, simple_loss=0.07963, pruned_loss=0.01091, audio_tagging_loss=0.01055, over 15327.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08941, pruned_loss=0.01227, audio_tagging_loss=0.008797, over 3040852.51 frames. 
], batch size: 58, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:27:38,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3442340.0, ans=0.2 2023-11-28 09:27:55,179 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3442406.6666666665, ans=0.125 2023-11-28 09:28:21,138 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3442540.0, ans=0.0 2023-11-28 09:28:29,787 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3442606.6666666665, ans=0.0 2023-11-28 09:28:30,789 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3442606.6666666665, ans=0.2 2023-11-28 09:28:32,955 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 516400 2023-11-28 09:28:36,502 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 11400, loss[loss=0.07943, simple_loss=0.106, pruned_loss=0.02024, audio_tagging_loss=0.006177, over 14373.00 frames. ], tot_loss[loss=0.06694, simple_loss=0.09114, pruned_loss=0.01271, audio_tagging_loss=0.008662, over 3039146.11 frames. ], batch size: 57, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:28:56,366 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.040e+01 8.771e+01 9.196e+01 9.896e+01 1.286e+02, threshold=1.839e+02, percent-clipped=0.0 2023-11-28 09:29:08,584 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3442806.6666666665, ans=10.0 2023-11-28 09:29:12,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=3442873.3333333335, ans=22.5 2023-11-28 09:29:17,594 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3442873.3333333335, ans=0.0 2023-11-28 09:29:30,204 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 516450 2023-11-28 09:29:33,424 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 11450, loss[loss=0.05894, simple_loss=0.08247, pruned_loss=0.01184, audio_tagging_loss=0.005866, over 14696.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.09058, pruned_loss=0.01254, audio_tagging_loss=0.008699, over 3031858.27 frames. ], batch size: 57, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:29:44,367 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.57 vs. limit=15.0 2023-11-28 09:30:05,804 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3443140.0, ans=0.1 2023-11-28 09:30:12,628 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3443206.6666666665, ans=0.2 2023-11-28 09:30:25,562 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.01 vs. limit=15.0 2023-11-28 09:30:27,786 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 516500 2023-11-28 09:30:30,961 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 11500, loss[loss=0.06819, simple_loss=0.09495, pruned_loss=0.01465, audio_tagging_loss=0.006064, over 14543.00 frames. 
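
grad_scale in the per-batch summaries is the dynamic loss scale of mixed-precision training: in the entries above it halves from 32.0 (batch 11400) to 16.0 (batch 11450) after an overflow forces a skipped step, and earlier in this section it doubled back from 16.0 to 32.0 once enough overflow-free steps had accumulated. The field appears to track a standard torch.cuda.amp.GradScaler:

import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0,
                                   growth_factor=2.0,
                                   backoff_factor=0.5,
                                   growth_interval=2000)

# Per training step (model, optimizer and compute_loss assumed to exist):
#   with torch.cuda.amp.autocast():
#       loss = compute_loss(model, batch)
#   scaler.scale(loss).backward()
#   scaler.step(optimizer)   # skipped if infs/NaNs are found in the grads
#   scaler.update()          # halves the scale on overflow, doubles it
#                            # after growth_interval clean steps
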
], tot_loss[loss=0.06659, simple_loss=0.0907, pruned_loss=0.01256, audio_tagging_loss=0.00868, over 3037934.61 frames. ], batch size: 53, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:30:31,514 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.12 vs. limit=22.5 2023-11-28 09:30:36,046 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 09:30:44,075 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3443406.6666666665, ans=0.125 2023-11-28 09:30:52,629 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.962e+01 8.608e+01 9.367e+01 9.940e+01 1.192e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-28 09:30:53,957 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 09:31:06,590 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3443540.0, ans=0.1 2023-11-28 09:31:13,906 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3443540.0, ans=0.125 2023-11-28 09:31:19,789 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3443606.6666666665, ans=0.1 2023-11-28 09:31:25,518 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 516550 2023-11-28 09:31:26,769 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3443606.6666666665, ans=0.1 2023-11-28 09:31:28,717 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 11550, loss[loss=0.06561, simple_loss=0.09299, pruned_loss=0.01129, audio_tagging_loss=0.007829, over 16275.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.09066, pruned_loss=0.01255, audio_tagging_loss=0.008754, over 3047594.68 frames. ], batch size: 61, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:31:41,219 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.01 vs. limit=15.0 2023-11-28 09:32:06,698 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 09:32:12,292 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3443873.3333333335, ans=0.09899494936611666 2023-11-28 09:32:19,666 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3443940.0, ans=0.0 2023-11-28 09:32:21,771 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 516600 2023-11-28 09:32:25,234 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 11600, loss[loss=0.07053, simple_loss=0.08768, pruned_loss=0.01514, audio_tagging_loss=0.01156, over 15262.00 frames. 
], tot_loss[loss=0.06711, simple_loss=0.09156, pruned_loss=0.01267, audio_tagging_loss=0.008655, over 3046012.61 frames. ], batch size: 57, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:32:28,636 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3444006.6666666665, ans=0.1 2023-11-28 09:32:47,366 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.414e+01 8.769e+01 9.333e+01 1.033e+02 1.788e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-28 09:32:57,073 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.30 vs. limit=15.0 2023-11-28 09:32:58,830 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3444206.6666666665, ans=0.0 2023-11-28 09:33:03,231 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3444206.6666666665, ans=0.1 2023-11-28 09:33:18,720 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 516650 2023-11-28 09:33:22,562 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 11650, loss[loss=0.0776, simple_loss=0.1017, pruned_loss=0.018, audio_tagging_loss=0.008728, over 15371.00 frames. ], tot_loss[loss=0.0668, simple_loss=0.09081, pruned_loss=0.01265, audio_tagging_loss=0.008746, over 3039409.38 frames. ], batch size: 59, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:33:32,404 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3444340.0, ans=0.125 2023-11-28 09:33:32,494 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3444340.0, ans=0.125 2023-11-28 09:33:51,106 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3444473.3333333335, ans=0.2 2023-11-28 09:33:59,491 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.91 vs. limit=15.0 2023-11-28 09:34:09,823 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3444606.6666666665, ans=0.2 2023-11-28 09:34:13,409 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3444606.6666666665, ans=0.0 2023-11-28 09:34:17,089 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 516700 2023-11-28 09:34:20,379 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 11700, loss[loss=0.05414, simple_loss=0.07522, pruned_loss=0.007836, audio_tagging_loss=0.008697, over 14770.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.09007, pruned_loss=0.01244, audio_tagging_loss=0.008752, over 3042695.08 frames. ], batch size: 57, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:34:42,355 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.351e+01 8.763e+01 9.224e+01 1.034e+02 1.340e+02, threshold=1.845e+02, percent-clipped=0.0 2023-11-28 09:35:13,497 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3444940.0, ans=0.2 2023-11-28 09:35:13,636 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.46 vs. 
limit=10.0 2023-11-28 09:35:14,265 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 516750 2023-11-28 09:35:17,411 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 11750, loss[loss=0.05948, simple_loss=0.07369, pruned_loss=0.01221, audio_tagging_loss=0.01043, over 15395.00 frames. ], tot_loss[loss=0.06681, simple_loss=0.09117, pruned_loss=0.0125, audio_tagging_loss=0.008728, over 3050282.44 frames. ], batch size: 58, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:35:26,327 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3445006.6666666665, ans=0.125 2023-11-28 09:35:31,467 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.04 vs. limit=15.0 2023-11-28 09:35:38,924 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3445140.0, ans=0.0 2023-11-28 09:35:40,964 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.60 vs. limit=15.0 2023-11-28 09:36:09,357 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3445273.3333333335, ans=0.125 2023-11-28 09:36:10,203 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 516800 2023-11-28 09:36:14,213 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 11800, loss[loss=0.06575, simple_loss=0.0856, pruned_loss=0.01367, audio_tagging_loss=0.00928, over 15368.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.09062, pruned_loss=0.01262, audio_tagging_loss=0.008794, over 3035136.07 frames. ], batch size: 55, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:36:16,447 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.67 vs. limit=15.0 2023-11-28 09:36:17,129 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3445340.0, ans=0.1 2023-11-28 09:36:19,496 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.46 vs. limit=6.0 2023-11-28 09:36:26,658 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 09:36:28,796 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3445406.6666666665, ans=0.125 2023-11-28 09:36:32,018 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3445406.6666666665, ans=0.2 2023-11-28 09:36:33,203 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3445406.6666666665, ans=0.125 2023-11-28 09:36:36,722 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.69 vs. 
limit=12.0 2023-11-28 09:36:37,266 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.517e+01 8.864e+01 9.665e+01 1.018e+02 1.283e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-28 09:37:08,593 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 516850 2023-11-28 09:37:12,341 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 11850, loss[loss=0.06443, simple_loss=0.08225, pruned_loss=0.01407, audio_tagging_loss=0.009234, over 14756.00 frames. ], tot_loss[loss=0.06694, simple_loss=0.09096, pruned_loss=0.0126, audio_tagging_loss=0.008858, over 3035582.24 frames. ], batch size: 57, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:37:23,504 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3445740.0, ans=0.2 2023-11-28 09:37:25,674 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3445740.0, ans=0.1 2023-11-28 09:37:53,462 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.15 vs. limit=15.0 2023-11-28 09:37:54,509 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.99 vs. limit=15.0 2023-11-28 09:38:06,157 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 516900 2023-11-28 09:38:09,350 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 11900, loss[loss=0.05134, simple_loss=0.07081, pruned_loss=0.006275, audio_tagging_loss=0.009665, over 15513.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.09015, pruned_loss=0.0124, audio_tagging_loss=0.008911, over 3037576.91 frames. ], batch size: 59, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:38:21,763 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 09:38:32,386 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.420e+01 8.705e+01 9.389e+01 1.010e+02 1.284e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-28 09:38:35,216 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3446140.0, ans=0.125 2023-11-28 09:38:45,563 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3446206.6666666665, ans=0.125 2023-11-28 09:38:47,117 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.28 vs. limit=22.5 2023-11-28 09:38:48,943 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3446206.6666666665, ans=0.125 2023-11-28 09:38:54,523 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3446273.3333333335, ans=0.0 2023-11-28 09:39:02,992 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 516950 2023-11-28 09:39:03,531 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.60 vs. limit=15.0 2023-11-28 09:39:06,130 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 11950, loss[loss=0.05613, simple_loss=0.07482, pruned_loss=0.007404, audio_tagging_loss=0.01131, over 14745.00 frames. 
], tot_loss[loss=0.06677, simple_loss=0.09068, pruned_loss=0.01255, audio_tagging_loss=0.00887, over 3041958.21 frames. ], batch size: 56, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:39:20,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3446406.6666666665, ans=0.125 2023-11-28 09:39:20,639 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.12 vs. limit=22.5 2023-11-28 09:39:30,073 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3446473.3333333335, ans=0.125 2023-11-28 09:39:30,117 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3446473.3333333335, ans=0.0 2023-11-28 09:39:38,909 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3446473.3333333335, ans=0.125 2023-11-28 09:39:39,008 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3446473.3333333335, ans=0.125 2023-11-28 09:39:46,330 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3446540.0, ans=0.125 2023-11-28 09:39:57,832 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3446606.6666666665, ans=0.2 2023-11-28 09:39:58,674 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 517000 2023-11-28 09:40:01,989 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 12000, loss[loss=0.06078, simple_loss=0.08589, pruned_loss=0.01056, audio_tagging_loss=0.007265, over 14733.00 frames. ], tot_loss[loss=0.06708, simple_loss=0.09104, pruned_loss=0.01269, audio_tagging_loss=0.008871, over 3042896.24 frames. ], batch size: 54, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:40:01,990 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-28 09:40:30,473 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.0121, 2.9756, 2.7253, 2.6870, 3.3212, 3.3514, 3.1419, 3.6544], device='cuda:0') 2023-11-28 09:40:36,963 INFO [train_asr.py:1267] (0/4) Epoch 43, validation: loss=0.05826, simple_loss=0.05053, pruned_loss=0.005231, audio_tagging_loss=0.02777, over 4681554.00 frames. 2023-11-28 09:40:36,963 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-28 09:40:38,472 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.37 vs. limit=15.0 2023-11-28 09:40:57,624 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.357e+01 8.981e+01 9.596e+01 1.044e+02 1.233e+02, threshold=1.919e+02, percent-clipped=0.0 2023-11-28 09:41:03,824 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/epoch-43.pt 2023-11-28 09:41:18,033 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 0, loss[loss=0.08192, simple_loss=0.09662, pruned_loss=0.01235, audio_tagging_loss=0.02127, over 15145.00 frames. ], tot_loss[loss=0.08192, simple_loss=0.09662, pruned_loss=0.01235, audio_tagging_loss=0.02127, over 15145.00 frames. 
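
Two things happen at the epoch boundary above: epoch-43.pt is written, and the learning rate in the following entries steps from 1.56e-03 down to 1.54e-03. Both logged values are reproduced by icefall's Eden schedule under assumed settings of base_lr=0.045, lr_batches=7500 and lr_epochs=3.5, if the scheduler's internal epoch counter runs one behind the epoch printed in the log; the validation losses above also obey the component weighting checked earlier (0.5 * 0.05053 + 0.005231 + 0.02777 = 0.05826). A sketch of the Eden formula:

def eden_lr(base_lr, batch, epoch,
            lr_batches=7500.0, lr_epochs=3.5):
    # lr = base_lr * ((batch^2+B^2)/B^2)^-0.25 * ((epoch^2+E^2)/E^2)^-0.25
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

eden_lr(0.045, 516000, 42)  # ~1.56e-03, as logged during epoch 43
eden_lr(0.045, 517000, 43)  # ~1.54e-03, as logged during epoch 44
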
], batch size: 57, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 09:41:18,035 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-28 09:41:43,729 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.9999, 5.8837, 5.6827, 5.5799], device='cuda:0') 2023-11-28 09:41:48,743 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.7999, 5.8398, 5.8939, 5.8755], device='cuda:0') 2023-11-28 09:41:52,343 INFO [train_asr.py:1267] (0/4) Epoch 44, validation: loss=0.05791, simple_loss=0.05054, pruned_loss=0.00521, audio_tagging_loss=0.02743, over 4681554.00 frames. 2023-11-28 09:41:52,344 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-28 09:41:58,650 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3446840.0, ans=0.0 2023-11-28 09:42:12,499 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3446906.6666666665, ans=0.125 2023-11-28 09:42:18,944 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 517050 2023-11-28 09:42:33,816 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3447040.0, ans=0.125 2023-11-28 09:42:43,280 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3447106.6666666665, ans=0.0 2023-11-28 09:42:44,388 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3447106.6666666665, ans=0.1 2023-11-28 09:42:50,857 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 50, loss[loss=0.0728, simple_loss=0.08976, pruned_loss=0.0138, audio_tagging_loss=0.01413, over 14502.00 frames. ], tot_loss[loss=0.07636, simple_loss=0.09362, pruned_loss=0.0133, audio_tagging_loss=0.01625, over 686110.90 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 09:42:51,035 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3447173.3333333335, ans=0.125 2023-11-28 09:43:03,180 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3447240.0, ans=0.2 2023-11-28 09:43:16,879 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 517100 2023-11-28 09:43:18,208 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3447306.6666666665, ans=0.1 2023-11-28 09:43:44,303 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.529e+01 9.824e+01 1.052e+02 1.128e+02 1.642e+02, threshold=2.105e+02, percent-clipped=0.0 2023-11-28 09:43:50,422 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 100, loss[loss=0.04816, simple_loss=0.05162, pruned_loss=0.006625, audio_tagging_loss=0.01572, over 14981.00 frames. ], tot_loss[loss=0.07327, simple_loss=0.09059, pruned_loss=0.01222, audio_tagging_loss=0.01576, over 1212282.56 frames. 
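
The attn_weights_entropy diagnostics printed while computing validation loss give one value per attention head (four for encoder stack 0 above, eight for stack 3 in the previous validation pass): the entropy of each head's attention distribution, averaged over queries, so a value near log(key_len) indicates nearly uniform attention and a value near zero a sharply peaked one. A sketch of that computation, with the tensor layout assumed:

import torch

def attn_weights_entropy(attn: torch.Tensor,
                         eps: float = 1e-20) -> torch.Tensor:
    # attn: (batch, num_heads, query_len, key_len); each row sums to 1.
    h = -(attn * (attn + eps).log()).sum(dim=-1)  # (batch, heads, queries)
    return h.mean(dim=(0, 2))                     # one entropy per head
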
], batch size: 57, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 09:43:52,794 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3447506.6666666665, ans=0.125 2023-11-28 09:43:53,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3447506.6666666665, ans=0.1 2023-11-28 09:44:09,926 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3447573.3333333335, ans=0.0 2023-11-28 09:44:12,177 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3447640.0, ans=0.125 2023-11-28 09:44:15,361 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 517150 2023-11-28 09:44:19,914 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3447640.0, ans=0.125 2023-11-28 09:44:47,934 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 150, loss[loss=0.06327, simple_loss=0.08813, pruned_loss=0.009639, audio_tagging_loss=0.009564, over 14889.00 frames. ], tot_loss[loss=0.071, simple_loss=0.08953, pruned_loss=0.01202, audio_tagging_loss=0.01421, over 1623112.61 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 09:45:01,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3447906.6666666665, ans=0.1 2023-11-28 09:45:01,030 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3447906.6666666665, ans=0.1 2023-11-28 09:45:01,155 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3447906.6666666665, ans=0.125 2023-11-28 09:45:14,020 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 517200 2023-11-28 09:45:34,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3448106.6666666665, ans=0.0 2023-11-28 09:45:41,869 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.423e+01 9.000e+01 9.478e+01 1.042e+02 1.328e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-28 09:45:46,282 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 200, loss[loss=0.05014, simple_loss=0.06233, pruned_loss=0.007397, audio_tagging_loss=0.01158, over 15136.00 frames. ], tot_loss[loss=0.07007, simple_loss=0.09045, pruned_loss=0.01229, audio_tagging_loss=0.01256, over 1935519.47 frames. ], batch size: 61, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 09:45:52,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3448173.3333333335, ans=0.125 2023-11-28 09:46:06,784 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3448240.0, ans=0.0 2023-11-28 09:46:11,914 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 517250 2023-11-28 09:46:17,449 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3448306.6666666665, ans=0.125 2023-11-28 09:46:43,885 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 250, loss[loss=0.07031, simple_loss=0.09847, pruned_loss=0.01351, audio_tagging_loss=0.007564, over 15422.00 frames. 
], tot_loss[loss=0.06875, simple_loss=0.09021, pruned_loss=0.01217, audio_tagging_loss=0.01147, over 2181376.47 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 09:47:01,447 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.84 vs. limit=15.0 2023-11-28 09:47:02,561 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.12 vs. limit=15.0 2023-11-28 09:47:08,155 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3448640.0, ans=0.2 2023-11-28 09:47:09,191 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 517300 2023-11-28 09:47:36,520 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.779e+01 9.287e+01 9.816e+01 1.058e+02 1.436e+02, threshold=1.963e+02, percent-clipped=0.0 2023-11-28 09:47:37,619 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.44 vs. limit=6.0 2023-11-28 09:47:41,513 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 300, loss[loss=0.05764, simple_loss=0.07068, pruned_loss=0.01143, audio_tagging_loss=0.01088, over 15270.00 frames. ], tot_loss[loss=0.06897, simple_loss=0.0914, pruned_loss=0.0127, audio_tagging_loss=0.01057, over 2378216.98 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 09:47:41,703 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3448840.0, ans=0.1 2023-11-28 09:47:43,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3448840.0, ans=0.1 2023-11-28 09:47:47,231 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3448840.0, ans=0.125 2023-11-28 09:47:55,496 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3448906.6666666665, ans=0.125 2023-11-28 09:48:07,245 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 517350 2023-11-28 09:48:27,926 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3449106.6666666665, ans=0.1 2023-11-28 09:48:39,228 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 350, loss[loss=0.07098, simple_loss=0.09474, pruned_loss=0.01552, audio_tagging_loss=0.008099, over 15447.00 frames. ], tot_loss[loss=0.06786, simple_loss=0.09076, pruned_loss=0.0125, audio_tagging_loss=0.009978, over 2531975.24 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 09:48:39,731 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.29 vs. 
limit=15.0 2023-11-28 09:48:40,573 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3449173.3333333335, ans=0.0 2023-11-28 09:48:42,580 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3449173.3333333335, ans=0.125 2023-11-28 09:48:46,950 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3449173.3333333335, ans=0.1 2023-11-28 09:48:49,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3449240.0, ans=0.125 2023-11-28 09:48:55,656 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3449240.0, ans=0.125 2023-11-28 09:49:01,076 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3449306.6666666665, ans=0.0 2023-11-28 09:49:04,253 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 517400 2023-11-28 09:49:11,170 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.39 vs. limit=15.0 2023-11-28 09:49:11,444 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.24 vs. limit=15.0 2023-11-28 09:49:16,450 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3449373.3333333335, ans=0.2 2023-11-28 09:49:21,629 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.85 vs. limit=15.0 2023-11-28 09:49:23,861 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.07 vs. limit=22.5 2023-11-28 09:49:24,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3449440.0, ans=0.125 2023-11-28 09:49:26,872 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3449440.0, ans=0.125 2023-11-28 09:49:32,631 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.240e+01 9.082e+01 9.709e+01 1.033e+02 1.269e+02, threshold=1.942e+02, percent-clipped=0.0 2023-11-28 09:49:35,780 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.02 vs. limit=22.5 2023-11-28 09:49:37,631 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 400, loss[loss=0.05078, simple_loss=0.06196, pruned_loss=0.008109, audio_tagging_loss=0.01169, over 14289.00 frames. ], tot_loss[loss=0.06783, simple_loss=0.091, pruned_loss=0.01258, audio_tagging_loss=0.00976, over 2649092.55 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 09:49:44,547 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 09:49:52,214 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.77 vs. 
limit=12.0 2023-11-28 09:50:00,656 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3449640.0, ans=0.125 2023-11-28 09:50:03,327 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 517450 2023-11-28 09:50:24,032 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3449773.3333333335, ans=0.1 2023-11-28 09:50:26,202 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3449773.3333333335, ans=0.07 2023-11-28 09:50:34,879 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 450, loss[loss=0.05509, simple_loss=0.07856, pruned_loss=0.009951, audio_tagging_loss=0.005855, over 15310.00 frames. ], tot_loss[loss=0.06701, simple_loss=0.09047, pruned_loss=0.01237, audio_tagging_loss=0.009404, over 2735444.03 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 09:50:35,182 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3449840.0, ans=0.0 2023-11-28 09:50:47,729 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.whiten.whitening_limit, batch_count=3449906.6666666665, ans=12.0 2023-11-28 09:51:00,743 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 517500 2023-11-28 09:51:03,593 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.15 vs. limit=22.5 2023-11-28 09:51:28,861 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.785e+01 8.576e+01 9.362e+01 1.011e+02 1.317e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-28 09:51:32,721 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 500, loss[loss=0.06596, simple_loss=0.08679, pruned_loss=0.01436, audio_tagging_loss=0.008206, over 15106.00 frames. ], tot_loss[loss=0.06698, simple_loss=0.09064, pruned_loss=0.01245, audio_tagging_loss=0.009206, over 2806718.37 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 09:51:33,031 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3450173.3333333335, ans=0.1 2023-11-28 09:51:35,464 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.00 vs. limit=15.0 2023-11-28 09:51:50,026 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.78 vs. 
limit=15.0 2023-11-28 09:51:51,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3450240.0, ans=0.025 2023-11-28 09:51:58,191 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 517550 2023-11-28 09:52:08,794 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3450373.3333333335, ans=0.2 2023-11-28 09:52:10,862 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3450373.3333333335, ans=0.0 2023-11-28 09:52:11,025 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3450373.3333333335, ans=0.125 2023-11-28 09:52:30,027 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 550, loss[loss=0.07974, simple_loss=0.105, pruned_loss=0.01755, audio_tagging_loss=0.009677, over 15086.00 frames. ], tot_loss[loss=0.0667, simple_loss=0.09032, pruned_loss=0.01247, audio_tagging_loss=0.009069, over 2848657.15 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 09:52:55,424 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 517600 2023-11-28 09:53:24,171 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.276e+01 8.868e+01 9.461e+01 1.003e+02 1.214e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-28 09:53:27,019 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.44 vs. limit=6.0 2023-11-28 09:53:27,496 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 600, loss[loss=0.07243, simple_loss=0.1055, pruned_loss=0.01293, audio_tagging_loss=0.006768, over 14552.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.08926, pruned_loss=0.01233, audio_tagging_loss=0.009093, over 2894518.76 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 09:53:32,127 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3450840.0, ans=0.125 2023-11-28 09:53:42,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3450906.6666666665, ans=0.125 2023-11-28 09:53:43,256 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3450906.6666666665, ans=0.125 2023-11-28 09:53:45,708 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.26 vs. limit=15.0 2023-11-28 09:53:53,159 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 517650 2023-11-28 09:54:03,196 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3451040.0, ans=0.0 2023-11-28 09:54:20,478 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3451106.6666666665, ans=0.05 2023-11-28 09:54:21,486 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3451106.6666666665, ans=0.125 2023-11-28 09:54:25,027 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 650, loss[loss=0.07089, simple_loss=0.09668, pruned_loss=0.01398, audio_tagging_loss=0.008565, over 13880.00 frames. 
], tot_loss[loss=0.06597, simple_loss=0.08945, pruned_loss=0.01231, audio_tagging_loss=0.008928, over 2929660.48 frames. ], batch size: 53, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 09:54:38,705 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3451240.0, ans=0.2 2023-11-28 09:54:42,554 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3451240.0, ans=0.125 2023-11-28 09:54:50,116 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 517700 2023-11-28 09:54:55,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3451306.6666666665, ans=0.2 2023-11-28 09:55:13,321 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.64 vs. limit=15.0 2023-11-28 09:55:18,038 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.356e+01 9.000e+01 9.495e+01 1.012e+02 1.235e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-28 09:55:21,740 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 700, loss[loss=0.05618, simple_loss=0.06978, pruned_loss=0.0112, audio_tagging_loss=0.01009, over 15203.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08891, pruned_loss=0.0122, audio_tagging_loss=0.008931, over 2960801.48 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 09:55:28,455 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.37 vs. limit=22.5 2023-11-28 09:55:33,513 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3451573.3333333335, ans=0.1 2023-11-28 09:55:46,366 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 517750 2023-11-28 09:56:06,619 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3451773.3333333335, ans=0.07 2023-11-28 09:56:18,697 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 750, loss[loss=0.05725, simple_loss=0.07173, pruned_loss=0.01032, audio_tagging_loss=0.01107, over 16674.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08944, pruned_loss=0.01226, audio_tagging_loss=0.00893, over 2988810.19 frames. ], batch size: 64, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 09:56:21,220 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3451840.0, ans=0.2 2023-11-28 09:56:23,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3451840.0, ans=0.125 2023-11-28 09:56:35,739 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3451906.6666666665, ans=0.0 2023-11-28 09:56:36,967 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.35 vs. limit=12.0 2023-11-28 09:56:41,436 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.67 vs. 
limit=10.0 2023-11-28 09:56:44,400 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 517800 2023-11-28 09:57:13,227 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.571e+01 8.892e+01 9.576e+01 1.074e+02 1.448e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-28 09:57:14,474 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3452106.6666666665, ans=0.0 2023-11-28 09:57:15,496 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3452173.3333333335, ans=0.0 2023-11-28 09:57:16,371 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 800, loss[loss=0.07062, simple_loss=0.1036, pruned_loss=0.01118, audio_tagging_loss=0.007643, over 14956.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.08941, pruned_loss=0.0121, audio_tagging_loss=0.008937, over 3001736.22 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 09:57:42,658 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 517850 2023-11-28 09:58:09,520 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.22 vs. limit=15.0 2023-11-28 09:58:14,588 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 850, loss[loss=0.08393, simple_loss=0.1278, pruned_loss=0.01507, audio_tagging_loss=0.004936, over 15228.00 frames. ], tot_loss[loss=0.06678, simple_loss=0.0909, pruned_loss=0.01234, audio_tagging_loss=0.008988, over 3016814.81 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 09:58:17,379 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3452506.6666666665, ans=0.125 2023-11-28 09:58:31,053 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3452573.3333333335, ans=0.0 2023-11-28 09:58:39,972 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 517900 2023-11-28 09:59:10,924 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.582e+01 8.934e+01 9.404e+01 1.018e+02 1.329e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-28 09:59:13,130 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 900, loss[loss=0.07065, simple_loss=0.1037, pruned_loss=0.01081, audio_tagging_loss=0.008017, over 14727.00 frames. ], tot_loss[loss=0.06691, simple_loss=0.09126, pruned_loss=0.01224, audio_tagging_loss=0.009039, over 3026766.85 frames. 
], batch size: 53, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 09:59:31,351 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3452906.6666666665, ans=0.0 2023-11-28 09:59:37,873 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 517950 2023-11-28 09:59:40,797 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3452973.3333333335, ans=0.125 2023-11-28 09:59:56,610 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3453040.0, ans=10.0 2023-11-28 09:59:58,819 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3453106.6666666665, ans=0.1 2023-11-28 10:00:09,550 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 950, loss[loss=0.0621, simple_loss=0.07781, pruned_loss=0.01302, audio_tagging_loss=0.01017, over 14691.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.0903, pruned_loss=0.01219, audio_tagging_loss=0.008976, over 3030042.14 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 10:00:17,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3453173.3333333335, ans=0.1 2023-11-28 10:00:18,985 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3453173.3333333335, ans=0.125 2023-11-28 10:00:23,919 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3453240.0, ans=0.125 2023-11-28 10:00:35,427 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 518000 2023-11-28 10:00:46,788 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3453373.3333333335, ans=0.125 2023-11-28 10:00:46,891 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3453373.3333333335, ans=0.125 2023-11-28 10:00:55,567 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3453440.0, ans=0.0 2023-11-28 10:01:00,556 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3453440.0, ans=0.125 2023-11-28 10:01:04,317 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=3453440.0, ans=15.0 2023-11-28 10:01:05,936 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.727e+01 8.698e+01 9.447e+01 1.001e+02 1.435e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-28 10:01:07,030 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 1000, loss[loss=0.07572, simple_loss=0.1093, pruned_loss=0.01249, audio_tagging_loss=0.008557, over 15937.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.0904, pruned_loss=0.01234, audio_tagging_loss=0.008784, over 3034345.40 frames. 
], batch size: 58, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 10:01:25,021 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3453573.3333333335, ans=0.09899494936611666 2023-11-28 10:01:32,576 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 518050 2023-11-28 10:01:33,688 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 10:01:36,410 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.44 vs. limit=10.0 2023-11-28 10:01:38,256 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3453640.0, ans=0.2 2023-11-28 10:01:57,452 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3453773.3333333335, ans=0.1 2023-11-28 10:02:05,481 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 1050, loss[loss=0.06149, simple_loss=0.0802, pruned_loss=0.01436, audio_tagging_loss=0.007037, over 16035.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.08964, pruned_loss=0.01231, audio_tagging_loss=0.00863, over 3025806.92 frames. ], batch size: 60, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 10:02:10,116 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3453840.0, ans=0.0 2023-11-28 10:02:25,365 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3453906.6666666665, ans=0.0 2023-11-28 10:02:29,129 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.83 vs. limit=22.5 2023-11-28 10:02:30,875 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 518100 2023-11-28 10:02:33,622 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.67 vs. limit=6.0 2023-11-28 10:02:52,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=3454106.6666666665, ans=0.025 2023-11-28 10:02:54,372 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.55 vs. limit=22.5 2023-11-28 10:03:01,559 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.069e+01 8.979e+01 9.409e+01 9.986e+01 1.298e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-28 10:03:02,658 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 1100, loss[loss=0.05258, simple_loss=0.06578, pruned_loss=0.009471, audio_tagging_loss=0.01021, over 14127.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08951, pruned_loss=0.01232, audio_tagging_loss=0.00858, over 3025632.27 frames. 
], batch size: 56, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 10:03:08,619 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 10:03:16,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3454240.0, ans=0.2 2023-11-28 10:03:28,438 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 518150 2023-11-28 10:03:36,130 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 10:03:44,772 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3454373.3333333335, ans=0.125 2023-11-28 10:03:55,470 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3454440.0, ans=0.0 2023-11-28 10:03:59,639 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 1150, loss[loss=0.06316, simple_loss=0.08246, pruned_loss=0.01352, audio_tagging_loss=0.008411, over 14675.00 frames. ], tot_loss[loss=0.06501, simple_loss=0.08832, pruned_loss=0.01219, audio_tagging_loss=0.008662, over 3030988.01 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 10:04:13,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3454573.3333333335, ans=0.0 2023-11-28 10:04:24,472 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.62 vs. limit=12.0 2023-11-28 10:04:24,957 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 518200 2023-11-28 10:04:27,594 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3454640.0, ans=0.0 2023-11-28 10:04:40,277 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3454706.6666666665, ans=0.125 2023-11-28 10:04:44,421 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3454773.3333333335, ans=0.125 2023-11-28 10:04:57,108 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.756e+01 8.839e+01 9.353e+01 1.036e+02 1.275e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-28 10:04:57,351 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3454840.0, ans=0.1 2023-11-28 10:04:58,220 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 1200, loss[loss=0.05045, simple_loss=0.07278, pruned_loss=0.00508, audio_tagging_loss=0.008979, over 15904.00 frames. ], tot_loss[loss=0.06501, simple_loss=0.08856, pruned_loss=0.0121, audio_tagging_loss=0.008626, over 3038597.14 frames. ], batch size: 60, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:05:20,905 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.18 vs. 
limit=10.0 2023-11-28 10:05:22,760 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 518250 2023-11-28 10:05:26,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3454973.3333333335, ans=0.125 2023-11-28 10:05:54,741 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 1250, loss[loss=0.0585, simple_loss=0.08283, pruned_loss=0.0106, audio_tagging_loss=0.00648, over 16218.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08893, pruned_loss=0.0122, audio_tagging_loss=0.008594, over 3041855.05 frames. ], batch size: 60, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:05:57,263 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3455173.3333333335, ans=0.125 2023-11-28 10:06:00,469 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3455173.3333333335, ans=0.125 2023-11-28 10:06:15,787 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.71 vs. limit=15.0 2023-11-28 10:06:20,703 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 518300 2023-11-28 10:06:25,724 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3455306.6666666665, ans=0.125 2023-11-28 10:06:50,829 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.624e+01 8.649e+01 9.225e+01 9.865e+01 1.174e+02, threshold=1.845e+02, percent-clipped=0.0 2023-11-28 10:06:51,956 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 1300, loss[loss=0.05258, simple_loss=0.06767, pruned_loss=0.008733, audio_tagging_loss=0.01001, over 14156.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.08981, pruned_loss=0.01236, audio_tagging_loss=0.008591, over 3047118.61 frames. 
], batch size: 53, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:06:52,172 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3455506.6666666665, ans=0.2 2023-11-28 10:07:04,920 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3455573.3333333335, ans=0.1 2023-11-28 10:07:07,252 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3455573.3333333335, ans=0.125 2023-11-28 10:07:08,254 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3455573.3333333335, ans=0.125 2023-11-28 10:07:12,911 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3455573.3333333335, ans=0.2 2023-11-28 10:07:17,149 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 518350 2023-11-28 10:07:32,016 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3455706.6666666665, ans=0.2 2023-11-28 10:07:34,169 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3455706.6666666665, ans=0.0 2023-11-28 10:07:38,725 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3455773.3333333335, ans=0.125 2023-11-28 10:07:49,281 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 1350, loss[loss=0.07145, simple_loss=0.1047, pruned_loss=0.0143, audio_tagging_loss=0.004818, over 14859.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.08991, pruned_loss=0.01235, audio_tagging_loss=0.008614, over 3043796.52 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:08:14,041 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 518400 2023-11-28 10:08:25,139 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3456040.0, ans=0.5 2023-11-28 10:08:33,608 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 10:08:36,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3456106.6666666665, ans=0.0 2023-11-28 10:08:37,409 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.88 vs. limit=22.5 2023-11-28 10:08:45,069 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.326e+01 8.591e+01 9.504e+01 1.020e+02 1.211e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-28 10:08:46,234 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 1400, loss[loss=0.05348, simple_loss=0.06768, pruned_loss=0.008013, audio_tagging_loss=0.01163, over 15811.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.08979, pruned_loss=0.01228, audio_tagging_loss=0.008654, over 3038654.40 frames. 
], batch size: 60, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:08:47,557 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3456173.3333333335, ans=0.1 2023-11-28 10:08:55,914 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=3456173.3333333335, ans=10.0 2023-11-28 10:08:57,063 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3456240.0, ans=0.1 2023-11-28 10:09:09,276 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3456306.6666666665, ans=0.09899494936611666 2023-11-28 10:09:11,807 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 518450 2023-11-28 10:09:22,379 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.60 vs. limit=22.5 2023-11-28 10:09:23,123 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3456373.3333333335, ans=0.125 2023-11-28 10:09:29,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3456373.3333333335, ans=0.0 2023-11-28 10:09:35,693 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3456440.0, ans=0.125 2023-11-28 10:09:41,464 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 10:09:43,529 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 1450, loss[loss=0.07524, simple_loss=0.1029, pruned_loss=0.01605, audio_tagging_loss=0.00772, over 14276.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.09001, pruned_loss=0.01239, audio_tagging_loss=0.008823, over 3032205.04 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:09:44,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3456506.6666666665, ans=0.125 2023-11-28 10:09:58,377 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.01 vs. 
limit=22.5 2023-11-28 10:10:07,688 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3456640.0, ans=0.1 2023-11-28 10:10:08,630 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 518500 2023-11-28 10:10:11,527 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3456640.0, ans=0.2 2023-11-28 10:10:17,154 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3456706.6666666665, ans=10.0 2023-11-28 10:10:27,470 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3456706.6666666665, ans=0.0 2023-11-28 10:10:39,656 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.497e+01 8.920e+01 9.408e+01 1.027e+02 1.400e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-28 10:10:41,240 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 1500, loss[loss=0.07278, simple_loss=0.1, pruned_loss=0.01175, audio_tagging_loss=0.01102, over 14620.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.09051, pruned_loss=0.01225, audio_tagging_loss=0.008828, over 3030585.26 frames. ], batch size: 54, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:10:51,280 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3456906.6666666665, ans=0.125 2023-11-28 10:10:56,161 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3456906.6666666665, ans=0.125 2023-11-28 10:11:02,581 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3456973.3333333335, ans=0.07 2023-11-28 10:11:06,401 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 518550 2023-11-28 10:11:06,968 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.37 vs. limit=22.5 2023-11-28 10:11:19,045 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.51 vs. limit=15.0 2023-11-28 10:11:26,324 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3457106.6666666665, ans=0.125 2023-11-28 10:11:30,469 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3457106.6666666665, ans=0.125 2023-11-28 10:11:37,956 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 1550, loss[loss=0.05504, simple_loss=0.07163, pruned_loss=0.01054, audio_tagging_loss=0.008686, over 14642.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.09003, pruned_loss=0.01224, audio_tagging_loss=0.00892, over 3036198.94 frames. 
], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:11:41,876 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3457173.3333333335, ans=0.09899494936611666 2023-11-28 10:11:45,088 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3457173.3333333335, ans=0.125 2023-11-28 10:11:48,454 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3457240.0, ans=0.125 2023-11-28 10:11:48,457 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3457240.0, ans=0.04949747468305833 2023-11-28 10:12:03,085 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 518600 2023-11-28 10:12:13,256 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3457373.3333333335, ans=0.2 2023-11-28 10:12:14,259 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3457373.3333333335, ans=0.0 2023-11-28 10:12:32,541 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3457440.0, ans=0.2 2023-11-28 10:12:35,070 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.552e+01 8.956e+01 9.382e+01 1.022e+02 1.472e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-28 10:12:35,344 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3457506.6666666665, ans=0.125 2023-11-28 10:12:36,235 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 1600, loss[loss=0.06473, simple_loss=0.09288, pruned_loss=0.008903, audio_tagging_loss=0.009382, over 15637.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.09045, pruned_loss=0.01235, audio_tagging_loss=0.008996, over 3042597.31 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 10:12:37,559 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3457506.6666666665, ans=0.1 2023-11-28 10:12:38,534 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3457506.6666666665, ans=0.125 2023-11-28 10:12:40,758 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3457506.6666666665, ans=0.125 2023-11-28 10:12:50,093 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3457573.3333333335, ans=0.0 2023-11-28 10:12:57,558 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.54 vs. limit=22.5 2023-11-28 10:13:01,294 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 518650 2023-11-28 10:13:02,476 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3457640.0, ans=0.1 2023-11-28 10:13:03,926 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.73 vs. 
limit=6.0 2023-11-28 10:13:27,951 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3457773.3333333335, ans=0.125 2023-11-28 10:13:33,623 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 1650, loss[loss=0.05447, simple_loss=0.07176, pruned_loss=0.007848, audio_tagging_loss=0.01074, over 16040.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.09059, pruned_loss=0.0123, audio_tagging_loss=0.009042, over 3041876.13 frames. ], batch size: 61, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 10:13:34,955 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3457840.0, ans=0.0 2023-11-28 10:13:39,959 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3457840.0, ans=0.125 2023-11-28 10:13:58,871 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 518700 2023-11-28 10:14:06,215 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3457973.3333333335, ans=0.0 2023-11-28 10:14:08,511 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3458040.0, ans=0.125 2023-11-28 10:14:20,398 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3458106.6666666665, ans=0.1 2023-11-28 10:14:30,052 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.597e+01 8.751e+01 9.360e+01 1.005e+02 1.461e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-28 10:14:31,136 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 1700, loss[loss=0.05875, simple_loss=0.08287, pruned_loss=0.006908, audio_tagging_loss=0.0104, over 14271.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.09028, pruned_loss=0.01229, audio_tagging_loss=0.008933, over 3038260.96 frames. ], batch size: 54, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 10:14:35,171 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.70 vs. limit=15.0 2023-11-28 10:14:44,058 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3458240.0, ans=0.125 2023-11-28 10:14:46,860 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-28 10:14:56,373 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 518750 2023-11-28 10:15:17,947 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.66 vs. limit=22.5 2023-11-28 10:15:28,838 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 1750, loss[loss=0.05229, simple_loss=0.06721, pruned_loss=0.009231, audio_tagging_loss=0.009458, over 15128.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.08907, pruned_loss=0.01208, audio_tagging_loss=0.008864, over 3040216.65 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:15:42,327 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3458573.3333333335, ans=0.125 2023-11-28 10:15:46,652 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.09 vs. 
limit=22.5 2023-11-28 10:15:47,423 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3458573.3333333335, ans=0.125 2023-11-28 10:15:54,027 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 518800 2023-11-28 10:16:04,153 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3458706.6666666665, ans=0.125 2023-11-28 10:16:15,969 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3458773.3333333335, ans=0.025 2023-11-28 10:16:15,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3458773.3333333335, ans=0.0 2023-11-28 10:16:25,426 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.270e+01 8.578e+01 9.174e+01 9.766e+01 1.256e+02, threshold=1.835e+02, percent-clipped=0.0 2023-11-28 10:16:25,453 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 1800, loss[loss=0.06533, simple_loss=0.09866, pruned_loss=0.008838, audio_tagging_loss=0.007158, over 15961.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08987, pruned_loss=0.01214, audio_tagging_loss=0.008772, over 3047560.29 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:16:35,488 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3458840.0, ans=0.0 2023-11-28 10:16:39,740 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3458906.6666666665, ans=0.5 2023-11-28 10:16:47,877 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.13 vs. limit=15.0 2023-11-28 10:16:50,459 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 518850 2023-11-28 10:16:51,749 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3458973.3333333335, ans=0.125 2023-11-28 10:17:01,659 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3459040.0, ans=0.025 2023-11-28 10:17:23,167 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 1850, loss[loss=0.07464, simple_loss=0.09849, pruned_loss=0.01676, audio_tagging_loss=0.008635, over 15236.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.09047, pruned_loss=0.01228, audio_tagging_loss=0.008685, over 3047236.38 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:17:44,054 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.70 vs. 
limit=15.0 2023-11-28 10:17:45,717 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3459306.6666666665, ans=0.0 2023-11-28 10:17:47,721 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 518900 2023-11-28 10:17:55,111 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3459306.6666666665, ans=0.0 2023-11-28 10:18:16,411 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3459440.0, ans=0.125 2023-11-28 10:18:19,426 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.532e+01 8.665e+01 9.197e+01 1.005e+02 1.247e+02, threshold=1.839e+02, percent-clipped=0.0 2023-11-28 10:18:19,452 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 1900, loss[loss=0.06527, simple_loss=0.09318, pruned_loss=0.01287, audio_tagging_loss=0.005807, over 15841.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.09011, pruned_loss=0.01223, audio_tagging_loss=0.008577, over 3045830.77 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:18:35,496 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.92 vs. limit=15.0 2023-11-28 10:18:45,653 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 518950 2023-11-28 10:18:46,956 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3459640.0, ans=0.125 2023-11-28 10:18:53,449 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3459706.6666666665, ans=0.125 2023-11-28 10:18:53,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3459706.6666666665, ans=0.125 2023-11-28 10:18:53,761 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.06 vs. limit=22.5 2023-11-28 10:18:59,756 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3459706.6666666665, ans=0.125 2023-11-28 10:19:12,891 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3459773.3333333335, ans=0.1 2023-11-28 10:19:16,892 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 1950, loss[loss=0.06031, simple_loss=0.08933, pruned_loss=0.009376, audio_tagging_loss=0.006268, over 16093.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08981, pruned_loss=0.01212, audio_tagging_loss=0.008621, over 3046540.97 frames. 
], batch size: 59, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:19:27,715 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3459906.6666666665, ans=0.0 2023-11-28 10:19:32,051 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3459906.6666666665, ans=0.0 2023-11-28 10:19:41,743 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 519000 2023-11-28 10:19:43,284 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3459973.3333333335, ans=0.125 2023-11-28 10:20:10,379 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3460106.6666666665, ans=0.1 2023-11-28 10:20:14,531 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.174e+01 8.984e+01 9.500e+01 1.035e+02 1.289e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-28 10:20:14,558 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 2000, loss[loss=0.06595, simple_loss=0.08831, pruned_loss=0.01335, audio_tagging_loss=0.008446, over 14308.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.09003, pruned_loss=0.0122, audio_tagging_loss=0.008692, over 3047496.65 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 10:20:18,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3460173.3333333335, ans=0.125 2023-11-28 10:20:20,382 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3460173.3333333335, ans=0.1 2023-11-28 10:20:38,470 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3460306.6666666665, ans=0.2 2023-11-28 10:20:39,446 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 519050 2023-11-28 10:20:58,273 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3460373.3333333335, ans=0.1 2023-11-28 10:21:09,326 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3460440.0, ans=0.125 2023-11-28 10:21:10,540 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3460506.6666666665, ans=0.1 2023-11-28 10:21:11,328 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 2050, loss[loss=0.04619, simple_loss=0.05835, pruned_loss=0.01029, audio_tagging_loss=0.006724, over 14402.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08966, pruned_loss=0.01216, audio_tagging_loss=0.008618, over 3048029.35 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 10:21:23,522 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.38 vs. limit=22.5 2023-11-28 10:21:35,083 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3460640.0, ans=0.125 2023-11-28 10:21:38,249 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 519100 2023-11-28 10:21:46,391 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.79 vs. 
limit=10.0 2023-11-28 10:21:56,065 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3460706.6666666665, ans=0.125 2023-11-28 10:22:09,702 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 2100, loss[loss=0.07008, simple_loss=0.08737, pruned_loss=0.01629, audio_tagging_loss=0.01011, over 15296.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.08994, pruned_loss=0.01232, audio_tagging_loss=0.008563, over 3044132.88 frames. ], batch size: 59, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:22:10,761 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.712e+01 8.721e+01 9.366e+01 1.002e+02 1.628e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-28 10:22:28,937 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3460906.6666666665, ans=0.0 2023-11-28 10:22:30,112 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3460906.6666666665, ans=0.0 2023-11-28 10:22:35,456 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 519150 2023-11-28 10:23:04,068 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.87 vs. limit=6.0 2023-11-28 10:23:06,816 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3461106.6666666665, ans=0.1 2023-11-28 10:23:08,711 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 2150, loss[loss=0.05036, simple_loss=0.06718, pruned_loss=0.006027, audio_tagging_loss=0.01074, over 15202.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08963, pruned_loss=0.01233, audio_tagging_loss=0.008586, over 3045724.86 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:23:20,247 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3461240.0, ans=0.125 2023-11-28 10:23:21,308 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 10:23:21,335 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3461240.0, ans=0.0 2023-11-28 10:23:31,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3461306.6666666665, ans=0.125 2023-11-28 10:23:33,911 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 519200 2023-11-28 10:23:38,815 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3461306.6666666665, ans=0.125 2023-11-28 10:23:48,067 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 10:24:06,119 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3461506.6666666665, ans=0.125 2023-11-28 10:24:07,075 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 2200, loss[loss=0.06217, simple_loss=0.08387, pruned_loss=0.009353, audio_tagging_loss=0.01088, over 13938.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.09071, pruned_loss=0.01241, audio_tagging_loss=0.00859, over 3043610.26 frames. ], batch size: 54, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:24:08,086 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.398e+01 8.940e+01 9.417e+01 1.003e+02 1.474e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-28 10:24:18,826 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3461573.3333333335, ans=0.125 2023-11-28 10:24:29,498 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3461640.0, ans=0.1 2023-11-28 10:24:33,001 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 519250 2023-11-28 10:25:04,104 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 2250, loss[loss=0.07268, simple_loss=0.1006, pruned_loss=0.01236, audio_tagging_loss=0.01005, over 15176.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.09017, pruned_loss=0.01233, audio_tagging_loss=0.008614, over 3035346.62 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:25:04,776 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.49 vs. limit=15.0 2023-11-28 10:25:17,804 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3461906.6666666665, ans=0.1 2023-11-28 10:25:19,907 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3461906.6666666665, ans=0.1 2023-11-28 10:25:29,750 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 519300 2023-11-28 10:25:31,054 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3461973.3333333335, ans=0.125 2023-11-28 10:25:33,197 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3461973.3333333335, ans=0.125 2023-11-28 10:25:44,062 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3462040.0, ans=0.125 2023-11-28 10:25:58,219 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.11 vs. limit=15.0 2023-11-28 10:25:59,327 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 10:26:01,260 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.37 vs. limit=22.5 2023-11-28 10:26:02,947 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 2300, loss[loss=0.05004, simple_loss=0.06476, pruned_loss=0.007115, audio_tagging_loss=0.01055, over 13317.00 frames. 
], tot_loss[loss=0.06637, simple_loss=0.09047, pruned_loss=0.01242, audio_tagging_loss=0.008715, over 3036704.69 frames. ], batch size: 52, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:26:03,546 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.50 vs. limit=22.5 2023-11-28 10:26:04,005 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.619e+01 8.792e+01 9.298e+01 1.006e+02 1.302e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-28 10:26:08,753 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3462173.3333333335, ans=0.125 2023-11-28 10:26:28,113 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 519350 2023-11-28 10:26:33,759 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3462306.6666666665, ans=0.0 2023-11-28 10:26:39,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3462373.3333333335, ans=0.0 2023-11-28 10:26:40,571 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3462373.3333333335, ans=0.125 2023-11-28 10:26:54,291 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3462440.0, ans=0.125 2023-11-28 10:26:55,184 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3462440.0, ans=0.125 2023-11-28 10:26:56,140 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 10:27:00,537 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 2350, loss[loss=0.04927, simple_loss=0.05404, pruned_loss=0.01206, audio_tagging_loss=0.01019, over 13547.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.08938, pruned_loss=0.01229, audio_tagging_loss=0.008781, over 3033139.52 frames. 
], batch size: 53, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:27:04,126 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3462506.6666666665, ans=0.125 2023-11-28 10:27:15,627 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3462573.3333333335, ans=0.0 2023-11-28 10:27:25,733 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 519400 2023-11-28 10:27:28,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3462640.0, ans=0.2 2023-11-28 10:27:29,967 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3462640.0, ans=0.1 2023-11-28 10:27:31,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3462640.0, ans=0.2 2023-11-28 10:27:59,281 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 2400, loss[loss=0.05862, simple_loss=0.08073, pruned_loss=0.007341, audio_tagging_loss=0.01091, over 15105.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.08987, pruned_loss=0.01239, audio_tagging_loss=0.008833, over 3032010.80 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 10:28:00,341 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.486e+01 8.676e+01 9.385e+01 1.010e+02 1.342e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-28 10:28:25,718 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 519450 2023-11-28 10:28:25,946 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3462973.3333333335, ans=0.07 2023-11-28 10:28:26,983 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3462973.3333333335, ans=0.2 2023-11-28 10:28:31,511 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3462973.3333333335, ans=0.5 2023-11-28 10:28:45,487 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.99 vs. limit=15.0 2023-11-28 10:28:52,492 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.44 vs. limit=15.0 2023-11-28 10:28:58,258 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 2450, loss[loss=0.0549, simple_loss=0.07415, pruned_loss=0.008514, audio_tagging_loss=0.009311, over 14614.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.08977, pruned_loss=0.01225, audio_tagging_loss=0.008959, over 3031221.11 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 10:29:00,704 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3463173.3333333335, ans=0.0 2023-11-28 10:29:19,467 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.03 vs. 
limit=15.0 2023-11-28 10:29:23,762 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 519500 2023-11-28 10:29:26,052 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3463306.6666666665, ans=0.125 2023-11-28 10:29:28,445 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 10:29:52,564 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3463440.0, ans=0.125 2023-11-28 10:29:56,339 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 2500, loss[loss=0.06697, simple_loss=0.09453, pruned_loss=0.01049, audio_tagging_loss=0.009217, over 16945.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.08999, pruned_loss=0.01223, audio_tagging_loss=0.008929, over 3041212.59 frames. ], batch size: 62, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 10:29:57,383 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.145e+01 8.648e+01 9.240e+01 1.001e+02 1.352e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-28 10:30:11,665 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3463573.3333333335, ans=0.0 2023-11-28 10:30:14,805 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=3463573.3333333335, ans=10.0 2023-11-28 10:30:21,352 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 519550 2023-11-28 10:30:49,592 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3463773.3333333335, ans=0.035 2023-11-28 10:30:54,579 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 2550, loss[loss=0.07219, simple_loss=0.1068, pruned_loss=0.01056, audio_tagging_loss=0.008235, over 15231.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08942, pruned_loss=0.01212, audio_tagging_loss=0.008859, over 3040123.62 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:30:57,328 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.51 vs. limit=15.0 2023-11-28 10:31:08,659 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3463906.6666666665, ans=0.0 2023-11-28 10:31:08,876 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.30 vs. limit=15.0 2023-11-28 10:31:19,111 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3463973.3333333335, ans=0.07 2023-11-28 10:31:19,988 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 519600 2023-11-28 10:31:23,052 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3463973.3333333335, ans=0.0 2023-11-28 10:31:53,548 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 2600, loss[loss=0.06062, simple_loss=0.07772, pruned_loss=0.01231, audio_tagging_loss=0.009456, over 15116.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08991, pruned_loss=0.01209, audio_tagging_loss=0.008685, over 3044855.06 frames. 
], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:31:56,361 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.281e+01 8.673e+01 9.368e+01 9.896e+01 1.178e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-28 10:31:59,943 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3464173.3333333335, ans=0.0 2023-11-28 10:32:04,446 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3464240.0, ans=0.5 2023-11-28 10:32:06,850 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=3464240.0, ans=22.5 2023-11-28 10:32:07,788 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3464240.0, ans=0.125 2023-11-28 10:32:11,722 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3464240.0, ans=0.0 2023-11-28 10:32:19,447 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 519650 2023-11-28 10:32:45,639 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3464440.0, ans=0.125 2023-11-28 10:32:52,186 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 2650, loss[loss=0.05378, simple_loss=0.0663, pruned_loss=0.01002, audio_tagging_loss=0.01061, over 14661.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.08925, pruned_loss=0.01198, audio_tagging_loss=0.008727, over 3039864.37 frames. ], batch size: 59, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:32:56,533 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3464506.6666666665, ans=0.1 2023-11-28 10:33:17,821 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 519700 2023-11-28 10:33:17,997 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 10:33:20,422 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.97 vs. limit=12.0 2023-11-28 10:33:25,534 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3464640.0, ans=10.0 2023-11-28 10:33:29,073 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3464706.6666666665, ans=10.0 2023-11-28 10:33:50,927 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 2700, loss[loss=0.07098, simple_loss=0.09602, pruned_loss=0.0145, audio_tagging_loss=0.008473, over 15294.00 frames. ], tot_loss[loss=0.06507, simple_loss=0.08897, pruned_loss=0.01196, audio_tagging_loss=0.008621, over 3038545.25 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 10:33:54,284 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.512e+01 9.167e+01 9.683e+01 1.022e+02 1.162e+02, threshold=1.937e+02, percent-clipped=0.0 2023-11-28 10:33:54,868 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.27 vs. 
limit=15.0 2023-11-28 10:33:57,796 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3464840.0, ans=0.125 2023-11-28 10:34:00,055 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3464840.0, ans=0.125 2023-11-28 10:34:01,133 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 10:34:11,320 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.78 vs. limit=12.0 2023-11-28 10:34:12,147 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3464906.6666666665, ans=0.125 2023-11-28 10:34:16,139 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 519750 2023-11-28 10:34:19,515 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff2.min_abs, batch_count=3464973.3333333335, ans=0.1 2023-11-28 10:34:31,630 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.05 vs. limit=15.0 2023-11-28 10:34:40,603 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3465106.6666666665, ans=0.125 2023-11-28 10:34:48,175 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 2750, loss[loss=0.07625, simple_loss=0.1043, pruned_loss=0.01465, audio_tagging_loss=0.009455, over 14923.00 frames. ], tot_loss[loss=0.06456, simple_loss=0.08812, pruned_loss=0.01187, audio_tagging_loss=0.008632, over 3033767.73 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 10:34:56,088 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3465173.3333333335, ans=0.1 2023-11-28 10:35:09,590 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.15 vs. limit=22.5 2023-11-28 10:35:14,291 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 519800 2023-11-28 10:35:16,880 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3465306.6666666665, ans=0.2 2023-11-28 10:35:27,586 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3465373.3333333335, ans=0.125 2023-11-28 10:35:36,222 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.43 vs. limit=12.0 2023-11-28 10:35:42,909 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 10:35:47,361 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 2800, loss[loss=0.04499, simple_loss=0.05464, pruned_loss=0.007593, audio_tagging_loss=0.01008, over 17007.00 frames. 
], tot_loss[loss=0.06484, simple_loss=0.08853, pruned_loss=0.01191, audio_tagging_loss=0.008662, over 3037043.54 frames. ], batch size: 67, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:35:50,659 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.380e+01 8.532e+01 9.536e+01 1.008e+02 1.642e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-28 10:35:51,260 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.00 vs. limit=22.5 2023-11-28 10:36:05,142 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3465573.3333333335, ans=0.125 2023-11-28 10:36:12,958 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 519850 2023-11-28 10:36:13,623 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.36 vs. limit=6.0 2023-11-28 10:36:17,718 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.49 vs. limit=12.0 2023-11-28 10:36:37,213 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3465773.3333333335, ans=0.125 2023-11-28 10:36:39,301 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 10:36:45,210 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 2850, loss[loss=0.07431, simple_loss=0.1043, pruned_loss=0.01643, audio_tagging_loss=0.005711, over 15497.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.089, pruned_loss=0.01201, audio_tagging_loss=0.008651, over 3031475.19 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:36:56,820 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3465906.6666666665, ans=0.125 2023-11-28 10:37:11,147 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 519900 2023-11-28 10:37:19,265 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3466040.0, ans=0.1 2023-11-28 10:37:21,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3466040.0, ans=0.0 2023-11-28 10:37:35,067 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3466106.6666666665, ans=0.125 2023-11-28 10:37:43,591 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 2900, loss[loss=0.0604, simple_loss=0.07836, pruned_loss=0.0102, audio_tagging_loss=0.01101, over 16511.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.08942, pruned_loss=0.01212, audio_tagging_loss=0.008676, over 3030037.61 frames. 
], batch size: 61, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:37:46,881 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.553e+01 8.834e+01 9.612e+01 1.019e+02 1.318e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-28 10:38:01,525 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3466240.0, ans=0.1 2023-11-28 10:38:04,791 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3466240.0, ans=0.0 2023-11-28 10:38:09,160 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 519950 2023-11-28 10:38:21,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3466373.3333333335, ans=0.0 2023-11-28 10:38:23,249 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.76 vs. limit=15.0 2023-11-28 10:38:30,496 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3466440.0, ans=0.125 2023-11-28 10:38:31,762 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3466440.0, ans=0.2 2023-11-28 10:38:38,238 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3466440.0, ans=0.2 2023-11-28 10:38:42,313 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 2950, loss[loss=0.06935, simple_loss=0.09612, pruned_loss=0.01111, audio_tagging_loss=0.01018, over 14759.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.09016, pruned_loss=0.01228, audio_tagging_loss=0.008691, over 3031930.44 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:38:45,841 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3466506.6666666665, ans=0.0 2023-11-28 10:38:52,749 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3466573.3333333335, ans=0.0 2023-11-28 10:39:02,528 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3466573.3333333335, ans=0.125 2023-11-28 10:39:08,016 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 520000 2023-11-28 10:39:09,339 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-520000.pt 2023-11-28 10:39:14,893 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3466640.0, ans=0.0 2023-11-28 10:39:25,745 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 10:39:42,311 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 3000, loss[loss=0.05448, simple_loss=0.06689, pruned_loss=0.008605, audio_tagging_loss=0.01243, over 15094.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08937, pruned_loss=0.01228, audio_tagging_loss=0.008722, over 3035823.16 frames. ], batch size: 62, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:39:42,314 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-28 10:40:18,167 INFO [train_asr.py:1267] (0/4) Epoch 44, validation: loss=0.05741, simple_loss=0.05054, pruned_loss=0.005252, audio_tagging_loss=0.02689, over 4681554.00 frames. 
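The per-batch entries above report two figures: loss[...] over the current batch's frames and tot_loss[...] over roughly 3.0e6 frames. That denominator hovers near 3.0e6 instead of growing through the epoch, which suggests a frames-weighted running aggregate over a bounded window of recent batches rather than a full-epoch sum. A minimal sketch of such an aggregate, assuming an exponential forgetting factor with a window of about 200 batches inferred from ~3.0e6 / ~15e3 frames per batch (the name RunningLoss and the constant DECAY are illustrative, not identifiers from train_asr.py):

    # Frames-weighted, exponentially decayed loss aggregate; a sketch
    # consistent with the logged numbers, not the actual implementation.
    DECAY = 1.0 - 1.0 / 200.0  # assumed window of roughly 200 batches

    class RunningLoss:
        def __init__(self) -> None:
            self.loss_sum = 0.0  # decayed sum of (mean loss * frames)
            self.frames = 0.0    # decayed sum of frames

        def update(self, batch_loss: float, batch_frames: int) -> None:
            # batch_loss is a mean over batch_frames frames, e.g.
            # loss=0.05448 over 15094 frames in the batch 3000 entry.
            self.loss_sum = self.loss_sum * DECAY + batch_loss * batch_frames
            self.frames = self.frames * DECAY + batch_frames

        @property
        def value(self) -> float:
            # Steady state: frames -> batch_frames / (1 - DECAY), i.e.
            # ~15e3 * 200 = 3.0e6, matching the tot_loss denominators.
            return self.loss_sum / max(self.frames, 1.0)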
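Each optim.py entry pairs five grad-norm statistics (min, 25%, median, 75%, max) with a clipping threshold, and throughout this stretch the threshold is exactly Clipping_scale times the median: 2.0 * 9.197e+01 = 1.839e+02, 2.0 * 9.500e+01 = 1.900e+02, and so on. A sketch of that bookkeeping, assuming the statistics are taken over a buffer of recent per-batch gradient norms (clip_stats is an illustrative helper, not the optimizer's actual code):

    import torch

    def clip_stats(grad_norms: torch.Tensor, clipping_scale: float = 2.0):
        """Quartiles, threshold, and percent-clipped for a 1-D tensor of
        recent per-batch gradient norms."""
        probs = torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0])
        quartiles = torch.quantile(grad_norms, probs)
        # Threshold = clipping_scale * median, matching the log lines.
        threshold = clipping_scale * quartiles[2]
        percent_clipped = 100.0 * (grad_norms > threshold).float().mean()
        return quartiles.tolist(), threshold.item(), percent_clipped.item()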
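Every WARNING in this stretch excludes an AudioSet cut carrying the dummy placeholder caption, and each one prints the reason: 100 input frames become 23 after subsampling, fewer than the 24 BPE tokens, so the transducer would have to emit more symbols than it has frames. A sketch of that length filter, assuming the usual ((n - 7) // 2 + 1) // 2 convolutional subsampling arithmetic, which reproduces the logged 100 -> 23 (both function names are illustrative, not the train_asr.py originals):

    def frames_after_subsampling(num_frames: int) -> int:
        # Assumed Conv2d subsampling arithmetic: 100 -> 23, factor ~4.
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # Drop cuts whose encoder output is shorter than the token
        # sequence: 23 frames cannot align 24 tokens.
        return frames_after_subsampling(num_frames) >= num_tokens

    assert frames_after_subsampling(100) == 23
    assert not keep_cut(100, 24)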
2023-11-28 10:40:18,167 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-28 10:40:18,474 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3466840.0, ans=0.125 2023-11-28 10:40:21,409 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.793e+01 8.904e+01 9.559e+01 1.030e+02 1.233e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-28 10:40:42,570 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 520050 2023-11-28 10:40:48,309 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3466973.3333333335, ans=0.1 2023-11-28 10:41:15,705 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 3050, loss[loss=0.07873, simple_loss=0.1125, pruned_loss=0.01604, audio_tagging_loss=0.006445, over 15956.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.09036, pruned_loss=0.01237, audio_tagging_loss=0.008659, over 3043010.50 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:41:19,191 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3467173.3333333335, ans=0.04949747468305833 2023-11-28 10:41:26,028 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3467240.0, ans=0.125 2023-11-28 10:41:41,486 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 520100 2023-11-28 10:41:48,317 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3467306.6666666665, ans=0.2 2023-11-28 10:41:53,539 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 10:42:02,621 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3467440.0, ans=0.125 2023-11-28 10:42:13,278 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 3100, loss[loss=0.05858, simple_loss=0.07761, pruned_loss=0.0106, audio_tagging_loss=0.009172, over 15622.00 frames. ], tot_loss[loss=0.0669, simple_loss=0.09136, pruned_loss=0.01253, audio_tagging_loss=0.008694, over 3046640.53 frames. ], batch size: 62, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:42:16,618 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.822e+01 8.845e+01 9.349e+01 1.011e+02 1.262e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-28 10:42:40,040 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 520150 2023-11-28 10:42:43,672 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3467640.0, ans=0.125 2023-11-28 10:42:44,609 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3467640.0, ans=0.0 2023-11-28 10:42:45,102 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.66 vs. 
limit=15.0 2023-11-28 10:42:50,198 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3467706.6666666665, ans=0.125 2023-11-28 10:43:05,321 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3467773.3333333335, ans=0.125 2023-11-28 10:43:11,834 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 3150, loss[loss=0.063, simple_loss=0.08044, pruned_loss=0.01191, audio_tagging_loss=0.01087, over 14615.00 frames. ], tot_loss[loss=0.06696, simple_loss=0.09144, pruned_loss=0.01246, audio_tagging_loss=0.008779, over 3045726.96 frames. ], batch size: 54, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:43:24,522 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3467906.6666666665, ans=0.2 2023-11-28 10:43:32,464 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.05 vs. limit=22.5 2023-11-28 10:43:32,527 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.09 vs. limit=22.5 2023-11-28 10:43:37,570 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 520200 2023-11-28 10:44:06,540 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3468106.6666666665, ans=0.0 2023-11-28 10:44:10,825 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 3200, loss[loss=0.06681, simple_loss=0.08728, pruned_loss=0.01357, audio_tagging_loss=0.009596, over 15019.00 frames. ], tot_loss[loss=0.06737, simple_loss=0.092, pruned_loss=0.01257, audio_tagging_loss=0.008797, over 3050779.94 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 10:44:14,067 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.956e+01 8.853e+01 9.488e+01 1.043e+02 1.212e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-28 10:44:24,057 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 10:44:35,569 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 520250 2023-11-28 10:45:07,162 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 3250, loss[loss=0.05326, simple_loss=0.06354, pruned_loss=0.008159, audio_tagging_loss=0.01334, over 14576.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.09033, pruned_loss=0.01231, audio_tagging_loss=0.008927, over 3052838.19 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 10:45:08,838 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.29 vs. 
limit=6.0 2023-11-28 10:45:15,765 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3468506.6666666665, ans=0.125 2023-11-28 10:45:17,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3468573.3333333335, ans=0.0 2023-11-28 10:45:33,405 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 520300 2023-11-28 10:45:40,076 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3468640.0, ans=0.125 2023-11-28 10:45:41,632 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.33 vs. limit=15.0 2023-11-28 10:46:05,080 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 3300, loss[loss=0.05696, simple_loss=0.07717, pruned_loss=0.008011, audio_tagging_loss=0.01036, over 15904.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.09026, pruned_loss=0.01218, audio_tagging_loss=0.008953, over 3053195.60 frames. ], batch size: 60, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 10:46:08,845 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.715e+01 8.967e+01 9.560e+01 1.010e+02 1.793e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-28 10:46:10,195 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3468840.0, ans=0.125 2023-11-28 10:46:13,920 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3468840.0, ans=0.0 2023-11-28 10:46:30,796 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 520350 2023-11-28 10:46:30,942 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3468973.3333333335, ans=0.2 2023-11-28 10:46:31,410 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.74 vs. limit=6.0 2023-11-28 10:46:48,113 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3469040.0, ans=0.125 2023-11-28 10:46:59,086 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3469106.6666666665, ans=0.0 2023-11-28 10:47:03,683 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 3350, loss[loss=0.07481, simple_loss=0.09435, pruned_loss=0.01755, audio_tagging_loss=0.01008, over 15360.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.09027, pruned_loss=0.01219, audio_tagging_loss=0.008952, over 3047615.01 frames. 
], batch size: 59, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 10:47:12,748 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3469173.3333333335, ans=0.2 2023-11-28 10:47:13,789 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3469240.0, ans=0.125 2023-11-28 10:47:18,346 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3469240.0, ans=0.0 2023-11-28 10:47:28,686 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 520400 2023-11-28 10:47:49,277 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff3.min_abs, batch_count=3469440.0, ans=0.2 2023-11-28 10:47:50,815 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.70 vs. limit=15.0 2023-11-28 10:48:01,335 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 3400, loss[loss=0.06204, simple_loss=0.08147, pruned_loss=0.01277, audio_tagging_loss=0.008534, over 14900.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.09023, pruned_loss=0.01226, audio_tagging_loss=0.008814, over 3046434.98 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:48:05,753 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.465e+01 8.926e+01 9.389e+01 1.002e+02 1.280e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-28 10:48:17,614 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 10:48:25,595 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.72 vs. limit=22.5 2023-11-28 10:48:27,277 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 520450 2023-11-28 10:48:27,812 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.44 vs. limit=15.0 2023-11-28 10:48:28,531 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3469640.0, ans=0.125 2023-11-28 10:48:37,488 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.84 vs. limit=6.0 2023-11-28 10:48:44,434 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3469706.6666666665, ans=0.2 2023-11-28 10:48:59,554 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 3450, loss[loss=0.08131, simple_loss=0.1032, pruned_loss=0.02263, audio_tagging_loss=0.007049, over 14295.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.09077, pruned_loss=0.01242, audio_tagging_loss=0.008723, over 3045575.08 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:49:03,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3469840.0, ans=0.0 2023-11-28 10:49:03,426 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.09 vs. 
limit=15.0 2023-11-28 10:49:16,325 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3469906.6666666665, ans=0.2 2023-11-28 10:49:25,412 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 520500 2023-11-28 10:49:40,568 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3470040.0, ans=0.0 2023-11-28 10:49:41,638 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3470040.0, ans=0.125 2023-11-28 10:49:58,039 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 3500, loss[loss=0.07911, simple_loss=0.1088, pruned_loss=0.01576, audio_tagging_loss=0.008938, over 14545.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.09028, pruned_loss=0.01233, audio_tagging_loss=0.008627, over 3049347.60 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:50:02,338 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.565e+01 9.047e+01 9.689e+01 1.031e+02 1.305e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-28 10:50:11,523 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.81 vs. limit=15.0 2023-11-28 10:50:23,617 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 520550 2023-11-28 10:50:30,291 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 10:50:45,130 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.02 vs. limit=15.0 2023-11-28 10:50:56,672 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 3550, loss[loss=0.07781, simple_loss=0.1046, pruned_loss=0.01716, audio_tagging_loss=0.008327, over 15103.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.08956, pruned_loss=0.01231, audio_tagging_loss=0.008707, over 3046750.25 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:51:02,287 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3470506.6666666665, ans=0.125 2023-11-28 10:51:15,535 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.37 vs. limit=22.5 2023-11-28 10:51:22,017 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.82 vs. 
limit=22.5 2023-11-28 10:51:22,617 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 520600 2023-11-28 10:51:31,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3470706.6666666665, ans=0.125 2023-11-28 10:51:34,821 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3470706.6666666665, ans=0.0 2023-11-28 10:51:38,111 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3470706.6666666665, ans=0.125 2023-11-28 10:51:55,156 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 3600, loss[loss=0.06959, simple_loss=0.08738, pruned_loss=0.01617, audio_tagging_loss=0.009732, over 15744.00 frames. ], tot_loss[loss=0.06474, simple_loss=0.088, pruned_loss=0.01206, audio_tagging_loss=0.00868, over 3048464.18 frames. ], batch size: 62, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:52:00,704 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.557e+01 8.694e+01 9.447e+01 1.046e+02 1.297e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-28 10:52:21,664 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 520650 2023-11-28 10:52:21,876 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3470973.3333333335, ans=0.0 2023-11-28 10:52:54,244 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 3650, loss[loss=0.06663, simple_loss=0.09426, pruned_loss=0.01081, audio_tagging_loss=0.008696, over 15694.00 frames. ], tot_loss[loss=0.06488, simple_loss=0.08858, pruned_loss=0.01209, audio_tagging_loss=0.008508, over 3046923.34 frames. ], batch size: 59, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 10:52:58,924 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3471173.3333333335, ans=0.125 2023-11-28 10:53:10,033 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3471240.0, ans=0.125 2023-11-28 10:53:14,736 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.12 vs. limit=15.0 2023-11-28 10:53:19,758 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 520700 2023-11-28 10:53:43,102 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.22 vs. limit=6.0 2023-11-28 10:53:43,735 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3471440.0, ans=0.125 2023-11-28 10:53:52,252 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 3700, loss[loss=0.06986, simple_loss=0.09246, pruned_loss=0.01161, audio_tagging_loss=0.01201, over 14743.00 frames. ], tot_loss[loss=0.06507, simple_loss=0.08893, pruned_loss=0.01214, audio_tagging_loss=0.00846, over 3051713.22 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 10:53:59,768 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.224e+01 8.858e+01 9.302e+01 9.977e+01 1.303e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-28 10:54:09,834 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.26 vs. 
limit=10.0 2023-11-28 10:54:14,311 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.65 vs. limit=15.0 2023-11-28 10:54:19,249 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 520750 2023-11-28 10:54:21,612 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3471640.0, ans=0.0 2023-11-28 10:54:26,626 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.16 vs. limit=12.0 2023-11-28 10:54:51,719 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 3750, loss[loss=0.0594, simple_loss=0.07949, pruned_loss=0.01026, audio_tagging_loss=0.009404, over 16100.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08966, pruned_loss=0.01228, audio_tagging_loss=0.008575, over 3055929.41 frames. ], batch size: 61, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 10:55:04,174 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3471906.6666666665, ans=0.0 2023-11-28 10:55:10,854 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3471906.6666666665, ans=0.2 2023-11-28 10:55:17,473 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 520800 2023-11-28 10:55:24,894 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3471973.3333333335, ans=0.125 2023-11-28 10:55:35,457 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 10:55:51,444 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 3800, loss[loss=0.04035, simple_loss=0.05173, pruned_loss=0.004212, audio_tagging_loss=0.01027, over 14760.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.0895, pruned_loss=0.01231, audio_tagging_loss=0.008688, over 3050353.58 frames. 
], batch size: 56, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 10:55:58,022 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.470e+01 9.010e+01 9.587e+01 1.023e+02 1.351e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-28 10:55:59,402 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3472173.3333333335, ans=0.05 2023-11-28 10:56:16,079 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3472306.6666666665, ans=0.0 2023-11-28 10:56:16,948 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 520850 2023-11-28 10:56:28,971 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3472373.3333333335, ans=0.1 2023-11-28 10:56:35,626 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3472373.3333333335, ans=0.0 2023-11-28 10:56:36,554 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3472373.3333333335, ans=0.1 2023-11-28 10:56:37,769 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3472440.0, ans=0.125 2023-11-28 10:56:49,683 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 3850, loss[loss=0.07812, simple_loss=0.1074, pruned_loss=0.01521, audio_tagging_loss=0.009211, over 16073.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.08936, pruned_loss=0.01229, audio_tagging_loss=0.008839, over 3053677.96 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 10:56:57,592 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3472506.6666666665, ans=0.125 2023-11-28 10:57:09,316 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.02 vs. limit=22.5 2023-11-28 10:57:11,840 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.23 vs. limit=15.0 2023-11-28 10:57:15,573 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 520900 2023-11-28 10:57:15,632 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3472640.0, ans=0.0 2023-11-28 10:57:39,164 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3472773.3333333335, ans=0.125 2023-11-28 10:57:48,657 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 3900, loss[loss=0.05781, simple_loss=0.08145, pruned_loss=0.008664, audio_tagging_loss=0.008427, over 15210.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08831, pruned_loss=0.01228, audio_tagging_loss=0.008929, over 3052648.41 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 10:57:56,096 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.009e+01 8.789e+01 9.361e+01 1.021e+02 3.606e+02, threshold=1.872e+02, percent-clipped=1.0 2023-11-28 10:57:56,800 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.69 vs. 
limit=15.0 2023-11-28 10:58:00,784 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3472906.6666666665, ans=0.125 2023-11-28 10:58:14,626 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 520950 2023-11-28 10:58:15,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3472973.3333333335, ans=0.0 2023-11-28 10:58:28,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3473040.0, ans=0.125 2023-11-28 10:58:37,403 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3473106.6666666665, ans=0.125 2023-11-28 10:58:43,145 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3473106.6666666665, ans=0.0 2023-11-28 10:58:48,211 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 3950, loss[loss=0.05898, simple_loss=0.07278, pruned_loss=0.0118, audio_tagging_loss=0.01079, over 14662.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.08786, pruned_loss=0.01214, audio_tagging_loss=0.009014, over 3046100.26 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 10:58:48,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3473173.3333333335, ans=0.0 2023-11-28 10:58:53,060 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3473173.3333333335, ans=0.05 2023-11-28 10:58:57,374 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3473173.3333333335, ans=0.1 2023-11-28 10:59:02,893 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3473240.0, ans=0.04949747468305833 2023-11-28 10:59:12,837 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 521000 2023-11-28 10:59:26,888 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3473373.3333333335, ans=0.125 2023-11-28 10:59:46,250 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 4000, loss[loss=0.05859, simple_loss=0.07286, pruned_loss=0.01219, audio_tagging_loss=0.009968, over 14138.00 frames. ], tot_loss[loss=0.0651, simple_loss=0.08803, pruned_loss=0.01202, audio_tagging_loss=0.00906, over 3039154.09 frames. ], batch size: 53, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:59:47,964 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.14 vs. limit=12.0 2023-11-28 10:59:50,184 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.29 vs. 
limit=15.0 2023-11-28 10:59:52,171 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3473506.6666666665, ans=0.125 2023-11-28 10:59:52,935 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.537e+01 8.959e+01 9.483e+01 1.017e+02 1.499e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-28 11:00:12,075 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 521050 2023-11-28 11:00:34,183 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 11:00:42,444 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.51 vs. limit=22.5 2023-11-28 11:00:44,033 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 4050, loss[loss=0.06948, simple_loss=0.09395, pruned_loss=0.01307, audio_tagging_loss=0.009436, over 14690.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08868, pruned_loss=0.01221, audio_tagging_loss=0.009094, over 3040069.03 frames. ], batch size: 54, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 11:00:50,386 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 11:01:10,348 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 521100 2023-11-28 11:01:19,456 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3474040.0, ans=0.1 2023-11-28 11:01:20,645 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3474040.0, ans=0.1 2023-11-28 11:01:34,531 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.76 vs. limit=15.0 2023-11-28 11:01:34,991 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3474106.6666666665, ans=0.125 2023-11-28 11:01:40,785 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3474106.6666666665, ans=0.125 2023-11-28 11:01:42,740 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 4100, loss[loss=0.07716, simple_loss=0.1089, pruned_loss=0.01521, audio_tagging_loss=0.007491, over 13965.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.08834, pruned_loss=0.01197, audio_tagging_loss=0.009079, over 3038543.64 frames. ], batch size: 52, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 11:01:45,699 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 11:01:49,388 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3474173.3333333335, ans=0.0 2023-11-28 11:01:51,018 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.75 vs. 
limit=6.0 2023-11-28 11:01:51,406 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.742e+01 8.779e+01 9.580e+01 1.037e+02 1.315e+02, threshold=1.916e+02, percent-clipped=0.0 2023-11-28 11:01:55,436 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.65 vs. limit=22.5 2023-11-28 11:02:08,155 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 521150 2023-11-28 11:02:12,806 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3474306.6666666665, ans=0.125 2023-11-28 11:02:24,257 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3474373.3333333335, ans=0.0 2023-11-28 11:02:30,221 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3474440.0, ans=0.125 2023-11-28 11:02:33,110 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3474440.0, ans=0.2 2023-11-28 11:02:41,703 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 4150, loss[loss=0.06974, simple_loss=0.09969, pruned_loss=0.01271, audio_tagging_loss=0.00719, over 15618.00 frames. ], tot_loss[loss=0.06509, simple_loss=0.08839, pruned_loss=0.01195, audio_tagging_loss=0.008945, over 3039575.86 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 11:02:41,928 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3474506.6666666665, ans=0.125 2023-11-28 11:02:56,499 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3474573.3333333335, ans=0.125 2023-11-28 11:03:00,257 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.60 vs. limit=15.0 2023-11-28 11:03:07,959 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 521200 2023-11-28 11:03:28,257 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 11:03:40,523 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 4200, loss[loss=0.0546, simple_loss=0.0824, pruned_loss=0.007655, audio_tagging_loss=0.005749, over 15522.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.08919, pruned_loss=0.01207, audio_tagging_loss=0.008735, over 3041368.10 frames. 
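Editorially worth noting: in every one of the optim.py "Clipping_scale=2.0, grad-norm quartiles ... threshold=..." entries, the five printed values read as (min, 25%, median, 75%, max) of recent gradient norms, and the threshold equals Clipping_scale times the median (e.g. 2.0 x 9.580e+01 = 1.916e+02 in the 11:01:51 entry above). A minimal sketch of that bookkeeping, assuming a simple rolling buffer of per-step norms rather than whatever optim.py actually maintains:

    import torch

    def clipping_report(recent_grad_norms: torch.Tensor, clipping_scale: float = 2.0):
        """Sketch of the 'grad-norm quartiles ... threshold ...' diagnostic.
        recent_grad_norms is assumed to be a 1-D float tensor of the total
        gradient norm at recent optimizer steps (an assumption, not the
        actual optim.py data structure)."""
        q = torch.quantile(recent_grad_norms,
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * q[2]          # 2.0 x median, as in the log
        clipped = recent_grad_norms > threshold    # steps that would be clipped
        percent_clipped = 100.0 * clipped.float().mean()
        return q, threshold, percent_clipped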
], batch size: 58, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 11:03:49,028 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.420e+01 8.843e+01 9.445e+01 1.017e+02 1.271e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-28 11:04:05,578 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3474973.3333333335, ans=0.125 2023-11-28 11:04:07,619 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 521250 2023-11-28 11:04:29,851 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.74 vs. limit=22.5 2023-11-28 11:04:33,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3475106.6666666665, ans=0.0 2023-11-28 11:04:39,680 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 4250, loss[loss=0.05402, simple_loss=0.07846, pruned_loss=0.00743, audio_tagging_loss=0.007353, over 14905.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08937, pruned_loss=0.0122, audio_tagging_loss=0.008653, over 3046696.36 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 11:04:51,728 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3475240.0, ans=0.0 2023-11-28 11:05:02,978 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.76 vs. limit=15.0 2023-11-28 11:05:03,682 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3475306.6666666665, ans=0.125 2023-11-28 11:05:05,815 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 521300 2023-11-28 11:05:12,989 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.99 vs. limit=12.0 2023-11-28 11:05:17,702 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3475373.3333333335, ans=0.2 2023-11-28 11:05:22,181 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3475373.3333333335, ans=0.125 2023-11-28 11:05:24,263 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3475373.3333333335, ans=0.0 2023-11-28 11:05:39,413 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 4300, loss[loss=0.05503, simple_loss=0.07284, pruned_loss=0.009332, audio_tagging_loss=0.009276, over 15282.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08985, pruned_loss=0.01222, audio_tagging_loss=0.008545, over 3048505.94 frames. 
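The bulk of the log consists of scaling.py traces of ScheduledFloat values: named hyperparameters (dropout probabilities, skip rates, balancer probabilities) whose current value, printed as "ans", is looked up against the global batch_count. A minimal sketch of such a schedule, assuming piecewise-linear interpolation between (batch_count, value) breakpoints; the real scaling.py class is presumably richer than this:

    class ScheduledFloat:
        """Minimal sketch, assuming piecewise-linear interpolation between
        sorted (batch_count, value) breakpoints; not the actual class."""

        def __init__(self, *points):
            self.points = sorted(points)

        def __call__(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if batch_count <= x1:
                    t = (batch_count - x0) / (x1 - x0)
                    return y0 + t * (y1 - y0)
            return pts[-1][1]  # past the last breakpoint, hold the final value

    # A dropout_p that decayed to 0.1 early in training keeps printing
    # ans=0.1 at batch_count ~3.47e6, matching the entries above.
    dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
    assert dropout_p(3472373.0) == 0.1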
], batch size: 56, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 11:05:47,120 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.428e+01 8.879e+01 9.468e+01 1.032e+02 1.370e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-28 11:06:04,302 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 521350 2023-11-28 11:06:24,107 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3475706.6666666665, ans=0.125 2023-11-28 11:06:29,974 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3475773.3333333335, ans=0.1 2023-11-28 11:06:37,550 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 4350, loss[loss=0.07749, simple_loss=0.1108, pruned_loss=0.01543, audio_tagging_loss=0.006668, over 16202.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.09014, pruned_loss=0.01227, audio_tagging_loss=0.008515, over 3043203.32 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 11:06:38,907 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3475840.0, ans=0.125 2023-11-28 11:07:04,045 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 521400 2023-11-28 11:07:34,291 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3476106.6666666665, ans=0.125 2023-11-28 11:07:34,363 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3476106.6666666665, ans=0.125 2023-11-28 11:07:36,290 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 4400, loss[loss=0.08008, simple_loss=0.1074, pruned_loss=0.01842, audio_tagging_loss=0.007974, over 14592.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.09057, pruned_loss=0.01241, audio_tagging_loss=0.008573, over 3045025.23 frames. ], batch size: 53, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:07:44,600 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.862e+01 9.068e+01 9.728e+01 1.034e+02 1.377e+02, threshold=1.946e+02, percent-clipped=0.0 2023-11-28 11:08:02,160 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 521450 2023-11-28 11:08:11,750 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.09 vs. limit=22.5 2023-11-28 11:08:17,445 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3476373.3333333335, ans=0.035 2023-11-28 11:08:17,623 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3476373.3333333335, ans=0.125 2023-11-28 11:08:18,095 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.27 vs. limit=15.0 2023-11-28 11:08:32,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3476440.0, ans=0.1 2023-11-28 11:08:35,684 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 4450, loss[loss=0.07085, simple_loss=0.09033, pruned_loss=0.0123, audio_tagging_loss=0.01338, over 16557.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.09044, pruned_loss=0.01243, audio_tagging_loss=0.008561, over 3046928.73 frames. 
], batch size: 60, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:08:39,346 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3476506.6666666665, ans=0.1 2023-11-28 11:08:45,921 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3476573.3333333335, ans=0.025 2023-11-28 11:08:59,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3476640.0, ans=0.125 2023-11-28 11:09:00,809 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 521500 2023-11-28 11:09:06,463 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3476640.0, ans=0.1 2023-11-28 11:09:21,579 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3476773.3333333335, ans=0.0 2023-11-28 11:09:33,521 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 4500, loss[loss=0.08781, simple_loss=0.1191, pruned_loss=0.01993, audio_tagging_loss=0.008324, over 15564.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.09113, pruned_loss=0.01251, audio_tagging_loss=0.008572, over 3053238.34 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:09:41,303 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.590e+01 8.818e+01 9.367e+01 9.979e+01 1.467e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-28 11:09:42,981 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.94 vs. limit=15.0 2023-11-28 11:10:00,013 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 521550 2023-11-28 11:10:16,174 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3477040.0, ans=0.1 2023-11-28 11:10:27,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3477106.6666666665, ans=0.2 2023-11-28 11:10:32,143 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 4550, loss[loss=0.06182, simple_loss=0.08259, pruned_loss=0.0107, audio_tagging_loss=0.009826, over 15019.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.09082, pruned_loss=0.01241, audio_tagging_loss=0.008512, over 3053182.73 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:10:43,062 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3477240.0, ans=0.125 2023-11-28 11:10:58,568 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 521600 2023-11-28 11:11:01,228 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 11:11:11,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3477373.3333333335, ans=0.125 2023-11-28 11:11:20,768 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3477440.0, ans=0.035 2023-11-28 11:11:21,741 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. 
Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 11:11:31,620 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 4600, loss[loss=0.05724, simple_loss=0.07665, pruned_loss=0.007992, audio_tagging_loss=0.01093, over 14388.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.09018, pruned_loss=0.01234, audio_tagging_loss=0.008632, over 3042178.82 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:11:38,049 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3477506.6666666665, ans=0.125 2023-11-28 11:11:39,098 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3477506.6666666665, ans=0.125 2023-11-28 11:11:39,180 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3477506.6666666665, ans=0.0 2023-11-28 11:11:39,935 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.011e+01 8.873e+01 9.292e+01 1.017e+02 1.163e+02, threshold=1.858e+02, percent-clipped=0.0 2023-11-28 11:11:51,340 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3477573.3333333335, ans=0.1 2023-11-28 11:11:56,660 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 521650 2023-11-28 11:12:29,225 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3477840.0, ans=0.125 2023-11-28 11:12:30,115 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 4650, loss[loss=0.06554, simple_loss=0.08488, pruned_loss=0.01418, audio_tagging_loss=0.008917, over 14348.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.08949, pruned_loss=0.01228, audio_tagging_loss=0.008691, over 3039598.49 frames. ], batch size: 54, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:12:55,424 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 521700 2023-11-28 11:12:59,195 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.36 vs. limit=15.0 2023-11-28 11:13:06,724 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3478040.0, ans=0.0 2023-11-28 11:13:28,662 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 4700, loss[loss=0.07296, simple_loss=0.1045, pruned_loss=0.0142, audio_tagging_loss=0.006522, over 15387.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08861, pruned_loss=0.01212, audio_tagging_loss=0.008851, over 3039789.00 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:13:36,450 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.272e+01 8.974e+01 9.921e+01 1.076e+02 1.441e+02, threshold=1.984e+02, percent-clipped=0.0 2023-11-28 11:13:40,617 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.37 vs. 
limit=6.0 2023-11-28 11:13:45,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3478240.0, ans=0.125 2023-11-28 11:13:49,090 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3478240.0, ans=0.1 2023-11-28 11:13:49,221 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3478240.0, ans=0.125 2023-11-28 11:13:49,251 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3478240.0, ans=0.125 2023-11-28 11:13:55,077 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 521750 2023-11-28 11:13:58,832 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.26 vs. limit=22.5 2023-11-28 11:14:01,939 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3478306.6666666665, ans=0.125 2023-11-28 11:14:15,312 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.28 vs. limit=10.0 2023-11-28 11:14:27,483 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 4750, loss[loss=0.06849, simple_loss=0.09344, pruned_loss=0.01235, audio_tagging_loss=0.009422, over 16430.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.08887, pruned_loss=0.01216, audio_tagging_loss=0.008846, over 3043457.06 frames. ], batch size: 63, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 11:14:40,445 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3478573.3333333335, ans=0.0 2023-11-28 11:14:45,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3478573.3333333335, ans=0.125 2023-11-28 11:14:49,887 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.37 vs. limit=15.0 2023-11-28 11:14:52,662 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 521800 2023-11-28 11:14:58,093 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.02 vs. limit=6.0 2023-11-28 11:14:59,227 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3478640.0, ans=0.125 2023-11-28 11:14:59,389 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3478640.0, ans=0.1 2023-11-28 11:15:01,890 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.76 vs. 
limit=15.0 2023-11-28 11:15:04,926 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3478706.6666666665, ans=0.125 2023-11-28 11:15:21,047 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3478773.3333333335, ans=0.0 2023-11-28 11:15:23,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3478773.3333333335, ans=0.2 2023-11-28 11:15:25,679 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 4800, loss[loss=0.06997, simple_loss=0.1052, pruned_loss=0.01083, audio_tagging_loss=0.00655, over 14877.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08843, pruned_loss=0.01199, audio_tagging_loss=0.00892, over 3043436.31 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:15:26,050 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3478840.0, ans=0.125 2023-11-28 11:15:28,118 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3478840.0, ans=0.0 2023-11-28 11:15:30,856 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.33 vs. limit=15.0 2023-11-28 11:15:34,583 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.537e+01 8.828e+01 9.577e+01 1.068e+02 1.342e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-28 11:15:36,061 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3478906.6666666665, ans=0.125 2023-11-28 11:15:51,084 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 521850 2023-11-28 11:16:06,875 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3479040.0, ans=0.125 2023-11-28 11:16:12,352 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.84 vs. limit=22.5 2023-11-28 11:16:23,916 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 4850, loss[loss=0.05274, simple_loss=0.06638, pruned_loss=0.006386, audio_tagging_loss=0.01316, over 14633.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08919, pruned_loss=0.01205, audio_tagging_loss=0.00912, over 3042277.24 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:16:29,214 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3479173.3333333335, ans=0.125 2023-11-28 11:16:49,862 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 521900 2023-11-28 11:16:55,631 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3479306.6666666665, ans=0.125 2023-11-28 11:17:12,665 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3479440.0, ans=0.125 2023-11-28 11:17:22,684 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 4900, loss[loss=0.07695, simple_loss=0.1064, pruned_loss=0.01683, audio_tagging_loss=0.006925, over 15227.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.0898, pruned_loss=0.01225, audio_tagging_loss=0.008982, over 3037991.08 frames. 
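The recurring WARNING entries above that exclude AudioSet cuts ("Number of frames (before subsampling): 100 ... (after subsampling): 23 ... Number of tokens: 24") fit a transducer-style sanity filter: after roughly 4x convolutional subsampling, a 1-second clip keeps fewer encoder frames than its dummy transcript has BPE tokens, and a transducer loss needs at least one frame per emitted token. A sketch of such a filter; the exact subsampling arithmetic below is an assumption that happens to map 100 to 23:

    def frames_after_subsampling(num_frames: int) -> int:
        # Assumed two-stage conv subsampling; ((100 - 7) // 2 + 1) // 2 == 23,
        # matching the numbers printed in the warnings.
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        """Drop cuts whose encoder output is shorter than the token sequence."""
        return frames_after_subsampling(num_frames) >= num_tokens

    assert not keep_cut(100, 24)  # the excluded dummy-text cuts: 23 frames < 24 tokens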
], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:17:32,625 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.438e+01 8.706e+01 9.491e+01 1.021e+02 1.931e+02, threshold=1.898e+02, percent-clipped=1.0 2023-11-28 11:17:36,805 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.95 vs. limit=22.5 2023-11-28 11:17:40,823 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3479573.3333333335, ans=0.1 2023-11-28 11:17:41,871 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3479573.3333333335, ans=0.125 2023-11-28 11:17:48,987 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 521950 2023-11-28 11:17:51,330 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3479640.0, ans=0.1 2023-11-28 11:18:15,062 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3479773.3333333335, ans=0.125 2023-11-28 11:18:21,529 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 4950, loss[loss=0.04578, simple_loss=0.05428, pruned_loss=0.008345, audio_tagging_loss=0.0103, over 13937.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08953, pruned_loss=0.01221, audio_tagging_loss=0.008797, over 3038972.91 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:18:47,292 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 522000 2023-11-28 11:18:51,018 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3479973.3333333335, ans=0.125 2023-11-28 11:19:08,628 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3480106.6666666665, ans=0.2 2023-11-28 11:19:20,162 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 5000, loss[loss=0.07725, simple_loss=0.1121, pruned_loss=0.01579, audio_tagging_loss=0.00542, over 15175.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08899, pruned_loss=0.012, audio_tagging_loss=0.008669, over 3037235.16 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:19:29,632 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.223e+01 8.777e+01 9.263e+01 9.841e+01 1.147e+02, threshold=1.853e+02, percent-clipped=0.0 2023-11-28 11:19:35,671 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3480240.0, ans=0.125 2023-11-28 11:19:40,960 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3480240.0, ans=0.0 2023-11-28 11:19:45,835 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.91 vs. limit=15.0 2023-11-28 11:19:46,479 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 522050 2023-11-28 11:19:51,025 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3480306.6666666665, ans=0.1 2023-11-28 11:20:03,381 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.25 vs. 
limit=15.0 2023-11-28 11:20:06,526 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.63 vs. limit=15.0 2023-11-28 11:20:18,430 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.11 vs. limit=22.5 2023-11-28 11:20:18,901 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 5050, loss[loss=0.0684, simple_loss=0.09606, pruned_loss=0.01156, audio_tagging_loss=0.008808, over 15142.00 frames. ], tot_loss[loss=0.06493, simple_loss=0.08862, pruned_loss=0.01201, audio_tagging_loss=0.008608, over 3037885.98 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:20:25,781 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3480506.6666666665, ans=0.125 2023-11-28 11:20:37,116 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3480573.3333333335, ans=0.5 2023-11-28 11:20:38,128 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3480573.3333333335, ans=0.1 2023-11-28 11:20:43,065 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.84 vs. limit=15.0 2023-11-28 11:20:44,588 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 522100 2023-11-28 11:20:57,291 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.90 vs. limit=10.0 2023-11-28 11:21:13,852 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.72 vs. limit=15.0 2023-11-28 11:21:17,565 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 5100, loss[loss=0.05039, simple_loss=0.06625, pruned_loss=0.008962, audio_tagging_loss=0.0083, over 16147.00 frames. ], tot_loss[loss=0.06471, simple_loss=0.08832, pruned_loss=0.01197, audio_tagging_loss=0.008573, over 3034447.98 frames. ], batch size: 62, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:21:26,394 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.577e+01 8.858e+01 9.488e+01 1.012e+02 1.214e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-28 11:21:27,843 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3480906.6666666665, ans=0.125 2023-11-28 11:21:32,796 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.55 vs. 
limit=22.5 2023-11-28 11:21:35,326 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3480906.6666666665, ans=0.0 2023-11-28 11:21:39,612 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3480973.3333333335, ans=0.0 2023-11-28 11:21:43,435 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 522150 2023-11-28 11:22:00,308 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3481040.0, ans=0.125 2023-11-28 11:22:02,336 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3481040.0, ans=0.0 2023-11-28 11:22:11,453 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3481106.6666666665, ans=0.2 2023-11-28 11:22:12,450 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3481106.6666666665, ans=0.125 2023-11-28 11:22:15,699 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 5150, loss[loss=0.03386, simple_loss=0.04495, pruned_loss=0.002582, audio_tagging_loss=0.008806, over 14110.00 frames. ], tot_loss[loss=0.06531, simple_loss=0.08923, pruned_loss=0.01219, audio_tagging_loss=0.008512, over 3039658.65 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:22:21,056 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3481173.3333333335, ans=0.125 2023-11-28 11:22:31,730 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 11:22:42,099 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 522200 2023-11-28 11:23:04,847 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3481440.0, ans=0.125 2023-11-28 11:23:14,772 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 5200, loss[loss=0.07794, simple_loss=0.09969, pruned_loss=0.01922, audio_tagging_loss=0.008871, over 16111.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08908, pruned_loss=0.0122, audio_tagging_loss=0.008505, over 3042044.46 frames. ], batch size: 60, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 11:23:20,777 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.33 vs. limit=15.0 2023-11-28 11:23:24,312 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.856e+01 8.751e+01 9.601e+01 1.026e+02 1.242e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-28 11:23:40,040 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 522250 2023-11-28 11:23:58,533 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3481706.6666666665, ans=0.125 2023-11-28 11:24:12,205 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 5250, loss[loss=0.06066, simple_loss=0.08971, pruned_loss=0.009138, audio_tagging_loss=0.006665, over 14671.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.09, pruned_loss=0.01229, audio_tagging_loss=0.008437, over 3039831.18 frames. 
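The "Whitening:" entries compare a per-module metric against a limit (5.0, 6.0, 10.0, 12.0, 15.0, 22.5 in this section); the constraint only activates when the metric exceeds the limit, which is why "metric=18.55 vs. limit=22.5" just above is merely informational. One plausible reading of the metric, offered here as a guess at what scaling.py computes rather than its actual code, is how far the (grouped) feature covariance is from a scaled identity, with 1.0 meaning perfectly white features:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
        """Hedged guess at the logged Whitening metric.
        x: (num_frames, num_channels); num_groups mirrors the
        'num_groups=...' field in the log entries."""
        metrics = []
        for g in x.chunk(num_groups, dim=1):
            g = g - g.mean(dim=0)
            cov = g.t() @ g / g.shape[0]
            e = torch.linalg.eigvalsh(cov)                  # covariance eigenvalues
            metrics.append((e * e).mean() / e.mean() ** 2)  # 1.0 if all equal
        return torch.stack(metrics).mean()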
], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:24:37,412 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 522300 2023-11-28 11:24:42,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3481973.3333333335, ans=0.125 2023-11-28 11:24:56,595 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3482040.0, ans=0.125 2023-11-28 11:24:57,713 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3482106.6666666665, ans=0.125 2023-11-28 11:25:09,477 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 5300, loss[loss=0.06481, simple_loss=0.08521, pruned_loss=0.0125, audio_tagging_loss=0.009709, over 13937.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.0903, pruned_loss=0.01238, audio_tagging_loss=0.008434, over 3034392.36 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:25:09,664 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3482173.3333333335, ans=0.125 2023-11-28 11:25:19,333 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.372e+01 8.992e+01 9.491e+01 1.033e+02 1.599e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-28 11:25:26,584 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3482240.0, ans=0.125 2023-11-28 11:25:35,188 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.95 vs. limit=22.5 2023-11-28 11:25:35,840 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 522350 2023-11-28 11:25:38,489 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.95 vs. limit=15.0 2023-11-28 11:25:41,850 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.21 vs. limit=10.0 2023-11-28 11:26:01,812 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0 2023-11-28 11:26:04,905 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 11:26:07,642 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 5350, loss[loss=0.0756, simple_loss=0.1184, pruned_loss=0.01145, audio_tagging_loss=0.004973, over 15173.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.0904, pruned_loss=0.01242, audio_tagging_loss=0.008488, over 3038945.12 frames. 
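The grad_scale column follows fp16 loss-scaling dynamics: it doubles from 16.0 to 32.0 at batch 5200, falls back to 16.0 by batch 5250, and doubles again by batch 5600 further down, the signature of a scaler that grows after a run of overflow-free steps and backs off on overflow. A sketch using the stock PyTorch scaler; the growth and backoff settings below are illustrative assumptions, not this run's actual configuration:

    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=16.0,       # assumed starting point
        growth_factor=2.0,     # doubling, as seen in the log
        backoff_factor=0.5,    # halving after an overflow step
        growth_interval=2000,  # assumed number of clean steps before growth
    )

    # Typical step: scale the loss, unscale before clipping, then update.
    #   scaler.scale(loss).backward()
    #   scaler.unscale_(optimizer)     # so grad-norm clipping sees true norms
    #   scaler.step(optimizer)
    #   scaler.update()                # this is where grad_scale grows or backs off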
], batch size: 52, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:26:13,028 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3482506.6666666665, ans=0.0 2023-11-28 11:26:29,491 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3482573.3333333335, ans=0.0 2023-11-28 11:26:33,717 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 522400 2023-11-28 11:26:37,767 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3482640.0, ans=0.2 2023-11-28 11:27:07,434 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 5400, loss[loss=0.05358, simple_loss=0.07247, pruned_loss=0.007604, audio_tagging_loss=0.009739, over 16723.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.09029, pruned_loss=0.01218, audio_tagging_loss=0.008509, over 3044701.45 frames. ], batch size: 65, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:27:07,975 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.24 vs. limit=15.0 2023-11-28 11:27:17,352 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.334e+01 8.830e+01 9.403e+01 1.046e+02 1.380e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-28 11:27:22,671 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.20 vs. limit=15.0 2023-11-28 11:27:31,947 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 522450 2023-11-28 11:27:34,591 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.48 vs. limit=15.0 2023-11-28 11:27:39,364 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3482973.3333333335, ans=0.125 2023-11-28 11:27:40,501 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3483040.0, ans=0.125 2023-11-28 11:28:05,747 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 5450, loss[loss=0.06254, simple_loss=0.08457, pruned_loss=0.01058, audio_tagging_loss=0.009678, over 14008.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.08994, pruned_loss=0.01218, audio_tagging_loss=0.008549, over 3042093.32 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:28:32,334 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 522500 2023-11-28 11:28:48,693 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3483373.3333333335, ans=0.0 2023-11-28 11:28:57,669 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3483440.0, ans=0.0 2023-11-28 11:29:02,284 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3483440.0, ans=0.125 2023-11-28 11:29:04,430 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 5500, loss[loss=0.06157, simple_loss=0.08535, pruned_loss=0.00958, audio_tagging_loss=0.009315, over 14627.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.09002, pruned_loss=0.0121, audio_tagging_loss=0.008625, over 3036441.25 frames. 
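Each tot_loss[...] summary is reported "over ~3.0e6 frames" even though individual batches carry only ~15,000 frames, which suggests a frame-weighted running average with decay rather than a plain mean over all batches. A minimal sketch of that bookkeeping; the decay constant is an assumption chosen so the effective window sits near 3e6 frames at ~15.5k frames per batch:

    class LossTracker:
        """Frame-weighted running average, as a sketch of the
        'tot_loss[... over N frames]' bookkeeping; not the actual tracker."""

        def __init__(self, decay: float = 0.995):
            self.decay = decay      # steady-state frames ~ 15500 / 0.005 = 3.1e6
            self.loss_sum = 0.0
            self.frames = 0.0

        def update(self, loss: float, num_frames: float) -> None:
            self.loss_sum = self.decay * self.loss_sum + loss * num_frames
            self.frames = self.decay * self.frames + num_frames

        @property
        def avg(self) -> float:
            return self.loss_sum / max(self.frames, 1.0)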
], batch size: 53, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:29:07,671 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3483506.6666666665, ans=0.0 2023-11-28 11:29:15,295 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.510e+01 8.610e+01 9.341e+01 1.002e+02 1.177e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-28 11:29:30,905 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 522550 2023-11-28 11:29:38,141 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3483640.0, ans=0.125 2023-11-28 11:29:40,590 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3483706.6666666665, ans=0.0 2023-11-28 11:29:52,821 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.57 vs. limit=22.5 2023-11-28 11:30:04,958 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 5550, loss[loss=0.08711, simple_loss=0.1194, pruned_loss=0.02094, audio_tagging_loss=0.006462, over 16107.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.0906, pruned_loss=0.01231, audio_tagging_loss=0.008646, over 3038364.83 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:30:20,005 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3483906.6666666665, ans=0.125 2023-11-28 11:30:24,468 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3483906.6666666665, ans=0.0 2023-11-28 11:30:29,990 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 522600 2023-11-28 11:30:45,948 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3484040.0, ans=0.125 2023-11-28 11:30:52,045 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3484106.6666666665, ans=0.0 2023-11-28 11:31:02,044 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3484106.6666666665, ans=0.125 2023-11-28 11:31:04,157 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 5600, loss[loss=0.04457, simple_loss=0.06006, pruned_loss=0.005681, audio_tagging_loss=0.008862, over 15249.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.09109, pruned_loss=0.01234, audio_tagging_loss=0.008658, over 3042068.89 frames. ], batch size: 59, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 11:31:14,178 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.656e+01 9.030e+01 9.835e+01 1.064e+02 3.078e+02, threshold=1.967e+02, percent-clipped=1.0 2023-11-28 11:31:29,410 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 522650 2023-11-28 11:31:48,641 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.43 vs. limit=5.0 2023-11-28 11:31:51,251 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 11:32:01,778 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3484506.6666666665, ans=0.0 2023-11-28 11:32:02,669 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 5650, loss[loss=0.06212, simple_loss=0.0851, pruned_loss=0.008358, audio_tagging_loss=0.01121, over 15138.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.09075, pruned_loss=0.01228, audio_tagging_loss=0.008796, over 3045034.63 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 11:32:07,263 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.27 vs. limit=6.0 2023-11-28 11:32:09,185 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3484506.6666666665, ans=0.0 2023-11-28 11:32:30,066 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 522700 2023-11-28 11:32:48,321 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3484706.6666666665, ans=0.0 2023-11-28 11:33:00,446 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3484773.3333333335, ans=0.0 2023-11-28 11:33:02,825 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 5700, loss[loss=0.07045, simple_loss=0.09335, pruned_loss=0.01523, audio_tagging_loss=0.008551, over 14569.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.09096, pruned_loss=0.01231, audio_tagging_loss=0.008777, over 3042602.77 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:33:11,250 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.80 vs. limit=15.0 2023-11-28 11:33:15,230 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.269e+01 8.782e+01 9.296e+01 1.023e+02 1.172e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-28 11:33:15,577 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3484906.6666666665, ans=0.0 2023-11-28 11:33:27,912 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3484973.3333333335, ans=0.125 2023-11-28 11:33:28,838 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 522750 2023-11-28 11:33:37,671 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3485040.0, ans=0.125 2023-11-28 11:33:50,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3485106.6666666665, ans=0.0 2023-11-28 11:33:53,484 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.56 vs. limit=15.0 2023-11-28 11:33:59,607 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.47 vs. 
limit=12.0 2023-11-28 11:34:02,526 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 5750, loss[loss=0.06063, simple_loss=0.08136, pruned_loss=0.01079, audio_tagging_loss=0.009154, over 15535.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.09063, pruned_loss=0.01242, audio_tagging_loss=0.008667, over 3034943.23 frames. ], batch size: 59, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:34:02,883 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3485173.3333333335, ans=0.2 2023-11-28 11:34:06,168 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3485173.3333333335, ans=0.2 2023-11-28 11:34:14,524 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.78 vs. limit=12.0 2023-11-28 11:34:28,180 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 522800 2023-11-28 11:34:31,067 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3485306.6666666665, ans=0.95 2023-11-28 11:34:38,712 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3485373.3333333335, ans=0.0 2023-11-28 11:35:01,855 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 5800, loss[loss=0.0837, simple_loss=0.1243, pruned_loss=0.01707, audio_tagging_loss=0.004475, over 15650.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08984, pruned_loss=0.01227, audio_tagging_loss=0.008581, over 3033520.68 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:35:13,801 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.175e+01 8.794e+01 9.521e+01 1.033e+02 1.295e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-28 11:35:25,340 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.12 vs. limit=15.0 2023-11-28 11:35:28,292 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 522850 2023-11-28 11:35:28,506 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3485640.0, ans=0.2 2023-11-28 11:35:39,116 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3485706.6666666665, ans=0.2 2023-11-28 11:35:44,313 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=15.21 vs. limit=15.0 2023-11-28 11:35:57,565 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3485773.3333333335, ans=0.1 2023-11-28 11:36:00,057 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3485840.0, ans=0.2 2023-11-28 11:36:00,894 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 5850, loss[loss=0.04614, simple_loss=0.0592, pruned_loss=0.00628, audio_tagging_loss=0.01026, over 14981.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.09076, pruned_loss=0.01232, audio_tagging_loss=0.00847, over 3033014.94 frames. 
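The printed loss components are internally consistent with a fixed weighting: in the batch-5850 entry just above, 0.5 x 0.0592 + 0.00628 + 0.01026 = 0.04614, the printed loss. The reported total therefore appears to be a half-weighted simple (linear) transducer loss plus the full pruned transducer loss plus the audio-tagging distillation loss; the weights here are inferred from the printed numbers, not read out of train_asr.py:

    def reported_loss(simple_loss: float, pruned_loss: float,
                      audio_tagging_loss: float,
                      simple_loss_scale: float = 0.5) -> float:
        """Inferred from the log: loss = 0.5*simple + pruned + audio_tagging."""
        return simple_loss_scale * simple_loss + pruned_loss + audio_tagging_loss

    # Batch 5850 above: 0.5*0.0592 + 0.00628 + 0.01026 == 0.04614
    assert abs(reported_loss(0.0592, 0.00628, 0.01026) - 0.04614) < 1e-6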
], batch size: 58, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 11:36:01,113 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3485840.0, ans=0.5 2023-11-28 11:36:13,963 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3485906.6666666665, ans=0.1 2023-11-28 11:36:25,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3485973.3333333335, ans=0.2 2023-11-28 11:36:26,659 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 522900 2023-11-28 11:36:59,224 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 5900, loss[loss=0.05436, simple_loss=0.06457, pruned_loss=0.009027, audio_tagging_loss=0.01305, over 17050.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.09104, pruned_loss=0.01237, audio_tagging_loss=0.008488, over 3039310.18 frames. ], batch size: 65, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 11:37:00,631 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3486173.3333333335, ans=0.125 2023-11-28 11:37:11,244 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.465e+01 8.935e+01 9.645e+01 1.023e+02 1.416e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-28 11:37:19,587 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3486240.0, ans=0.1 2023-11-28 11:37:25,646 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 522950 2023-11-28 11:37:36,814 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3486373.3333333335, ans=0.125 2023-11-28 11:37:58,767 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 5950, loss[loss=0.06547, simple_loss=0.09416, pruned_loss=0.009138, audio_tagging_loss=0.009251, over 14668.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.0911, pruned_loss=0.01241, audio_tagging_loss=0.00849, over 3044995.96 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 11:37:59,115 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3486506.6666666665, ans=0.2 2023-11-28 11:38:07,649 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 11:38:21,176 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.81 vs. limit=6.0 2023-11-28 11:38:23,331 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.81 vs. limit=22.5 2023-11-28 11:38:24,978 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 523000 2023-11-28 11:38:33,266 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3486706.6666666665, ans=0.025 2023-11-28 11:38:39,386 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3486706.6666666665, ans=0.125 2023-11-28 11:38:41,953 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.83 vs. 
limit=12.0 2023-11-28 11:38:43,987 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3486706.6666666665, ans=0.2 2023-11-28 11:38:44,020 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3486706.6666666665, ans=10.0 2023-11-28 11:38:57,779 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 6000, loss[loss=0.0431, simple_loss=0.05751, pruned_loss=0.006803, audio_tagging_loss=0.00754, over 15301.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.09046, pruned_loss=0.01235, audio_tagging_loss=0.008539, over 3042519.64 frames. ], batch size: 61, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 11:38:57,781 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-28 11:39:17,894 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.3253, 4.3150, 4.4718, 4.4519], device='cuda:0') 2023-11-28 11:39:33,660 INFO [train_asr.py:1267] (0/4) Epoch 44, validation: loss=0.05792, simple_loss=0.0506, pruned_loss=0.005293, audio_tagging_loss=0.02732, over 4681554.00 frames. 2023-11-28 11:39:33,661 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-28 11:39:43,308 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3486840.0, ans=0.125 2023-11-28 11:39:45,293 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.423e+01 8.849e+01 9.422e+01 1.008e+02 1.234e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-28 11:39:47,823 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3486906.6666666665, ans=0.1 2023-11-28 11:39:59,522 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 523050 2023-11-28 11:40:13,686 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3487040.0, ans=0.125 2023-11-28 11:40:14,713 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3487040.0, ans=0.0 2023-11-28 11:40:20,243 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 11:40:23,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3487106.6666666665, ans=0.0 2023-11-28 11:40:28,909 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3487106.6666666665, ans=0.125 2023-11-28 11:40:31,911 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 6050, loss[loss=0.08089, simple_loss=0.1149, pruned_loss=0.01536, audio_tagging_loss=0.008055, over 16385.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.09004, pruned_loss=0.01229, audio_tagging_loss=0.00851, over 3043616.03 frames. 
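During the batch-6000 validation pass, zipformer.py prints attn_weights_entropy = tensor([4.3253, 4.3150, 4.4718, 4.4519]), one value per attention head of the named layer. A sketch of how such a per-head entropy diagnostic can be computed, assuming row-normalized attention weights of shape (num_heads, num_queries, num_keys); the exact reduction in zipformer.py may differ:

    import torch

    def attn_weights_entropy(attn: torch.Tensor, eps: float = 1.0e-20) -> torch.Tensor:
        """Per-head Shannon entropy of attention rows, averaged over queries.
        attn: (num_heads, num_queries, num_keys), rows summing to 1."""
        ent = -(attn * (attn + eps).log()).sum(dim=-1)  # (num_heads, num_queries)
        return ent.mean(dim=-1)                         # one scalar per head

    # Values near 4.3-4.5 nats, as logged, correspond to attention spread
    # over roughly e**4.4 ~ 80 keys per query.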
], batch size: 61, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 11:40:45,230 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3487240.0, ans=0.0 2023-11-28 11:40:46,422 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3487240.0, ans=0.125 2023-11-28 11:40:54,223 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3487240.0, ans=0.125 2023-11-28 11:40:54,223 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3487240.0, ans=0.1 2023-11-28 11:40:58,476 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 523100 2023-11-28 11:41:12,874 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3487373.3333333335, ans=0.125 2023-11-28 11:41:19,775 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.34 vs. limit=15.0 2023-11-28 11:41:26,373 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.05 vs. limit=15.0 2023-11-28 11:41:31,077 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 6100, loss[loss=0.07738, simple_loss=0.1018, pruned_loss=0.01732, audio_tagging_loss=0.009149, over 14550.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.09053, pruned_loss=0.01239, audio_tagging_loss=0.008473, over 3044038.71 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 11:41:37,023 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3487506.6666666665, ans=0.125 2023-11-28 11:41:43,532 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.526e+01 8.905e+01 9.501e+01 1.004e+02 1.216e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-28 11:41:57,593 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 523150 2023-11-28 11:42:05,930 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.31 vs. limit=22.5 2023-11-28 11:42:08,768 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.30 vs. limit=10.0 2023-11-28 11:42:09,424 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3487706.6666666665, ans=0.0 2023-11-28 11:42:14,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3487706.6666666665, ans=0.125 2023-11-28 11:42:26,345 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.79 vs. limit=15.0 2023-11-28 11:42:30,247 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 6150, loss[loss=0.06126, simple_loss=0.08518, pruned_loss=0.00952, audio_tagging_loss=0.00915, over 14902.00 frames. ], tot_loss[loss=0.06644, simple_loss=0.09071, pruned_loss=0.01255, audio_tagging_loss=0.008532, over 3042053.42 frames. 
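
The WARNING [train_asr.py:1481] entry above shows the trainer dropping a one-second dummy AudioSet cut: its 100 feature frames shrink to 23 after the encoder's roughly 4x subsampling, fewer than the cut's 24 BPE tokens, so the transducer loss cannot be computed. A hedged sketch of that admission check; the function name and the exact frames-after-subsampling arithmetic (convolutional frontends trim a few edge frames, hence 23 rather than 25) are assumptions, not the script's literal code:

def keep_cut(num_frames: int, num_tokens: int, subsampling_factor: int = 4) -> bool:
    # Assumed frontend arithmetic: the conv subsampler trims ~7 frames before
    # the 4x decimation, which reproduces the logged 100 -> 23 frame count.
    frames_after = (num_frames - 7) // subsampling_factor
    # A transducer needs at least one encoder frame per output token.
    return frames_after >= num_tokens

print(keep_cut(num_frames=100, num_tokens=24))  # False -> cut is excluded
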
], batch size: 57, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 11:42:43,595 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3487906.6666666665, ans=0.0 2023-11-28 11:42:56,300 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 523200 2023-11-28 11:43:00,176 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3487973.3333333335, ans=0.0 2023-11-28 11:43:00,358 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3487973.3333333335, ans=0.0 2023-11-28 11:43:06,662 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3488040.0, ans=0.0 2023-11-28 11:43:09,353 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.13 vs. limit=22.5 2023-11-28 11:43:16,024 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.05 vs. limit=15.0 2023-11-28 11:43:18,973 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3488106.6666666665, ans=0.0 2023-11-28 11:43:28,695 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 6200, loss[loss=0.07053, simple_loss=0.08605, pruned_loss=0.01673, audio_tagging_loss=0.01077, over 14701.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.08995, pruned_loss=0.01253, audio_tagging_loss=0.008676, over 3048723.81 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 11:43:39,310 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3488173.3333333335, ans=0.125 2023-11-28 11:43:42,409 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.645e+01 8.740e+01 9.407e+01 1.006e+02 1.193e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-28 11:43:48,724 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3488240.0, ans=0.125 2023-11-28 11:43:56,014 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 523250 2023-11-28 11:44:00,771 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3488306.6666666665, ans=0.125 2023-11-28 11:44:09,684 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3488373.3333333335, ans=0.125 2023-11-28 11:44:19,422 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3488440.0, ans=0.2 2023-11-28 11:44:26,594 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3488440.0, ans=0.125 2023-11-28 11:44:28,613 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 6250, loss[loss=0.07194, simple_loss=0.1004, pruned_loss=0.0148, audio_tagging_loss=0.006941, over 14740.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.08977, pruned_loss=0.01234, audio_tagging_loss=0.008702, over 3049890.64 frames. 
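
Each optim.py:476 line prints five quantiles (min, 25%, 50%, 75%, max) of recently observed gradient norms, and the clipping threshold tracks Clipping_scale times the median: in the entry above, 2.0 * 9.407e+01 = 1.881e+02, exactly the logged threshold. A sketch of median-based clipping under that reading; the window size and bookkeeping are assumptions, and the actual optimizer integrates this per parameter group:

import torch

def clip_by_median(params, history, clipping_scale=2.0, window=400):
    # Global gradient norm across all parameters.
    norms = [p.grad.norm() for p in params if p.grad is not None]
    total = torch.norm(torch.stack(norms)).item()
    history.append(total)
    del history[:-window]  # keep a sliding window of recent norms
    q = torch.quantile(torch.tensor(history),
                       torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2].item()  # e.g. 2.0 * 9.407e+01 = 1.881e+02
    if total > threshold:  # a clipped batch would lift percent-clipped above 0.0
        for p in params:
            if p.grad is not None:
                p.grad.mul_(threshold / total)
    return q.tolist(), threshold
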
], batch size: 55, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 11:44:28,943 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 11:44:37,931 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.78 vs. limit=15.0 2023-11-28 11:44:44,684 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.24 vs. limit=15.0 2023-11-28 11:44:47,806 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3488573.3333333335, ans=0.125 2023-11-28 11:44:48,838 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3488573.3333333335, ans=0.1 2023-11-28 11:44:53,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3488640.0, ans=0.1 2023-11-28 11:44:54,400 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 523300 2023-11-28 11:44:56,940 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3488640.0, ans=0.125 2023-11-28 11:44:59,485 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.08 vs. limit=15.0 2023-11-28 11:45:02,321 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.68 vs. limit=22.5 2023-11-28 11:45:03,101 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3488706.6666666665, ans=0.0 2023-11-28 11:45:27,601 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 6300, loss[loss=0.1004, simple_loss=0.1437, pruned_loss=0.02288, audio_tagging_loss=0.005731, over 15947.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.09088, pruned_loss=0.01241, audio_tagging_loss=0.008674, over 3047347.18 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 11:45:33,633 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3488840.0, ans=0.125 2023-11-28 11:45:35,685 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3488840.0, ans=0.125 2023-11-28 11:45:39,970 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.498e+01 8.790e+01 9.480e+01 1.019e+02 1.243e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-28 11:45:41,326 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3488906.6666666665, ans=0.0 2023-11-28 11:45:53,550 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 523350 2023-11-28 11:46:05,764 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3489040.0, ans=0.125 2023-11-28 11:46:25,066 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.80 vs. 
limit=15.0 2023-11-28 11:46:25,343 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 6350, loss[loss=0.08868, simple_loss=0.1227, pruned_loss=0.0182, audio_tagging_loss=0.009151, over 14598.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.09011, pruned_loss=0.01236, audio_tagging_loss=0.008862, over 3040831.22 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 11:46:44,259 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3489240.0, ans=0.0 2023-11-28 11:46:49,304 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3489306.6666666665, ans=0.125 2023-11-28 11:46:51,400 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 523400 2023-11-28 11:46:53,086 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3489306.6666666665, ans=0.125 2023-11-28 11:47:02,894 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3489373.3333333335, ans=0.125 2023-11-28 11:47:23,871 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 6400, loss[loss=0.07284, simple_loss=0.102, pruned_loss=0.01086, audio_tagging_loss=0.01097, over 14948.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.08949, pruned_loss=0.01221, audio_tagging_loss=0.008956, over 3038348.98 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 11:47:26,190 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3489506.6666666665, ans=0.0 2023-11-28 11:47:28,555 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3489506.6666666665, ans=0.1 2023-11-28 11:47:36,657 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.480e+01 8.832e+01 9.473e+01 1.012e+02 1.860e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-28 11:47:49,037 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 523450 2023-11-28 11:47:57,081 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3489706.6666666665, ans=0.125 2023-11-28 11:48:19,764 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.99 vs. limit=6.0 2023-11-28 11:48:22,442 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 6450, loss[loss=0.04905, simple_loss=0.06698, pruned_loss=0.005783, audio_tagging_loss=0.009777, over 14687.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08836, pruned_loss=0.01208, audio_tagging_loss=0.009044, over 3035173.79 frames. 
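
The logged loss components combine consistently as loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss: for the batch-6450 sample above, 0.5 * 0.06698 + 0.005783 + 0.009777 = 0.04905, matching the reported total to the last digit. The 0.5 and 1.0 weights are inferred from the logged numbers, not quoted from the training code:

def total_loss(simple, pruned, audio_tagging,
               simple_scale=0.5, at_scale=1.0):
    # Pruned-transducer objective plus an audio-tagging distillation term,
    # with weights inferred from the logged values.
    return simple_scale * simple + pruned + at_scale * audio_tagging

# batch 6450 sample from the log above
assert abs(total_loss(0.06698, 0.005783, 0.009777) - 0.04905) < 1e-6
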
], batch size: 56, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 11:48:32,720 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3489906.6666666665, ans=0.125 2023-11-28 11:48:41,483 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3489906.6666666665, ans=0.2 2023-11-28 11:48:47,560 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 523500 2023-11-28 11:48:47,658 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3489973.3333333335, ans=0.0 2023-11-28 11:48:58,383 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3490040.0, ans=0.125 2023-11-28 11:49:01,603 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3490040.0, ans=0.125 2023-11-28 11:49:02,693 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3490040.0, ans=0.125 2023-11-28 11:49:12,691 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3490106.6666666665, ans=0.1 2023-11-28 11:49:12,803 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3490106.6666666665, ans=0.125 2023-11-28 11:49:17,178 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3490106.6666666665, ans=10.0 2023-11-28 11:49:19,527 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3490173.3333333335, ans=0.0 2023-11-28 11:49:20,229 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 6500, loss[loss=0.07166, simple_loss=0.1015, pruned_loss=0.01543, audio_tagging_loss=0.005499, over 16173.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.08807, pruned_loss=0.01202, audio_tagging_loss=0.00897, over 3041976.45 frames. ], batch size: 60, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 11:49:26,129 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3490173.3333333335, ans=0.125 2023-11-28 11:49:28,234 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3490173.3333333335, ans=0.125 2023-11-28 11:49:32,362 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3490240.0, ans=0.125 2023-11-28 11:49:33,078 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.765e+01 8.967e+01 9.507e+01 1.009e+02 1.264e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-28 11:49:46,648 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 523550 2023-11-28 11:50:14,824 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3490440.0, ans=0.125 2023-11-28 11:50:18,383 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 6550, loss[loss=0.07084, simple_loss=0.1003, pruned_loss=0.01175, audio_tagging_loss=0.008947, over 15084.00 frames. 
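
grad_scale is the fp16 loss scale, and its movement between 16.0 and 32.0 across these batches is the usual dynamic-loss-scaling dance: halve on an overflowing step, double again after a run of clean steps. A sketch of that state machine; the growth interval and factors here are illustrative, not the trainer's actual settings:

class LossScaleSketch:
    def __init__(self, init_scale=16.0, growth_interval=100):
        self.scale = init_scale
        self.growth_interval = growth_interval
        self._good_steps = 0

    def update(self, found_inf: bool) -> float:
        if found_inf:
            # Overflow: halve the scale and restart the growth counter
            # (e.g. the 32.0 -> 16.0 drop between batches 6550 and 6600).
            self.scale *= 0.5
            self._good_steps = 0
        else:
            self._good_steps += 1
            if self._good_steps % self.growth_interval == 0:
                self.scale *= 2.0  # e.g. back to 32.0 by batch 6800
        return self.scale
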
], tot_loss[loss=0.06507, simple_loss=0.08831, pruned_loss=0.01207, audio_tagging_loss=0.008852, over 3051827.67 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 11:50:20,045 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.48 vs. limit=6.0 2023-11-28 11:50:29,878 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3490573.3333333335, ans=0.125 2023-11-28 11:50:33,137 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3490573.3333333335, ans=0.0 2023-11-28 11:50:40,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3490640.0, ans=0.5 2023-11-28 11:50:43,057 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3490640.0, ans=0.125 2023-11-28 11:50:44,087 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 523600 2023-11-28 11:51:14,563 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.48 vs. limit=15.0 2023-11-28 11:51:17,304 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 6600, loss[loss=0.08888, simple_loss=0.1241, pruned_loss=0.02054, audio_tagging_loss=0.006309, over 15222.00 frames. ], tot_loss[loss=0.06523, simple_loss=0.08877, pruned_loss=0.01213, audio_tagging_loss=0.00872, over 3048072.55 frames. ], batch size: 53, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 11:51:30,534 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.303e+01 8.875e+01 9.605e+01 1.016e+02 1.315e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-28 11:51:33,506 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.69 vs. limit=6.0 2023-11-28 11:51:36,605 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3490906.6666666665, ans=0.125 2023-11-28 11:51:41,957 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 523650 2023-11-28 11:51:53,802 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.02 vs. limit=15.0 2023-11-28 11:52:12,326 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.30 vs. limit=12.0 2023-11-28 11:52:14,944 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 6650, loss[loss=0.08851, simple_loss=0.1218, pruned_loss=0.01745, audio_tagging_loss=0.01015, over 15053.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.09, pruned_loss=0.01242, audio_tagging_loss=0.008651, over 3045736.13 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 11:52:15,494 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.65 vs. 
limit=15.0 2023-11-28 11:52:26,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3491240.0, ans=0.125 2023-11-28 11:52:32,799 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.88 vs. limit=15.0 2023-11-28 11:52:41,191 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 523700 2023-11-28 11:52:41,435 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3491306.6666666665, ans=0.2 2023-11-28 11:52:45,243 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3491306.6666666665, ans=0.125 2023-11-28 11:52:58,369 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3491373.3333333335, ans=0.125 2023-11-28 11:53:03,085 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3491440.0, ans=0.125 2023-11-28 11:53:13,448 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 6700, loss[loss=0.07744, simple_loss=0.1089, pruned_loss=0.0158, audio_tagging_loss=0.007202, over 15182.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.09063, pruned_loss=0.01247, audio_tagging_loss=0.008645, over 3046092.77 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 11:53:16,103 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3491506.6666666665, ans=0.125 2023-11-28 11:53:16,316 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.16 vs. limit=15.0 2023-11-28 11:53:28,306 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.352e+01 8.754e+01 9.466e+01 1.016e+02 1.269e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-28 11:53:37,512 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3491640.0, ans=0.125 2023-11-28 11:53:39,722 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 523750 2023-11-28 11:53:48,494 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3491706.6666666665, ans=0.0 2023-11-28 11:53:51,883 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3491706.6666666665, ans=0.0 2023-11-28 11:53:55,364 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3491706.6666666665, ans=0.2 2023-11-28 11:54:06,891 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3491773.3333333335, ans=0.125 2023-11-28 11:54:12,334 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 6750, loss[loss=0.05071, simple_loss=0.06506, pruned_loss=0.00809, audio_tagging_loss=0.01009, over 15401.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.09038, pruned_loss=0.01234, audio_tagging_loss=0.008626, over 3044665.90 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 11:54:14,196 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.95 vs. 
limit=22.5 2023-11-28 11:54:22,826 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.24 vs. limit=12.0 2023-11-28 11:54:29,266 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3491906.6666666665, ans=0.125 2023-11-28 11:54:36,000 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3491973.3333333335, ans=0.125 2023-11-28 11:54:36,940 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 523800 2023-11-28 11:54:49,554 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3492040.0, ans=0.125 2023-11-28 11:55:09,813 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3492173.3333333335, ans=0.125 2023-11-28 11:55:10,811 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 6800, loss[loss=0.05274, simple_loss=0.061, pruned_loss=0.01174, audio_tagging_loss=0.01051, over 16728.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.0901, pruned_loss=0.01239, audio_tagging_loss=0.008582, over 3044236.97 frames. ], batch size: 64, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 11:55:11,261 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.78 vs. limit=15.0 2023-11-28 11:55:14,341 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3492173.3333333335, ans=0.125 2023-11-28 11:55:24,146 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.501e+01 8.914e+01 9.606e+01 1.021e+02 1.348e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-28 11:55:35,908 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 523850 2023-11-28 11:55:54,479 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3492373.3333333335, ans=0.015 2023-11-28 11:55:56,854 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3492440.0, ans=0.0 2023-11-28 11:56:09,087 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 6850, loss[loss=0.07028, simple_loss=0.09288, pruned_loss=0.01294, audio_tagging_loss=0.0109, over 15570.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.0899, pruned_loss=0.01236, audio_tagging_loss=0.008602, over 3037537.26 frames. 
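
The scaling.py:1022 Whitening lines compare a measured statistic of a module's activations ("metric") against a scheduled limit; when the metric exceeds the limit, an auxiliary gradient nudges the activations back toward a whiter (more isotropic) distribution. One plausible formulation of such a metric, not necessarily the one used here: the ratio mean(eig^2) / mean(eig)^2 over the eigenvalues of the activation covariance, which is 1.0 for perfectly white features and grows as the spectrum becomes lopsided:

import torch

def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels). White features have covariance
    # proportional to the identity, driving this ratio to 1.0.
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    return ((eigs ** 2).mean() / eigs.mean() ** 2).item()

print(whitening_metric(torch.randn(10000, 64)))  # ~1.0 for white noise, up to sampling error
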
], batch size: 59, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 11:56:14,493 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3492506.6666666665, ans=0.1 2023-11-28 11:56:28,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3492573.3333333335, ans=0.125 2023-11-28 11:56:31,591 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3492573.3333333335, ans=0.0 2023-11-28 11:56:35,871 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 523900 2023-11-28 11:56:51,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3492706.6666666665, ans=0.0 2023-11-28 11:56:59,298 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.03 vs. limit=12.0 2023-11-28 11:57:04,263 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 11:57:07,908 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 6900, loss[loss=0.0541, simple_loss=0.06674, pruned_loss=0.009775, audio_tagging_loss=0.01095, over 14714.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.0896, pruned_loss=0.0124, audio_tagging_loss=0.008713, over 3038591.49 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 11:57:23,410 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.679e+01 8.756e+01 9.577e+01 1.062e+02 1.292e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-28 11:57:33,318 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 523950 2023-11-28 11:57:46,173 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3493040.0, ans=0.5 2023-11-28 11:57:57,185 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 11:57:59,525 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3493106.6666666665, ans=0.1 2023-11-28 11:58:06,566 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 6950, loss[loss=0.05802, simple_loss=0.07778, pruned_loss=0.01218, audio_tagging_loss=0.006947, over 15856.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.08927, pruned_loss=0.01237, audio_tagging_loss=0.00871, over 3030638.09 frames. ], batch size: 62, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 11:58:06,880 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3493173.3333333335, ans=0.125 2023-11-28 11:58:11,203 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3493173.3333333335, ans=0.0 2023-11-28 11:58:12,660 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.94 vs. 
limit=6.0 2023-11-28 11:58:31,621 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 524000 2023-11-28 11:58:33,010 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-524000.pt 2023-11-28 11:58:45,556 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3493373.3333333335, ans=0.07 2023-11-28 11:58:47,865 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.73 vs. limit=15.0 2023-11-28 11:58:55,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3493440.0, ans=0.0 2023-11-28 11:59:06,811 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 7000, loss[loss=0.06755, simple_loss=0.09391, pruned_loss=0.011, audio_tagging_loss=0.009601, over 14606.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.08948, pruned_loss=0.01225, audio_tagging_loss=0.008721, over 3034043.72 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 11:59:21,773 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.539e+01 8.923e+01 9.384e+01 1.033e+02 1.328e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-28 11:59:29,843 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3493640.0, ans=0.95 2023-11-28 11:59:32,817 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 524050 2023-11-28 11:59:48,126 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3493706.6666666665, ans=0.125 2023-11-28 11:59:48,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3493706.6666666665, ans=0.125 2023-11-28 11:59:50,926 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.66 vs. limit=10.0 2023-11-28 12:00:02,398 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.59 vs. limit=15.0 2023-11-28 12:00:04,732 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.76 vs. limit=15.0 2023-11-28 12:00:05,227 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 7050, loss[loss=0.05008, simple_loss=0.06732, pruned_loss=0.007397, audio_tagging_loss=0.009027, over 15796.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.09039, pruned_loss=0.0125, audio_tagging_loss=0.008619, over 3041131.43 frames. 
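
The checkpoint.py:75 entry fires exactly as the global batch index reaches 524000, i.e. a batch-count-triggered snapshot of the full training state (note the roughly 12-second pause in the log while it writes). A sketch of such a hook; the helper name and the 4000-batch period (consistent with the checkpoint-524000.pt filename) are assumptions:

import torch

def maybe_save_checkpoint(model, optimizer, scheduler, scaler,
                          batch_idx_train: int, exp_dir: str,
                          save_every_n: int = 4000) -> None:
    # Snapshot everything needed to resume: weights plus optimizer,
    # LR-scheduler, and fp16 grad-scaler state.
    if batch_idx_train % save_every_n != 0:
        return
    torch.save(
        {
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "scheduler": scheduler.state_dict(),
            "grad_scaler": scaler.state_dict(),
            "batch_idx_train": batch_idx_train,
        },
        f"{exp_dir}/checkpoint-{batch_idx_train}.pt",
    )
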
], batch size: 59, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:00:31,306 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 524100 2023-11-28 12:00:32,529 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3493973.3333333335, ans=0.125 2023-11-28 12:00:33,768 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3493973.3333333335, ans=0.0 2023-11-28 12:00:47,520 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3494040.0, ans=0.0 2023-11-28 12:00:51,230 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.92 vs. limit=15.0 2023-11-28 12:00:55,279 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3494106.6666666665, ans=0.125 2023-11-28 12:01:03,906 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 7100, loss[loss=0.05207, simple_loss=0.07625, pruned_loss=0.00586, audio_tagging_loss=0.008089, over 17031.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.08967, pruned_loss=0.0122, audio_tagging_loss=0.008673, over 3052229.23 frames. ], batch size: 64, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:01:18,902 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.268e+01 8.879e+01 9.631e+01 1.062e+02 1.360e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-28 12:01:29,499 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 524150 2023-11-28 12:01:36,930 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3494306.6666666665, ans=0.125 2023-11-28 12:01:38,028 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3494373.3333333335, ans=0.0 2023-11-28 12:01:39,051 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3494373.3333333335, ans=0.1 2023-11-28 12:01:53,010 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3494440.0, ans=0.1 2023-11-28 12:01:53,480 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.18 vs. limit=15.0 2023-11-28 12:02:01,617 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 7150, loss[loss=0.0644, simple_loss=0.08941, pruned_loss=0.01324, audio_tagging_loss=0.00646, over 17971.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.08988, pruned_loss=0.01226, audio_tagging_loss=0.008684, over 3054456.45 frames. 
], batch size: 69, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:02:06,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3494506.6666666665, ans=0.0 2023-11-28 12:02:06,240 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3494506.6666666665, ans=0.125 2023-11-28 12:02:11,307 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3494506.6666666665, ans=0.125 2023-11-28 12:02:27,459 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 524200 2023-11-28 12:02:33,580 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3494640.0, ans=0.125 2023-11-28 12:02:45,264 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3494706.6666666665, ans=0.05 2023-11-28 12:02:59,184 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3494840.0, ans=0.125 2023-11-28 12:03:00,046 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 7200, loss[loss=0.04584, simple_loss=0.054, pruned_loss=0.009293, audio_tagging_loss=0.009549, over 14341.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08931, pruned_loss=0.01218, audio_tagging_loss=0.008818, over 3047703.18 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:03:03,614 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3494840.0, ans=0.0 2023-11-28 12:03:06,280 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3494840.0, ans=0.125 2023-11-28 12:03:09,482 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3494840.0, ans=0.125 2023-11-28 12:03:15,016 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.611e+01 8.816e+01 9.709e+01 1.018e+02 1.271e+02, threshold=1.942e+02, percent-clipped=0.0 2023-11-28 12:03:16,976 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.09 vs. limit=15.0 2023-11-28 12:03:19,782 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3494906.6666666665, ans=0.0 2023-11-28 12:03:24,899 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3494973.3333333335, ans=0.125 2023-11-28 12:03:25,934 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 524250 2023-11-28 12:03:44,438 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3495040.0, ans=0.0 2023-11-28 12:03:57,751 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 7250, loss[loss=0.06907, simple_loss=0.1037, pruned_loss=0.01045, audio_tagging_loss=0.006752, over 16098.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.08972, pruned_loss=0.01234, audio_tagging_loss=0.008722, over 3052980.00 frames. ], batch size: 60, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:03:58,461 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.69 vs. 
limit=22.5 2023-11-28 12:04:05,620 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3495173.3333333335, ans=0.2 2023-11-28 12:04:10,669 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3495240.0, ans=0.125 2023-11-28 12:04:16,202 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3495240.0, ans=0.0 2023-11-28 12:04:23,259 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 524300 2023-11-28 12:04:51,205 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3495440.0, ans=0.125 2023-11-28 12:04:56,016 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 7300, loss[loss=0.05637, simple_loss=0.08364, pruned_loss=0.008192, audio_tagging_loss=0.006364, over 16087.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.08947, pruned_loss=0.0123, audio_tagging_loss=0.008767, over 3042730.94 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:05:12,053 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.876e+01 8.741e+01 9.294e+01 1.019e+02 1.260e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-28 12:05:17,643 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.14 vs. limit=6.0 2023-11-28 12:05:21,879 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 524350 2023-11-28 12:05:31,424 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.14 vs. limit=15.0 2023-11-28 12:05:42,241 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.38 vs. limit=15.0 2023-11-28 12:05:54,091 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 7350, loss[loss=0.08273, simple_loss=0.125, pruned_loss=0.01533, audio_tagging_loss=0.004888, over 16826.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08961, pruned_loss=0.01214, audio_tagging_loss=0.008662, over 3044978.48 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:06:09,815 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3495906.6666666665, ans=0.125 2023-11-28 12:06:19,694 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 524400 2023-11-28 12:06:32,001 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3496040.0, ans=0.125 2023-11-28 12:06:37,651 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=3496040.0, ans=15.0 2023-11-28 12:06:48,421 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3496106.6666666665, ans=0.2 2023-11-28 12:06:53,738 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 7400, loss[loss=0.06101, simple_loss=0.08885, pruned_loss=0.0111, audio_tagging_loss=0.005479, over 14722.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.09005, pruned_loss=0.01219, audio_tagging_loss=0.008556, over 3047421.88 frames. 
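
Each train_asr.py:1235 record pairs the current batch's loss (over ~15k frames) with tot_loss over roughly 3.0e6 frames, a ratio of about 200 batches, which is consistent with a frame-weighted exponential moving average with decay 1 - 1/200. A sketch under that inference; the exact decay and reset policy of the tracker are not quoted from the code:

class FrameWeightedTracker:
    def __init__(self, decay: float = 1.0 - 1.0 / 200):
        self.decay = decay
        self.loss_sum = 0.0   # decayed sum of loss * frames
        self.frames = 0.0     # decayed sum of frames

    def update(self, loss: float, num_frames: float) -> float:
        self.loss_sum = self.loss_sum * self.decay + loss * num_frames
        self.frames = self.frames * self.decay + num_frames
        # At ~15k frames/batch the decayed frame count settles near
        # 15000 * 200 = 3.0e6, matching the "over 3.0e6 frames" figures.
        return self.loss_sum / self.frames  # the reported tot_loss
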
], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:07:00,842 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.33 vs. limit=22.5 2023-11-28 12:07:08,219 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3496240.0, ans=0.125 2023-11-28 12:07:09,011 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.401e+01 8.769e+01 9.562e+01 1.042e+02 1.496e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-28 12:07:10,336 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3496240.0, ans=0.125 2023-11-28 12:07:19,018 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 524450 2023-11-28 12:07:24,911 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3496306.6666666665, ans=0.125 2023-11-28 12:07:30,664 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3496373.3333333335, ans=0.0 2023-11-28 12:07:32,394 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3496373.3333333335, ans=0.0 2023-11-28 12:07:50,864 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 7450, loss[loss=0.03146, simple_loss=0.03801, pruned_loss=0.002971, audio_tagging_loss=0.009484, over 13990.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08952, pruned_loss=0.01194, audio_tagging_loss=0.008532, over 3040001.02 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:07:56,521 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3496506.6666666665, ans=0.125 2023-11-28 12:08:07,936 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.88 vs. limit=12.0 2023-11-28 12:08:17,121 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 524500 2023-11-28 12:08:29,469 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=3496706.6666666665, ans=0.5 2023-11-28 12:08:32,152 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.16 vs. limit=15.0 2023-11-28 12:08:35,958 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3496773.3333333335, ans=0.125 2023-11-28 12:08:47,218 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3496773.3333333335, ans=0.0 2023-11-28 12:08:49,253 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 7500, loss[loss=0.05383, simple_loss=0.06764, pruned_loss=0.01103, audio_tagging_loss=0.008985, over 14812.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.08934, pruned_loss=0.01204, audio_tagging_loss=0.008507, over 3044592.48 frames. 
], batch size: 55, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:09:04,591 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3496906.6666666665, ans=0.125 2023-11-28 12:09:05,329 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.587e+01 8.807e+01 9.534e+01 1.017e+02 1.454e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-28 12:09:06,669 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3496906.6666666665, ans=0.0 2023-11-28 12:09:14,162 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 524550 2023-11-28 12:09:42,202 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3497106.6666666665, ans=0.035 2023-11-28 12:09:46,905 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 7550, loss[loss=0.0773, simple_loss=0.1069, pruned_loss=0.01595, audio_tagging_loss=0.007889, over 16076.00 frames. ], tot_loss[loss=0.06501, simple_loss=0.08896, pruned_loss=0.01197, audio_tagging_loss=0.008558, over 3048322.47 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:09:49,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3497173.3333333335, ans=0.125 2023-11-28 12:09:50,461 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3497173.3333333335, ans=0.2 2023-11-28 12:10:00,277 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3497240.0, ans=0.1 2023-11-28 12:10:11,103 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 524600 2023-11-28 12:10:27,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3497373.3333333335, ans=0.07 2023-11-28 12:10:33,191 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3497440.0, ans=0.0 2023-11-28 12:10:34,360 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3497440.0, ans=0.125 2023-11-28 12:10:37,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3497440.0, ans=0.2 2023-11-28 12:10:38,759 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3497440.0, ans=0.07 2023-11-28 12:10:40,931 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3497440.0, ans=0.1 2023-11-28 12:10:44,041 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 7600, loss[loss=0.06399, simple_loss=0.08818, pruned_loss=0.01202, audio_tagging_loss=0.007873, over 15808.00 frames. ], tot_loss[loss=0.06468, simple_loss=0.08857, pruned_loss=0.01187, audio_tagging_loss=0.008527, over 3046191.52 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:10:53,809 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.69 vs. 
limit=10.0 2023-11-28 12:11:00,298 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.777e+01 8.865e+01 9.544e+01 1.025e+02 1.373e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-28 12:11:09,784 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 524650 2023-11-28 12:11:14,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3497640.0, ans=0.04949747468305833 2023-11-28 12:11:19,803 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.79 vs. limit=15.0 2023-11-28 12:11:35,476 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.73 vs. limit=22.5 2023-11-28 12:11:41,839 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 7650, loss[loss=0.08366, simple_loss=0.1151, pruned_loss=0.01852, audio_tagging_loss=0.007566, over 15759.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08954, pruned_loss=0.01212, audio_tagging_loss=0.00848, over 3044112.97 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:11:47,818 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.85 vs. limit=15.0 2023-11-28 12:12:00,344 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3497906.6666666665, ans=0.125 2023-11-28 12:12:03,843 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3497906.6666666665, ans=0.0 2023-11-28 12:12:08,119 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 524700 2023-11-28 12:12:15,326 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.75 vs. limit=15.0 2023-11-28 12:12:15,948 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3498040.0, ans=0.0 2023-11-28 12:12:24,998 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.71 vs. limit=15.0 2023-11-28 12:12:31,194 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.14 vs. limit=22.5 2023-11-28 12:12:41,304 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 7700, loss[loss=0.04747, simple_loss=0.06745, pruned_loss=0.004617, audio_tagging_loss=0.009127, over 14810.00 frames. ], tot_loss[loss=0.06472, simple_loss=0.08845, pruned_loss=0.01194, audio_tagging_loss=0.008556, over 3039550.89 frames. 
], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:12:52,638 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3498240.0, ans=0.1 2023-11-28 12:12:55,900 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3498240.0, ans=0.125 2023-11-28 12:12:57,717 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.664e+01 8.890e+01 9.413e+01 1.018e+02 1.310e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-28 12:13:05,502 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 524750 2023-11-28 12:13:07,887 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3498306.6666666665, ans=0.1 2023-11-28 12:13:16,803 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3498373.3333333335, ans=0.1 2023-11-28 12:13:22,561 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3498373.3333333335, ans=0.125 2023-11-28 12:13:38,337 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 7750, loss[loss=0.06973, simple_loss=0.09402, pruned_loss=0.01467, audio_tagging_loss=0.00805, over 14303.00 frames. ], tot_loss[loss=0.0649, simple_loss=0.08844, pruned_loss=0.01201, audio_tagging_loss=0.008673, over 3030900.77 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:13:41,829 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3498506.6666666665, ans=0.1 2023-11-28 12:13:46,615 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.71 vs. limit=15.0 2023-11-28 12:13:47,544 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3498506.6666666665, ans=0.125 2023-11-28 12:13:53,026 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3498573.3333333335, ans=0.125 2023-11-28 12:13:56,181 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.65 vs. limit=22.5 2023-11-28 12:14:03,977 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 524800 2023-11-28 12:14:10,670 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3498640.0, ans=0.125 2023-11-28 12:14:14,190 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3498706.6666666665, ans=0.125 2023-11-28 12:14:34,989 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3498840.0, ans=0.2 2023-11-28 12:14:35,913 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 7800, loss[loss=0.06117, simple_loss=0.0869, pruned_loss=0.01053, audio_tagging_loss=0.007182, over 15433.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.0891, pruned_loss=0.01206, audio_tagging_loss=0.008685, over 3028573.73 frames. 
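
The balancer knobs threaded through these entries describe activation balancers: with probability prob (typically 0.125 here) a module checks simple statistics of its activations, keeping the fraction of positive values above min_positive (e.g. the logged 0.025 and 0.05) and the typical magnitude within [min_abs, max_abs] (e.g. 0.5 and 10.0). The real module enforces this by adjusting gradients in the backward pass; this sketch only measures the two statistics being constrained:

import torch

def balancer_stats(x: torch.Tensor):
    # The two statistics an activation balancer constrains.
    proportion_positive = (x > 0).float().mean().item()
    mean_abs = x.abs().mean().item()
    return proportion_positive, mean_abs

x = torch.randn(100, 384)
print(balancer_stats(x))  # ~(0.5, 0.8) for standard-normal activations
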
], batch size: 60, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:14:49,593 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3498906.6666666665, ans=0.0 2023-11-28 12:14:53,872 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.22 vs. limit=22.5 2023-11-28 12:14:54,660 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.957e+01 8.929e+01 9.549e+01 1.038e+02 1.507e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-28 12:14:59,370 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3498973.3333333335, ans=0.1 2023-11-28 12:15:02,436 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 524850 2023-11-28 12:15:13,728 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3499040.0, ans=0.2 2023-11-28 12:15:15,860 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3499040.0, ans=0.1 2023-11-28 12:15:23,717 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3499106.6666666665, ans=0.125 2023-11-28 12:15:28,532 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3499106.6666666665, ans=0.125 2023-11-28 12:15:34,939 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 7850, loss[loss=0.0424, simple_loss=0.05107, pruned_loss=0.006537, audio_tagging_loss=0.01033, over 15296.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08887, pruned_loss=0.01217, audio_tagging_loss=0.008756, over 3026758.44 frames. ], batch size: 60, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:15:59,618 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 524900 2023-11-28 12:16:00,021 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.96 vs. limit=6.0 2023-11-28 12:16:16,633 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.06 vs. limit=15.0 2023-11-28 12:16:18,854 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3499373.3333333335, ans=0.125 2023-11-28 12:16:27,092 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3499440.0, ans=0.0 2023-11-28 12:16:30,446 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3499440.0, ans=0.1 2023-11-28 12:16:32,398 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 7900, loss[loss=0.08208, simple_loss=0.1231, pruned_loss=0.01303, audio_tagging_loss=0.007471, over 16464.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08919, pruned_loss=0.01218, audio_tagging_loss=0.008821, over 3035612.42 frames. 
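
bypass.scale_min (ans=0.2 above) and bypass.skip_rate belong to the per-layer bypass of the encoder: each layer's output is blended with its input through a learned channel-wise scale whose floor is scheduled, so no layer can be silenced entirely. A sketch of the blend under assumed shapes and names; skip_rate would be a separate scheduled probability of taking the bypass path outright:

import torch

def bypass(x_in: torch.Tensor, x_layer: torch.Tensor,
           scale: torch.Tensor, scale_min: float = 0.2) -> torch.Tensor:
    # scale == 0 reproduces the input (full bypass); scale == 1 keeps the
    # layer's full output. Clamping at scale_min keeps every layer alive.
    s = scale.clamp(min=scale_min, max=1.0)
    return x_in + s * (x_layer - x_in)

x_in = torch.randn(10, 384)
x_layer = torch.randn(10, 384)
out = bypass(x_in, x_layer, scale=torch.full((384,), 0.1))  # scale clamped to 0.2
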
2023-11-28 12:16:32,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3499506.6666666665, ans=0.1
2023-11-28 12:16:34,097 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.63 vs. limit=15.0
2023-11-28 12:16:34,227 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.79 vs. limit=12.0
2023-11-28 12:16:46,388 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.44 vs. limit=12.0
2023-11-28 12:16:48,161 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3499573.3333333335, ans=0.0
2023-11-28 12:16:49,085 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.612e+01 8.884e+01 9.612e+01 1.005e+02 1.246e+02, threshold=1.922e+02, percent-clipped=0.0
2023-11-28 12:16:49,262 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3499573.3333333335, ans=0.0
2023-11-28 12:16:49,725 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.41 vs. limit=15.0
2023-11-28 12:16:57,356 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 524950
2023-11-28 12:17:07,432 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3499706.6666666665, ans=0.0
2023-11-28 12:17:28,983 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 7950, loss[loss=0.06469, simple_loss=0.08762, pruned_loss=0.01259, audio_tagging_loss=0.008279, over 14869.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08927, pruned_loss=0.01221, audio_tagging_loss=0.00889, over 3031973.43 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 16.0
2023-11-28 12:17:32,580 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3499840.0, ans=0.2
2023-11-28 12:17:43,240 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3499906.6666666665, ans=0.125
2023-11-28 12:17:50,048 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-28 12:17:53,499 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3499973.3333333335, ans=0.1
2023-11-28 12:17:55,458 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 525000
2023-11-28 12:18:05,264 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.92 vs. limit=22.5
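The scaling.py:1022 Whitening lines compare a per-module statistic against a fixed limit (metric=19.92 vs. limit=22.5 just above). A plausible reading, sketched below, is a group-wise "whiteness" measure of the activation covariance, dim * trace(C^2) / trace(C)^2, which equals 1.0 when the covariance is perfectly isotropic and grows as the eigenvalue spread widens; presumably a corrective penalty engages only when the metric exceeds the limit. This is a reconstruction for illustration, not the library's code:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        # x: (num_frames, num_channels) activations of one module.
        n, c = x.shape
        g = c // num_groups                               # channels per group
        xg = x.reshape(n, num_groups, g).transpose(0, 1)  # (groups, frames, g)
        cov = xg.transpose(1, 2) @ xg / n                 # per-group covariance
        tr = cov.diagonal(dim1=1, dim2=2).sum(-1)         # trace(C) per group
        tr2 = (cov @ cov).diagonal(dim1=1, dim2=2).sum(-1)  # trace(C^2)
        return (g * tr2 / tr.clamp(min=1e-20) ** 2).mean().item()

    x = torch.randn(2000, 256)   # nearly-white activations
    print(whitening_metric(x))   # close to 1.0, comfortably under a limit of 15.0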
2023-11-28 12:18:10,269 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3500040.0, ans=0.2
2023-11-28 12:18:27,119 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 8000, loss[loss=0.09183, simple_loss=0.1318, pruned_loss=0.01823, audio_tagging_loss=0.007715, over 16165.00 frames. ], tot_loss[loss=0.06531, simple_loss=0.08867, pruned_loss=0.01199, audio_tagging_loss=0.008992, over 3035178.73 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0
2023-11-28 12:18:29,613 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3500173.3333333335, ans=0.2
2023-11-28 12:18:45,087 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.588e+01 8.932e+01 9.362e+01 1.010e+02 1.203e+02, threshold=1.872e+02, percent-clipped=0.0
2023-11-28 12:18:49,816 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3500306.6666666665, ans=0.125
2023-11-28 12:18:52,818 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 525050
2023-11-28 12:18:59,680 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3500306.6666666665, ans=0.0
2023-11-28 12:19:20,723 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3500440.0, ans=0.125
2023-11-28 12:19:21,846 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3500440.0, ans=0.125
2023-11-28 12:19:22,902 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3500440.0, ans=0.125
2023-11-28 12:19:25,118 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-28 12:19:26,000 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 8050, loss[loss=0.07021, simple_loss=0.09175, pruned_loss=0.01296, audio_tagging_loss=0.01137, over 15281.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08848, pruned_loss=0.01187, audio_tagging_loss=0.009101, over 3036387.55 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 32.0
2023-11-28 12:19:27,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3500506.6666666665, ans=0.125
2023-11-28 12:19:43,843 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3500573.3333333335, ans=0.125
2023-11-28 12:19:50,930 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 525100
2023-11-28 12:20:12,488 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3500773.3333333335, ans=0.2
2023-11-28 12:20:23,018 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 8100, loss[loss=0.05841, simple_loss=0.07604, pruned_loss=0.009505, audio_tagging_loss=0.01089, over 14927.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08828, pruned_loss=0.01179, audio_tagging_loss=0.009039, over 3034680.79 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0
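Each optim.py:476 line prints five quantiles (min, 25%, median, 75%, max) of recently observed gradient norms, plus a clipping threshold and the share of updates that hit it. The logged thresholds are consistently 2.0 x the logged median, matching Clipping_scale=2.0 (here 2.0 * 9.362e+01 = 1.872e+02). A sketch of the diagnostic; the median-times-scale rule is inferred from the numbers, and the real optimizer may smooth or delay it:

    import torch

    def clipping_report(grad_norms: torch.Tensor, clipping_scale: float = 2.0) -> float:
        # grad_norms: 1-D tensor of recent per-step gradient norms.
        q = torch.quantile(grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * q[2]                  # 2.0 x median
        pct = (grad_norms > threshold).float().mean() * 100.0
        print(f"Clipping_scale={clipping_scale}, grad-norm quartiles "
              + " ".join(f"{v:.3e}" for v in q.tolist())
              + f", threshold={threshold:.3e}, percent-clipped={pct:.1f}")
        return threshold.item()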
2023-11-28 12:20:26,492 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff2.min_abs, batch_count=3500840.0, ans=0.1
2023-11-28 12:20:40,081 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.566e+01 8.823e+01 9.483e+01 1.016e+02 1.288e+02, threshold=1.897e+02, percent-clipped=0.0
2023-11-28 12:20:40,340 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3500906.6666666665, ans=0.1
2023-11-28 12:20:41,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3500906.6666666665, ans=0.0
2023-11-28 12:20:49,595 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 525150
2023-11-28 12:20:52,963 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3500973.3333333335, ans=0.125
2023-11-28 12:20:57,391 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3501040.0, ans=0.125
2023-11-28 12:21:08,297 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3501106.6666666665, ans=0.1
2023-11-28 12:21:18,777 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3501106.6666666665, ans=0.125
2023-11-28 12:21:20,701 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 8150, loss[loss=0.06194, simple_loss=0.08084, pruned_loss=0.01149, audio_tagging_loss=0.01002, over 14741.00 frames. ], tot_loss[loss=0.06472, simple_loss=0.08816, pruned_loss=0.01177, audio_tagging_loss=0.008876, over 3036603.48 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 32.0
2023-11-28 12:21:30,722 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3501173.3333333335, ans=0.2
2023-11-28 12:21:34,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3501240.0, ans=0.125
2023-11-28 12:21:41,522 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.24 vs. limit=15.0
2023-11-28 12:21:45,635 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3501306.6666666665, ans=0.0
2023-11-28 12:21:46,482 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 525200
2023-11-28 12:21:50,524 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3501306.6666666665, ans=0.125
2023-11-28 12:21:52,560 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3501306.6666666665, ans=0.035
2023-11-28 12:22:18,638 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3501506.6666666665, ans=0.1
2023-11-28 12:22:19,404 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 8200, loss[loss=0.05115, simple_loss=0.06883, pruned_loss=0.008002, audio_tagging_loss=0.008735, over 16494.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.0886, pruned_loss=0.01194, audio_tagging_loss=0.008732, over 3035954.10 frames. ], batch size: 62, lr: 1.53e-03, grad_scale: 32.0
2023-11-28 12:22:21,891 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.99 vs. limit=6.0
2023-11-28 12:22:25,434 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-28 12:22:30,569 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.23 vs. limit=12.0
2023-11-28 12:22:36,394 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.784e+01 8.686e+01 9.334e+01 1.013e+02 1.382e+02, threshold=1.867e+02, percent-clipped=0.0
2023-11-28 12:22:44,797 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 525250
2023-11-28 12:22:56,841 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3501706.6666666665, ans=0.0
2023-11-28 12:22:57,914 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3501706.6666666665, ans=0.035
2023-11-28 12:23:03,414 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3501706.6666666665, ans=0.125
2023-11-28 12:23:08,349 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3501773.3333333335, ans=0.0
2023-11-28 12:23:16,946 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 8250, loss[loss=0.07887, simple_loss=0.1147, pruned_loss=0.0139, audio_tagging_loss=0.00761, over 15674.00 frames. ], tot_loss[loss=0.0652, simple_loss=0.08914, pruned_loss=0.012, audio_tagging_loss=0.008628, over 3041347.31 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 32.0
2023-11-28 12:23:22,711 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3501840.0, ans=0.07
2023-11-28 12:23:42,765 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 525300
2023-11-28 12:23:43,949 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3501973.3333333335, ans=0.125
2023-11-28 12:23:47,428 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3501973.3333333335, ans=0.0
2023-11-28 12:24:04,820 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.27 vs. limit=12.0
2023-11-28 12:24:07,745 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3502106.6666666665, ans=0.0
2023-11-28 12:24:14,672 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 8300, loss[loss=0.06832, simple_loss=0.08026, pruned_loss=0.01748, audio_tagging_loss=0.01072, over 14798.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.08903, pruned_loss=0.01222, audio_tagging_loss=0.008682, over 3042833.68 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0
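The train_asr.py:1481 WARNING lines all follow one pattern: 1-second AudioSet cuts carrying only the dummy placeholder transcript are dropped because, after the encoder's roughly 4x subsampling, their 100 input frames shrink to 23, and a transducer loss cannot align 24 BPE tokens to 23 frames. A sketch of that sanity filter; the exact (T - 8) // 4 frame arithmetic is an assumption chosen to match the logged 100 -> 23:

    def keep_cut(num_frames: int, num_tokens: int, subsampling_factor: int = 4) -> bool:
        # Frames surviving the subsampling front-end (assumed formula).
        t = (num_frames - 8) // subsampling_factor
        # A transducer needs at least one frame per emitted token.
        return t >= num_tokens

    print(keep_cut(100, 24))  # False -> "Exclude cut ... from training."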
2023-11-28 12:24:16,371 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.62 vs. limit=6.0
2023-11-28 12:24:23,967 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3502173.3333333335, ans=0.0
2023-11-28 12:24:32,923 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.585e+01 8.723e+01 9.443e+01 1.022e+02 1.604e+02, threshold=1.889e+02, percent-clipped=0.0
2023-11-28 12:24:40,417 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 525350
2023-11-28 12:24:40,541 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3502306.6666666665, ans=0.125
2023-11-28 12:24:48,414 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3502373.3333333335, ans=0.0
2023-11-28 12:24:58,823 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3502373.3333333335, ans=0.125
2023-11-28 12:25:02,484 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.40 vs. limit=15.0
2023-11-28 12:25:03,366 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3502440.0, ans=0.1
2023-11-28 12:25:04,371 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3502440.0, ans=0.035
2023-11-28 12:25:10,966 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3502440.0, ans=0.0
2023-11-28 12:25:12,870 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 8350, loss[loss=0.06944, simple_loss=0.09484, pruned_loss=0.01262, audio_tagging_loss=0.009397, over 15862.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08951, pruned_loss=0.01218, audio_tagging_loss=0.008654, over 3044585.44 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 16.0
2023-11-28 12:25:21,702 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.65 vs. limit=15.0
2023-11-28 12:25:34,426 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3502640.0, ans=0.125
2023-11-28 12:25:37,659 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 525400
2023-11-28 12:25:39,449 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3502640.0, ans=0.2
2023-11-28 12:25:40,896 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.16 vs. limit=15.0
2023-11-28 12:26:10,964 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 8400, loss[loss=0.04735, simple_loss=0.05425, pruned_loss=0.008349, audio_tagging_loss=0.01188, over 14759.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.08939, pruned_loss=0.01222, audio_tagging_loss=0.00864, over 3049523.88 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0
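grad_scale in the loss lines bounces between 16.0 and 32.0: it is 32.0 by batch 8000, back to 16.0 around batch 8300, and at 32.0 again by batch 8400. That is the signature of fp16 training with a dynamic loss scaler, which halves the scale when a scaled gradient overflows and doubles it again after a run of overflow-free steps. A minimal sketch with PyTorch's stock GradScaler (model, optimizer and compute_loss are placeholders; the actual trainer wires this up differently):

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=16.0, growth_interval=2000)

    def training_step(model, optimizer, batch, compute_loss):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = compute_loss(model, batch)
        scaler.scale(loss).backward()
        scaler.step(optimizer)   # silently skipped if gradients overflowed
        scaler.update()          # may halve, or periodically double, the scale
        return loss.detach(), scaler.get_scale()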
2023-11-28 12:26:20,972 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3502906.6666666665, ans=0.125
2023-11-28 12:26:29,023 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.479e+01 8.791e+01 9.357e+01 9.969e+01 1.892e+02, threshold=1.871e+02, percent-clipped=1.0
2023-11-28 12:26:36,655 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 525450
2023-11-28 12:26:38,075 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3502973.3333333335, ans=0.0
2023-11-28 12:26:59,448 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3503106.6666666665, ans=0.125
2023-11-28 12:26:59,664 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.54 vs. limit=15.0
2023-11-28 12:27:01,628 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3503106.6666666665, ans=0.0
2023-11-28 12:27:07,926 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 8450, loss[loss=0.05699, simple_loss=0.0737, pruned_loss=0.01084, audio_tagging_loss=0.009306, over 16026.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08912, pruned_loss=0.01216, audio_tagging_loss=0.008628, over 3051012.26 frames. ], batch size: 63, lr: 1.53e-03, grad_scale: 32.0
2023-11-28 12:27:21,590 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3503240.0, ans=0.125
2023-11-28 12:27:21,647 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3503240.0, ans=0.1
2023-11-28 12:27:33,484 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 525500
2023-11-28 12:27:57,798 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3503440.0, ans=0.0
2023-11-28 12:28:06,300 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 8500, loss[loss=0.08125, simple_loss=0.1225, pruned_loss=0.01457, audio_tagging_loss=0.005416, over 15481.00 frames. ], tot_loss[loss=0.065, simple_loss=0.0884, pruned_loss=0.01203, audio_tagging_loss=0.008767, over 3049492.98 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 32.0
2023-11-28 12:28:20,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3503573.3333333335, ans=0.0
2023-11-28 12:28:24,291 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.905e+01 8.935e+01 9.397e+01 1.006e+02 1.238e+02, threshold=1.879e+02, percent-clipped=0.0
2023-11-28 12:28:31,027 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 525550
2023-11-28 12:28:38,424 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3503640.0, ans=0.125
2023-11-28 12:28:56,801 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3503773.3333333335, ans=0.0
2023-11-28 12:29:03,085 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 8550, loss[loss=0.05369, simple_loss=0.06468, pruned_loss=0.008783, audio_tagging_loss=0.01257, over 15164.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08915, pruned_loss=0.01216, audio_tagging_loss=0.008693, over 3045003.72 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0
2023-11-28 12:29:15,004 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3503906.6666666665, ans=0.0
2023-11-28 12:29:15,038 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3503906.6666666665, ans=0.07
2023-11-28 12:29:28,510 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 525600
2023-11-28 12:29:42,152 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.88 vs. limit=15.0
2023-11-28 12:29:53,326 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3504106.6666666665, ans=0.125
2023-11-28 12:30:00,682 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 8600, loss[loss=0.06098, simple_loss=0.08292, pruned_loss=0.01181, audio_tagging_loss=0.007712, over 15416.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08945, pruned_loss=0.01236, audio_tagging_loss=0.008679, over 3043613.34 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0
2023-11-28 12:30:00,956 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3504173.3333333335, ans=0.125
2023-11-28 12:30:06,248 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.42 vs. limit=6.0
2023-11-28 12:30:16,815 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3504240.0, ans=0.1
2023-11-28 12:30:19,762 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.707e+01 8.911e+01 9.426e+01 1.012e+02 1.309e+02, threshold=1.885e+02, percent-clipped=0.0
2023-11-28 12:30:23,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3504306.6666666665, ans=0.125
2023-11-28 12:30:26,593 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 525650
2023-11-28 12:30:43,855 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3504373.3333333335, ans=0.125
2023-11-28 12:30:48,909 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3504440.0, ans=0.125
2023-11-28 12:30:53,083 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3504440.0, ans=0.0
2023-11-28 12:30:54,799 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.61 vs. limit=22.5
2023-11-28 12:30:59,391 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 8650, loss[loss=0.05916, simple_loss=0.08603, pruned_loss=0.007678, audio_tagging_loss=0.008463, over 15817.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.08959, pruned_loss=0.01262, audio_tagging_loss=0.008677, over 3043711.45 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0
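Each train_asr.py:1235 line pairs the current batch's loss[...] with tot_loss[...], a frame-weighted running average; note how its "over ~3.0e6 frames" denominator drifts only slowly from batch to batch. A sketch of that bookkeeping (names are illustrative; the recipe keeps these totals in its own tracker class):

    from collections import defaultdict

    class RunningLoss:
        def __init__(self):
            self.sums = defaultdict(float)   # frame-weighted loss sums
            self.frames = 0.0

        def update(self, losses: dict, num_frames: float) -> None:
            for name, value in losses.items():
                self.sums[name] += value * num_frames
            self.frames += num_frames

        def report(self) -> str:
            avg = {k: v / self.frames for k, v in self.sums.items()}
            body = ", ".join(f"{k}={v:.4g}" for k, v in avg.items())
            return f"tot_loss[{body}, over {self.frames:.2f} frames.]"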
2023-11-28 12:31:03,289 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.77 vs. limit=15.0
2023-11-28 12:31:19,343 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3504573.3333333335, ans=0.2
2023-11-28 12:31:24,156 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 525700
2023-11-28 12:31:24,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3504640.0, ans=0.1
2023-11-28 12:31:33,779 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3504706.6666666665, ans=0.125
2023-11-28 12:31:50,374 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.17 vs. limit=15.0
2023-11-28 12:31:56,498 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 8700, loss[loss=0.04632, simple_loss=0.05789, pruned_loss=0.009139, audio_tagging_loss=0.00823, over 14156.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.09055, pruned_loss=0.01259, audio_tagging_loss=0.008699, over 3043683.41 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0
2023-11-28 12:32:14,555 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.280e+01 8.928e+01 9.510e+01 1.026e+02 1.329e+02, threshold=1.902e+02, percent-clipped=0.0
2023-11-28 12:32:21,832 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 525750
2023-11-28 12:32:34,481 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3505040.0, ans=0.0
2023-11-28 12:32:39,935 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3505040.0, ans=0.125
2023-11-28 12:32:53,015 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 8750, loss[loss=0.04892, simple_loss=0.06375, pruned_loss=0.006568, audio_tagging_loss=0.01048, over 15722.00 frames. ], tot_loss[loss=0.06719, simple_loss=0.09165, pruned_loss=0.01272, audio_tagging_loss=0.008634, over 3048209.01 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 32.0
2023-11-28 12:32:56,635 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.42 vs. limit=15.0
2023-11-28 12:33:10,296 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3505240.0, ans=0.125
2023-11-28 12:33:19,013 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 525800
2023-11-28 12:33:39,222 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3505440.0, ans=0.125
2023-11-28 12:33:51,365 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 8800, loss[loss=0.06042, simple_loss=0.08317, pruned_loss=0.01137, audio_tagging_loss=0.00747, over 14271.00 frames. ], tot_loss[loss=0.06724, simple_loss=0.09175, pruned_loss=0.01267, audio_tagging_loss=0.008693, over 3044019.28 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 32.0
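The learning rate is pinned at lr: 1.53e-03 throughout this stretch. That is consistent with an Eden-style schedule, which decays smoothly with both batch index and epoch; with assumed hyper-parameters base_lr=0.045, lr_batches=7500 and lr_epochs=3.5 (typical recipe defaults, not read from this excerpt), and the scheduler's epoch counter taken to lag the printed "Epoch 44" by one, the formula reproduces the logged value around batch idx 525750:

    def eden_lr(base_lr: float, batch: float, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    print(f"{eden_lr(0.045, 525750, 43):.2e}")  # 1.53e-03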
2023-11-28 12:34:04,061 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3505573.3333333335, ans=0.0
2023-11-28 12:34:04,145 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3505573.3333333335, ans=0.0
2023-11-28 12:34:09,431 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.660e+01 8.935e+01 9.643e+01 1.033e+02 1.238e+02, threshold=1.929e+02, percent-clipped=0.0
2023-11-28 12:34:16,082 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 525850
2023-11-28 12:34:20,552 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3505640.0, ans=0.0
2023-11-28 12:34:48,668 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 8850, loss[loss=0.08857, simple_loss=0.1232, pruned_loss=0.0165, audio_tagging_loss=0.01048, over 15579.00 frames. ], tot_loss[loss=0.06727, simple_loss=0.09195, pruned_loss=0.01262, audio_tagging_loss=0.008675, over 3048606.26 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0
2023-11-28 12:35:00,753 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3505906.6666666665, ans=0.1
2023-11-28 12:35:04,104 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-28 12:35:14,086 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 525900
2023-11-28 12:35:19,020 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.99 vs. limit=6.0
2023-11-28 12:35:21,343 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3505973.3333333335, ans=0.125
2023-11-28 12:35:45,388 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 8900, loss[loss=0.08918, simple_loss=0.1257, pruned_loss=0.01918, audio_tagging_loss=0.007162, over 15184.00 frames. ], tot_loss[loss=0.06717, simple_loss=0.09195, pruned_loss=0.01263, audio_tagging_loss=0.008566, over 3046843.39 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0
2023-11-28 12:35:45,688 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3506173.3333333335, ans=0.1
2023-11-28 12:36:04,591 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.479e+01 8.621e+01 9.140e+01 9.989e+01 1.171e+02, threshold=1.828e+02, percent-clipped=0.0
2023-11-28 12:36:11,332 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 525950
2023-11-28 12:36:11,493 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3506306.6666666665, ans=0.125
2023-11-28 12:36:12,689 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-28 12:36:25,616 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3506373.3333333335, ans=0.125
2023-11-28 12:36:32,184 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3506440.0, ans=0.0
2023-11-28 12:36:43,088 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 8950, loss[loss=0.06558, simple_loss=0.08683, pruned_loss=0.01352, audio_tagging_loss=0.00865, over 15892.00 frames. ], tot_loss[loss=0.06718, simple_loss=0.09206, pruned_loss=0.01267, audio_tagging_loss=0.00848, over 3047855.13 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 32.0
2023-11-28 12:36:51,024 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.82 vs. limit=15.0
2023-11-28 12:37:07,824 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 526000
2023-11-28 12:37:17,121 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3506706.6666666665, ans=0.125
2023-11-28 12:37:28,098 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3506773.3333333335, ans=0.125
2023-11-28 12:37:32,208 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.30 vs. limit=22.5
2023-11-28 12:37:40,513 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 9000, loss[loss=0.05845, simple_loss=0.07805, pruned_loss=0.009917, audio_tagging_loss=0.009509, over 15573.00 frames. ], tot_loss[loss=0.0671, simple_loss=0.0919, pruned_loss=0.01266, audio_tagging_loss=0.008492, over 3048664.12 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0
2023-11-28 12:37:40,516 INFO [train_asr.py:1258] (0/4) Computing validation loss
2023-11-28 12:38:15,246 INFO [train_asr.py:1267] (0/4) Epoch 44, validation: loss=0.05875, simple_loss=0.05057, pruned_loss=0.005344, audio_tagging_loss=0.02812, over 4681554.00 frames.
2023-11-28 12:38:15,246 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB
2023-11-28 12:38:17,035 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.44 vs. limit=10.0
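At batch 9000 the trainer pauses for a validation pass (train_asr.py:1258-1268), reporting a separate validation loss breakdown and the peak GPU memory. A sketch of such a periodic hook, assuming a valid_interval-style knob and reusing the RunningLoss sketch above; the memory figure corresponds to torch.cuda.max_memory_allocated():

    import torch

    def maybe_validate(batch_idx, valid_interval, model, valid_dl, compute_loss):
        if batch_idx == 0 or batch_idx % valid_interval != 0:
            return None
        print("Computing validation loss")
        tracker = RunningLoss()              # from the sketch above
        model.eval()
        with torch.no_grad():
            for batch in valid_dl:
                losses, num_frames = compute_loss(model, batch)
                tracker.update(losses, num_frames)
        model.train()
        print(tracker.report())
        mb = torch.cuda.max_memory_allocated() // (1024 * 1024)
        print(f"Maximum memory allocated so far is {mb}MB")
        return tracker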
2023-11-28 12:38:19,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3506840.0, ans=0.125
2023-11-28 12:38:19,415 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3506840.0, ans=0.125
2023-11-28 12:38:32,454 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3506906.6666666665, ans=0.0
2023-11-28 12:38:35,941 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.641e+01 8.830e+01 9.439e+01 1.037e+02 1.240e+02, threshold=1.888e+02, percent-clipped=0.0
2023-11-28 12:38:36,207 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3506906.6666666665, ans=0.1
2023-11-28 12:38:39,428 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3506973.3333333335, ans=0.125
2023-11-28 12:38:41,583 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 526050
2023-11-28 12:38:42,805 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3506973.3333333335, ans=0.125
2023-11-28 12:38:46,213 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3506973.3333333335, ans=0.125
2023-11-28 12:38:53,238 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.82 vs. limit=15.0
2023-11-28 12:38:56,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3507040.0, ans=0.125
2023-11-28 12:38:59,764 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.11 vs. limit=22.5
2023-11-28 12:39:07,791 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3507106.6666666665, ans=0.125
2023-11-28 12:39:13,641 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 9050, loss[loss=0.08335, simple_loss=0.109, pruned_loss=0.01927, audio_tagging_loss=0.009581, over 14825.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.09115, pruned_loss=0.01251, audio_tagging_loss=0.008494, over 3056322.82 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 16.0
2023-11-28 12:39:31,277 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3507240.0, ans=0.125
2023-11-28 12:39:38,923 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 526100
2023-11-28 12:39:46,697 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3507373.3333333335, ans=0.09899494936611666
2023-11-28 12:40:11,479 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 9100, loss[loss=0.06258, simple_loss=0.09025, pruned_loss=0.009748, audio_tagging_loss=0.007704, over 15585.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.09074, pruned_loss=0.0125, audio_tagging_loss=0.00842, over 3057896.58 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 16.0
2023-11-28 12:40:18,335 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3507506.6666666665, ans=0.125
2023-11-28 12:40:30,768 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.177e+01 9.014e+01 9.601e+01 1.029e+02 1.341e+02, threshold=1.920e+02, percent-clipped=0.0
2023-11-28 12:40:33,253 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3507640.0, ans=0.5
2023-11-28 12:40:36,354 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 526150
2023-11-28 12:40:40,599 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3507640.0, ans=0.0
2023-11-28 12:40:41,574 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3507640.0, ans=0.07
2023-11-28 12:40:52,283 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.61 vs. limit=15.0
2023-11-28 12:41:08,321 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 9150, loss[loss=0.06969, simple_loss=0.1018, pruned_loss=0.009941, audio_tagging_loss=0.008848, over 15151.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.08948, pruned_loss=0.01219, audio_tagging_loss=0.008485, over 3059629.81 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0
2023-11-28 12:41:12,869 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-28 12:41:34,217 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 526200
2023-11-28 12:41:42,427 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3508040.0, ans=0.0
2023-11-28 12:41:54,332 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3508106.6666666665, ans=0.125
2023-11-28 12:42:02,985 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.57 vs. limit=15.0
2023-11-28 12:42:05,862 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 9200, loss[loss=0.06201, simple_loss=0.08617, pruned_loss=0.01287, audio_tagging_loss=0.006048, over 16044.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08943, pruned_loss=0.01226, audio_tagging_loss=0.008398, over 3063539.35 frames. ], batch size: 60, lr: 1.53e-03, grad_scale: 32.0
2023-11-28 12:42:14,815 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3508173.3333333335, ans=0.2
2023-11-28 12:42:24,802 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3508240.0, ans=0.125
2023-11-28 12:42:25,538 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.046e+01 8.609e+01 9.408e+01 1.003e+02 1.431e+02, threshold=1.882e+02, percent-clipped=0.0
2023-11-28 12:42:31,021 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 526250
2023-11-28 12:42:35,434 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3508306.6666666665, ans=0.125
2023-11-28 12:42:47,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3508373.3333333335, ans=0.0
2023-11-28 12:43:02,502 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.09 vs. limit=15.0
2023-11-28 12:43:02,874 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 9250, loss[loss=0.04821, simple_loss=0.07172, pruned_loss=0.004761, audio_tagging_loss=0.007589, over 14243.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.08941, pruned_loss=0.01227, audio_tagging_loss=0.008476, over 3064280.91 frames. ], batch size: 53, lr: 1.53e-03, grad_scale: 32.0
2023-11-28 12:43:03,582 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.21 vs. limit=12.0
2023-11-28 12:43:14,286 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3508573.3333333335, ans=0.125
2023-11-28 12:43:27,724 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 526300
2023-11-28 12:43:44,704 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3508706.6666666665, ans=0.2
2023-11-28 12:43:59,851 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 9300, loss[loss=0.07134, simple_loss=0.1037, pruned_loss=0.01295, audio_tagging_loss=0.006542, over 14887.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.08989, pruned_loss=0.01237, audio_tagging_loss=0.008604, over 3067211.48 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 32.0
2023-11-28 12:44:02,274 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3508840.0, ans=0.2
2023-11-28 12:44:18,187 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3508906.6666666665, ans=0.0
2023-11-28 12:44:19,098 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.514e+01 9.008e+01 9.881e+01 1.066e+02 1.464e+02, threshold=1.976e+02, percent-clipped=0.0
2023-11-28 12:44:25,832 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 526350
2023-11-28 12:44:26,358 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.45 vs. limit=15.0
2023-11-28 12:44:57,016 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 9350, loss[loss=0.07392, simple_loss=0.09829, pruned_loss=0.01304, audio_tagging_loss=0.01173, over 16373.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.09042, pruned_loss=0.01235, audio_tagging_loss=0.00854, over 3065271.60 frames. ], batch size: 63, lr: 1.53e-03, grad_scale: 32.0
2023-11-28 12:44:58,400 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3509173.3333333335, ans=0.0
2023-11-28 12:45:21,594 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.07 vs. limit=15.0
2023-11-28 12:45:22,266 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 526400
2023-11-28 12:45:23,945 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3509306.6666666665, ans=10.0
2023-11-28 12:45:24,755 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3509306.6666666665, ans=0.1
2023-11-28 12:45:38,742 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3509373.3333333335, ans=0.125
2023-11-28 12:45:55,563 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 9400, loss[loss=0.08271, simple_loss=0.1022, pruned_loss=0.02253, audio_tagging_loss=0.009066, over 14510.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.09077, pruned_loss=0.01251, audio_tagging_loss=0.008598, over 3058175.38 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0
2023-11-28 12:45:55,781 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3509506.6666666665, ans=0.1
2023-11-28 12:46:04,539 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3509506.6666666665, ans=0.125
2023-11-28 12:46:09,319 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.54 vs. limit=12.0
2023-11-28 12:46:14,166 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.155e+01 8.941e+01 9.559e+01 9.955e+01 1.222e+02, threshold=1.912e+02, percent-clipped=0.0
2023-11-28 12:46:20,439 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 526450
2023-11-28 12:46:22,166 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.52 vs. limit=15.0
2023-11-28 12:46:23,253 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.76 vs. limit=15.0
2023-11-28 12:46:23,926 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3509640.0, ans=0.125
2023-11-28 12:46:24,425 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.29 vs. limit=22.5
2023-11-28 12:46:50,666 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3509773.3333333335, ans=0.125
2023-11-28 12:46:51,655 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3509840.0, ans=0.1
2023-11-28 12:46:52,483 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 9450, loss[loss=0.0779, simple_loss=0.1096, pruned_loss=0.01342, audio_tagging_loss=0.009657, over 14924.00 frames. ], tot_loss[loss=0.06667, simple_loss=0.09104, pruned_loss=0.01246, audio_tagging_loss=0.008691, over 3056036.38 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0
2023-11-28 12:46:55,874 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-28 12:46:56,022 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3509840.0, ans=0.1
2023-11-28 12:47:05,669 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3509906.6666666665, ans=0.2
2023-11-28 12:47:18,068 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 526500
2023-11-28 12:47:50,095 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 9500, loss[loss=0.07355, simple_loss=0.101, pruned_loss=0.01642, audio_tagging_loss=0.00663, over 15313.00 frames. ], tot_loss[loss=0.06646, simple_loss=0.09059, pruned_loss=0.01232, audio_tagging_loss=0.008838, over 3058016.42 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 32.0
2023-11-28 12:47:57,147 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.34 vs. limit=15.0
2023-11-28 12:47:59,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3510173.3333333335, ans=0.1
2023-11-28 12:48:10,362 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.516e+01 8.845e+01 9.672e+01 1.033e+02 1.277e+02, threshold=1.934e+02, percent-clipped=0.0
2023-11-28 12:48:10,628 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3510240.0, ans=0.1
2023-11-28 12:48:15,943 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 526550
2023-11-28 12:48:27,215 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3510373.3333333335, ans=0.1
2023-11-28 12:48:28,808 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3510373.3333333335, ans=0.125
2023-11-28 12:48:47,385 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3510506.6666666665, ans=0.0
2023-11-28 12:48:48,233 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 9550, loss[loss=0.07963, simple_loss=0.1131, pruned_loss=0.01362, audio_tagging_loss=0.009438, over 16479.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.08977, pruned_loss=0.01216, audio_tagging_loss=0.008899, over 3053375.83 frames. ], batch size: 64, lr: 1.53e-03, grad_scale: 32.0
2023-11-28 12:48:54,927 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.57 vs. limit=6.0
2023-11-28 12:48:58,939 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=3510573.3333333335, ans=0.025
2023-11-28 12:49:03,471 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3510573.3333333335, ans=0.125
2023-11-28 12:49:04,568 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3510573.3333333335, ans=0.125
2023-11-28 12:49:07,751 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3510573.3333333335, ans=0.125
2023-11-28 12:49:13,681 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 526600
2023-11-28 12:49:21,021 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3510640.0, ans=0.0
2023-11-28 12:49:26,958 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3510706.6666666665, ans=0.1
2023-11-28 12:49:33,188 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.26 vs. limit=22.5
2023-11-28 12:49:45,586 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3510840.0, ans=0.0
2023-11-28 12:49:46,378 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 9600, loss[loss=0.06145, simple_loss=0.08551, pruned_loss=0.01205, audio_tagging_loss=0.006649, over 15328.00 frames. ], tot_loss[loss=0.06663, simple_loss=0.09065, pruned_loss=0.01243, audio_tagging_loss=0.008874, over 3053105.98 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0
2023-11-28 12:49:55,762 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.09 vs. limit=15.0
2023-11-28 12:50:06,851 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.314e+01 8.780e+01 9.691e+01 1.026e+02 1.293e+02, threshold=1.938e+02, percent-clipped=0.0
2023-11-28 12:50:08,346 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3510973.3333333335, ans=0.125
2023-11-28 12:50:11,858 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 526650
2023-11-28 12:50:27,437 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3511040.0, ans=10.0
2023-11-28 12:50:44,463 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 9650, loss[loss=0.05968, simple_loss=0.08455, pruned_loss=0.0113, audio_tagging_loss=0.00611, over 14429.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.09019, pruned_loss=0.01228, audio_tagging_loss=0.008899, over 3054747.80 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0
2023-11-28 12:50:52,471 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3511173.3333333335, ans=0.2
2023-11-28 12:50:56,673 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.04 vs. limit=15.0
2023-11-28 12:51:10,296 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 526700
2023-11-28 12:51:25,175 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3511373.3333333335, ans=0.125
2023-11-28 12:51:42,591 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 9700, loss[loss=0.0484, simple_loss=0.06597, pruned_loss=0.005917, audio_tagging_loss=0.009494, over 16488.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.09043, pruned_loss=0.01227, audio_tagging_loss=0.008645, over 3053309.74 frames. ], batch size: 65, lr: 1.53e-03, grad_scale: 32.0
2023-11-28 12:51:51,528 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3511506.6666666665, ans=0.1
2023-11-28 12:52:03,074 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.922e+01 8.945e+01 9.456e+01 1.030e+02 1.271e+02, threshold=1.891e+02, percent-clipped=0.0
2023-11-28 12:52:05,690 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.91 vs. limit=12.0
2023-11-28 12:52:07,551 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 526750
2023-11-28 12:52:39,624 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 9750, loss[loss=0.03997, simple_loss=0.04937, pruned_loss=0.005656, audio_tagging_loss=0.009631, over 13896.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.09056, pruned_loss=0.01226, audio_tagging_loss=0.008558, over 3053540.33 frames. ], batch size: 53, lr: 1.53e-03, grad_scale: 32.0
2023-11-28 12:52:41,702 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3511840.0, ans=0.0
2023-11-28 12:52:46,081 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3511840.0, ans=0.2
2023-11-28 12:52:56,602 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3511906.6666666665, ans=0.2
2023-11-28 12:53:05,026 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 526800
2023-11-28 12:53:20,662 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.69 vs. limit=6.0
2023-11-28 12:53:21,450 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3512040.0, ans=0.0
2023-11-28 12:53:32,204 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3512106.6666666665, ans=0.1
2023-11-28 12:53:34,794 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.80 vs. limit=6.0
2023-11-28 12:53:37,924 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 9800, loss[loss=0.08043, simple_loss=0.1107, pruned_loss=0.01735, audio_tagging_loss=0.007736, over 17096.00 frames. ], tot_loss[loss=0.0666, simple_loss=0.09114, pruned_loss=0.01248, audio_tagging_loss=0.008543, over 3050957.68 frames. ], batch size: 63, lr: 1.53e-03, grad_scale: 32.0
2023-11-28 12:53:38,204 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3512173.3333333335, ans=0.125
2023-11-28 12:53:47,040 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3512173.3333333335, ans=0.07
2023-11-28 12:53:58,294 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.194e+01 8.970e+01 9.668e+01 1.037e+02 1.358e+02, threshold=1.934e+02, percent-clipped=0.0
2023-11-28 12:54:02,359 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.96 vs. limit=15.0
2023-11-28 12:54:02,868 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 526850
2023-11-28 12:54:13,786 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3512373.3333333335, ans=0.0
2023-11-28 12:54:22,230 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3512373.3333333335, ans=0.1
2023-11-28 12:54:33,065 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-28 12:54:33,169 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3512440.0, ans=0.0
2023-11-28 12:54:34,477 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3512506.6666666665, ans=0.0
2023-11-28 12:54:35,746 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 9850, loss[loss=0.04542, simple_loss=0.05603, pruned_loss=0.006658, audio_tagging_loss=0.01075, over 16880.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.09121, pruned_loss=0.01239, audio_tagging_loss=0.008514, over 3050352.65 frames. ], batch size: 66, lr: 1.53e-03, grad_scale: 32.0
2023-11-28 12:54:50,771 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3512573.3333333335, ans=0.0
2023-11-28 12:54:56,462 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.69 vs. limit=22.5
2023-11-28 12:55:01,026 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 526900
2023-11-28 12:55:12,714 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3512706.6666666665, ans=0.0
2023-11-28 12:55:12,797 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3512706.6666666665, ans=0.125
2023-11-28 12:55:33,304 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 9900, loss[loss=0.09261, simple_loss=0.1325, pruned_loss=0.01975, audio_tagging_loss=0.006592, over 15641.00 frames. ], tot_loss[loss=0.06693, simple_loss=0.09205, pruned_loss=0.01252, audio_tagging_loss=0.008392, over 3052876.55 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0
], tot_loss[loss=0.06693, simple_loss=0.09205, pruned_loss=0.01252, audio_tagging_loss=0.008392, over 3052876.55 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:55:41,558 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.04 vs. limit=15.0 2023-11-28 12:55:50,505 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=3512906.6666666665, ans=15.0 2023-11-28 12:55:55,302 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.083e+01 9.239e+01 9.931e+01 1.065e+02 1.438e+02, threshold=1.986e+02, percent-clipped=0.0 2023-11-28 12:55:57,235 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.70 vs. limit=15.0 2023-11-28 12:55:58,692 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 526950 2023-11-28 12:56:01,059 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3512973.3333333335, ans=0.125 2023-11-28 12:56:02,141 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3512973.3333333335, ans=0.0 2023-11-28 12:56:31,234 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 9950, loss[loss=0.078, simple_loss=0.1084, pruned_loss=0.0138, audio_tagging_loss=0.01, over 16426.00 frames. ], tot_loss[loss=0.06677, simple_loss=0.09185, pruned_loss=0.01241, audio_tagging_loss=0.008442, over 3056434.18 frames. ], batch size: 61, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:56:31,456 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3513173.3333333335, ans=0.0 2023-11-28 12:56:36,378 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 12:56:56,781 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 527000 2023-11-28 12:57:08,443 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3513373.3333333335, ans=0.1 2023-11-28 12:57:20,890 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.19 vs. limit=15.0 2023-11-28 12:57:23,937 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3513440.0, ans=0.0 2023-11-28 12:57:25,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3513440.0, ans=0.1 2023-11-28 12:57:29,193 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 10000, loss[loss=0.05398, simple_loss=0.06953, pruned_loss=0.01038, audio_tagging_loss=0.008838, over 15368.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.09131, pruned_loss=0.01243, audio_tagging_loss=0.008462, over 3055754.67 frames. 
], batch size: 56, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:57:39,966 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3513573.3333333335, ans=0.2 2023-11-28 12:57:50,549 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.311e+01 8.894e+01 9.636e+01 1.033e+02 1.186e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-28 12:57:52,941 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3513640.0, ans=0.125 2023-11-28 12:57:53,886 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 527050 2023-11-28 12:58:05,133 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3513706.6666666665, ans=0.125 2023-11-28 12:58:17,012 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.19 vs. limit=15.0 2023-11-28 12:58:26,163 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 10050, loss[loss=0.06203, simple_loss=0.08146, pruned_loss=0.0108, audio_tagging_loss=0.0105, over 14888.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.09133, pruned_loss=0.01237, audio_tagging_loss=0.008478, over 3060557.36 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:58:29,858 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3513840.0, ans=0.125 2023-11-28 12:58:30,987 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3513840.0, ans=0.125 2023-11-28 12:58:51,670 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 527100 2023-11-28 12:59:05,789 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.79 vs. limit=6.0 2023-11-28 12:59:22,930 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 10100, loss[loss=0.06931, simple_loss=0.09206, pruned_loss=0.01305, audio_tagging_loss=0.01023, over 14781.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.09045, pruned_loss=0.0123, audio_tagging_loss=0.008549, over 3059188.93 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:59:46,044 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.722e+01 8.720e+01 9.623e+01 1.026e+02 1.280e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-28 12:59:49,354 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 527150 2023-11-28 12:59:56,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3514306.6666666665, ans=0.2 2023-11-28 13:00:12,213 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 13:00:13,328 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3514440.0, ans=0.0 2023-11-28 13:00:14,189 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 13:00:19,453 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3514440.0, ans=0.1 2023-11-28 13:00:21,328 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 10150, loss[loss=0.07284, simple_loss=0.1019, pruned_loss=0.01586, audio_tagging_loss=0.006047, over 15267.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08979, pruned_loss=0.01216, audio_tagging_loss=0.00862, over 3051160.32 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 13:00:26,047 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3514506.6666666665, ans=0.0 2023-11-28 13:00:30,543 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.73 vs. limit=15.0 2023-11-28 13:00:35,786 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3514573.3333333335, ans=0.1 2023-11-28 13:00:39,108 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3514573.3333333335, ans=0.125 2023-11-28 13:00:40,755 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=14.35 vs. limit=15.0 2023-11-28 13:00:46,710 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 527200 2023-11-28 13:00:52,438 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 13:00:55,524 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3514706.6666666665, ans=0.1 2023-11-28 13:01:19,640 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 10200, loss[loss=0.06274, simple_loss=0.09044, pruned_loss=0.009325, audio_tagging_loss=0.008195, over 15622.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.08993, pruned_loss=0.01215, audio_tagging_loss=0.008711, over 3052268.43 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 13:01:41,255 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.267e+01 8.683e+01 9.423e+01 1.021e+02 1.647e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-28 13:01:44,667 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 527250 2023-11-28 13:01:45,665 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 13:01:49,827 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3514973.3333333335, ans=0.1 2023-11-28 13:01:51,282 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.72 vs. limit=15.0 2023-11-28 13:01:53,032 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3515040.0, ans=0.0 2023-11-28 13:02:01,127 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.86 vs. limit=15.0 2023-11-28 13:02:16,831 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 10250, loss[loss=0.04766, simple_loss=0.05506, pruned_loss=0.01018, audio_tagging_loss=0.009954, over 14933.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.08985, pruned_loss=0.01209, audio_tagging_loss=0.008783, over 3055204.70 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 13:02:33,776 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3515240.0, ans=0.5 2023-11-28 13:02:34,955 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3515240.0, ans=0.1 2023-11-28 13:02:36,976 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 13:02:43,398 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 527300 2023-11-28 13:02:49,061 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3515306.6666666665, ans=0.0 2023-11-28 13:02:55,598 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3515373.3333333335, ans=0.0 2023-11-28 13:02:59,905 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3515373.3333333335, ans=0.0 2023-11-28 13:03:05,054 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3515440.0, ans=0.125 2023-11-28 13:03:08,127 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3515440.0, ans=0.0 2023-11-28 13:03:14,516 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 10300, loss[loss=0.05707, simple_loss=0.07486, pruned_loss=0.01058, audio_tagging_loss=0.009061, over 15454.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.0895, pruned_loss=0.01208, audio_tagging_loss=0.008806, over 3060022.57 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 13:03:21,409 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.34 vs. 
limit=15.0 2023-11-28 13:03:23,508 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3515506.6666666665, ans=0.0 2023-11-28 13:03:29,711 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3515573.3333333335, ans=0.125 2023-11-28 13:03:33,883 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3515573.3333333335, ans=0.125 2023-11-28 13:03:36,879 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.412e+01 8.855e+01 9.649e+01 1.050e+02 1.403e+02, threshold=1.930e+02, percent-clipped=0.0 2023-11-28 13:03:40,211 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 527350 2023-11-28 13:03:44,843 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3515640.0, ans=0.1 2023-11-28 13:04:12,932 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 10350, loss[loss=0.0628, simple_loss=0.08049, pruned_loss=0.0105, audio_tagging_loss=0.01206, over 14344.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08886, pruned_loss=0.01199, audio_tagging_loss=0.008941, over 3056084.48 frames. ], batch size: 53, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 13:04:21,433 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.32 vs. limit=8.0 2023-11-28 13:04:29,453 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3515906.6666666665, ans=0.0 2023-11-28 13:04:37,780 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 527400 2023-11-28 13:04:47,092 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3516040.0, ans=0.0 2023-11-28 13:05:06,512 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3516106.6666666665, ans=0.1 2023-11-28 13:05:10,552 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 10400, loss[loss=0.07171, simple_loss=0.1008, pruned_loss=0.01389, audio_tagging_loss=0.007405, over 14632.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.0888, pruned_loss=0.01191, audio_tagging_loss=0.009044, over 3048166.44 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 13:05:14,031 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3516173.3333333335, ans=0.125 2023-11-28 13:05:18,142 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3516173.3333333335, ans=0.0 2023-11-28 13:05:25,708 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3516240.0, ans=0.0 2023-11-28 13:05:27,230 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.71 vs. 
limit=6.0 2023-11-28 13:05:32,159 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.516e+01 9.002e+01 9.653e+01 1.031e+02 1.825e+02, threshold=1.931e+02, percent-clipped=0.0 2023-11-28 13:05:36,670 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 527450 2023-11-28 13:05:43,623 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3516306.6666666665, ans=0.125 2023-11-28 13:06:08,063 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 10450, loss[loss=0.06645, simple_loss=0.0928, pruned_loss=0.01265, audio_tagging_loss=0.007408, over 15487.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.08922, pruned_loss=0.01195, audio_tagging_loss=0.009034, over 3048135.17 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 13:06:15,854 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3516506.6666666665, ans=0.125 2023-11-28 13:06:23,137 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 13:06:23,398 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.02 vs. limit=15.0 2023-11-28 13:06:33,677 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 527500 2023-11-28 13:06:34,144 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.20 vs. limit=15.0 2023-11-28 13:06:46,918 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=3516706.6666666665, ans=15.0 2023-11-28 13:06:47,879 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3516706.6666666665, ans=0.125 2023-11-28 13:06:54,416 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3516773.3333333335, ans=0.125 2023-11-28 13:07:00,262 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.49 vs. limit=15.0 2023-11-28 13:07:06,823 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 10500, loss[loss=0.05771, simple_loss=0.08135, pruned_loss=0.01189, audio_tagging_loss=0.005143, over 14234.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08877, pruned_loss=0.01193, audio_tagging_loss=0.008919, over 3043154.91 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 13:07:17,036 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3516906.6666666665, ans=0.2 2023-11-28 13:07:28,310 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.723e+01 8.731e+01 9.359e+01 1.002e+02 1.371e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-28 13:07:31,689 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 527550 2023-11-28 13:07:39,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3517040.0, ans=0.0 2023-11-28 13:07:48,656 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.70 vs. 
limit=12.0 2023-11-28 13:08:04,090 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 10550, loss[loss=0.07365, simple_loss=0.09303, pruned_loss=0.01782, audio_tagging_loss=0.00932, over 16018.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08911, pruned_loss=0.01217, audio_tagging_loss=0.00878, over 3047124.04 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 13:08:11,482 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 13:08:17,019 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3517240.0, ans=0.125 2023-11-28 13:08:20,411 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.35 vs. limit=15.0 2023-11-28 13:08:26,931 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3517306.6666666665, ans=0.0 2023-11-28 13:08:28,802 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 527600 2023-11-28 13:08:28,903 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3517306.6666666665, ans=0.125 2023-11-28 13:08:37,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3517306.6666666665, ans=0.125 2023-11-28 13:08:48,428 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3517373.3333333335, ans=0.09899494936611666 2023-11-28 13:09:01,979 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 10600, loss[loss=0.08126, simple_loss=0.1175, pruned_loss=0.01658, audio_tagging_loss=0.005912, over 15575.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08858, pruned_loss=0.01218, audio_tagging_loss=0.008829, over 3045691.09 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 13:09:03,743 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.46 vs. limit=12.0 2023-11-28 13:09:11,013 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3517506.6666666665, ans=0.1 2023-11-28 13:09:21,913 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3517573.3333333335, ans=0.0 2023-11-28 13:09:24,552 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.206e+01 8.917e+01 9.595e+01 1.067e+02 1.545e+02, threshold=1.919e+02, percent-clipped=0.0 2023-11-28 13:09:27,987 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 527650 2023-11-28 13:09:31,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3517640.0, ans=0.125 2023-11-28 13:09:46,333 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3517706.6666666665, ans=0.0 2023-11-28 13:10:00,407 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 10650, loss[loss=0.07439, simple_loss=0.1088, pruned_loss=0.012, audio_tagging_loss=0.007977, over 15093.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.0889, pruned_loss=0.01212, audio_tagging_loss=0.008759, over 3040764.41 frames. 
], batch size: 54, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 13:10:03,103 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.98 vs. limit=15.0 2023-11-28 13:10:13,470 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3517906.6666666665, ans=0.1 2023-11-28 13:10:25,962 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 527700 2023-11-28 13:10:26,218 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3517973.3333333335, ans=0.1 2023-11-28 13:10:29,465 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3517973.3333333335, ans=0.125 2023-11-28 13:10:34,919 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3518040.0, ans=0.0 2023-11-28 13:10:55,999 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3518106.6666666665, ans=0.125 2023-11-28 13:10:57,982 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 10700, loss[loss=0.0707, simple_loss=0.09821, pruned_loss=0.01394, audio_tagging_loss=0.007656, over 16070.00 frames. ], tot_loss[loss=0.06457, simple_loss=0.08793, pruned_loss=0.01195, audio_tagging_loss=0.008652, over 3034347.07 frames. ], batch size: 61, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 13:11:08,733 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3518240.0, ans=0.07 2023-11-28 13:11:16,377 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3518240.0, ans=0.125 2023-11-28 13:11:19,353 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.850e+01 8.935e+01 9.512e+01 1.031e+02 1.313e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-28 13:11:20,822 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3518306.6666666665, ans=0.2 2023-11-28 13:11:22,817 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 527750 2023-11-28 13:11:24,501 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.60 vs. limit=15.0 2023-11-28 13:11:31,967 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.82 vs. limit=15.0 2023-11-28 13:11:43,020 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.13 vs. limit=15.0 2023-11-28 13:11:49,004 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3518440.0, ans=0.0 2023-11-28 13:11:55,954 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 10750, loss[loss=0.06387, simple_loss=0.09443, pruned_loss=0.007378, audio_tagging_loss=0.00928, over 14551.00 frames. ], tot_loss[loss=0.06468, simple_loss=0.08807, pruned_loss=0.012, audio_tagging_loss=0.008646, over 3035667.48 frames. 
], batch size: 56, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 13:11:57,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3518506.6666666665, ans=0.0 2023-11-28 13:11:58,490 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3518506.6666666665, ans=0.1 2023-11-28 13:12:14,124 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.44 vs. limit=15.0 2023-11-28 13:12:21,176 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 527800 2023-11-28 13:12:23,378 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.21 vs. limit=15.0 2023-11-28 13:12:27,814 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3518640.0, ans=0.2 2023-11-28 13:12:27,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3518640.0, ans=0.125 2023-11-28 13:12:53,903 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 10800, loss[loss=0.05893, simple_loss=0.08163, pruned_loss=0.01064, audio_tagging_loss=0.007475, over 16059.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08888, pruned_loss=0.01213, audio_tagging_loss=0.008613, over 3050071.24 frames. ], batch size: 60, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 13:13:15,945 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.616e+01 8.825e+01 9.488e+01 1.009e+02 1.262e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-28 13:13:19,987 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 527850 2023-11-28 13:13:20,421 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.88 vs. limit=10.0 2023-11-28 13:13:22,549 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.39 vs. limit=15.0 2023-11-28 13:13:38,577 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.57 vs. limit=12.0 2023-11-28 13:13:51,772 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 10850, loss[loss=0.05651, simple_loss=0.07853, pruned_loss=0.006731, audio_tagging_loss=0.01051, over 15015.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.08947, pruned_loss=0.01217, audio_tagging_loss=0.008572, over 3052453.52 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 13:14:08,639 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3519240.0, ans=0.0 2023-11-28 13:14:17,168 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 527900 2023-11-28 13:14:25,130 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3519306.6666666665, ans=0.2 2023-11-28 13:14:31,584 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=3519373.3333333335, ans=0.5 2023-11-28 13:14:50,120 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 10900, loss[loss=0.06954, simple_loss=0.08789, pruned_loss=0.01572, audio_tagging_loss=0.009866, over 15207.00 frames. 
], tot_loss[loss=0.06557, simple_loss=0.08965, pruned_loss=0.01216, audio_tagging_loss=0.008586, over 3053833.57 frames. ], batch size: 60, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 13:14:51,284 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 13:15:03,891 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.35 vs. limit=15.0 2023-11-28 13:15:11,872 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.107e+01 8.813e+01 9.594e+01 1.027e+02 1.260e+02, threshold=1.919e+02, percent-clipped=0.0 2023-11-28 13:15:15,308 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 527950 2023-11-28 13:15:39,821 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=3519773.3333333335, ans=0.05 2023-11-28 13:15:42,107 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3519773.3333333335, ans=0.0 2023-11-28 13:15:47,463 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 10950, loss[loss=0.05836, simple_loss=0.08188, pruned_loss=0.008284, audio_tagging_loss=0.009134, over 15659.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08954, pruned_loss=0.01213, audio_tagging_loss=0.008587, over 3051028.20 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 13:15:53,537 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.63 vs. limit=15.0 2023-11-28 13:16:05,127 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3519906.6666666665, ans=0.125 2023-11-28 13:16:13,457 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 528000 2023-11-28 13:16:14,727 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-528000.pt 2023-11-28 13:16:25,196 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.62 vs. limit=15.0 2023-11-28 13:16:31,303 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3520040.0, ans=0.0 2023-11-28 13:16:32,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3520040.0, ans=0.125 2023-11-28 13:16:47,527 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 11000, loss[loss=0.05649, simple_loss=0.07439, pruned_loss=0.008051, audio_tagging_loss=0.01125, over 16858.00 frames. ], tot_loss[loss=0.06499, simple_loss=0.08882, pruned_loss=0.01196, audio_tagging_loss=0.008614, over 3053896.01 frames. 
], batch size: 65, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 13:16:50,974 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3520173.3333333335, ans=0.125 2023-11-28 13:16:53,329 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3520173.3333333335, ans=0.07 2023-11-28 13:16:55,457 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3520173.3333333335, ans=0.125 2023-11-28 13:17:01,807 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 13:17:05,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3520240.0, ans=0.0 2023-11-28 13:17:05,463 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3520240.0, ans=0.2 2023-11-28 13:17:08,627 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3520240.0, ans=0.0 2023-11-28 13:17:09,437 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.571e+01 9.312e+01 9.741e+01 1.066e+02 1.982e+02, threshold=1.948e+02, percent-clipped=1.0 2023-11-28 13:17:12,879 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 528050 2023-11-28 13:17:20,184 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3520306.6666666665, ans=0.125 2023-11-28 13:17:36,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3520440.0, ans=0.125 2023-11-28 13:17:40,535 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.52 vs. limit=10.0 2023-11-28 13:17:44,890 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 11050, loss[loss=0.05123, simple_loss=0.06479, pruned_loss=0.01028, audio_tagging_loss=0.008563, over 15116.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08892, pruned_loss=0.01197, audio_tagging_loss=0.00872, over 3050948.52 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 13:17:47,272 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3520506.6666666665, ans=0.125 2023-11-28 13:17:48,766 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.69 vs. 
limit=12.0 2023-11-28 13:18:01,671 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3520573.3333333335, ans=0.125 2023-11-28 13:18:07,213 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3520640.0, ans=0.04949747468305833 2023-11-28 13:18:10,198 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 528100 2023-11-28 13:18:11,403 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3520640.0, ans=0.0 2023-11-28 13:18:12,602 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3520640.0, ans=0.0 2023-11-28 13:18:31,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3520773.3333333335, ans=0.0 2023-11-28 13:18:41,365 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 11100, loss[loss=0.05816, simple_loss=0.07336, pruned_loss=0.01185, audio_tagging_loss=0.009633, over 15970.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.08894, pruned_loss=0.01212, audio_tagging_loss=0.008851, over 3057293.46 frames. ], batch size: 62, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 13:18:52,992 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.60 vs. limit=22.5 2023-11-28 13:19:04,073 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.272e+01 8.992e+01 9.660e+01 1.067e+02 1.331e+02, threshold=1.932e+02, percent-clipped=0.0 2023-11-28 13:19:06,369 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 528150 2023-11-28 13:19:22,084 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3521040.0, ans=0.0 2023-11-28 13:19:39,154 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 11150, loss[loss=0.07675, simple_loss=0.1053, pruned_loss=0.01559, audio_tagging_loss=0.008489, over 15633.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.08956, pruned_loss=0.01225, audio_tagging_loss=0.008856, over 3049564.67 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 13:19:39,494 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3521173.3333333335, ans=0.07 2023-11-28 13:19:42,837 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.44 vs. limit=15.0 2023-11-28 13:19:56,386 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3521240.0, ans=0.0 2023-11-28 13:20:00,596 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3521240.0, ans=0.0 2023-11-28 13:20:01,614 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3521306.6666666665, ans=0.125 2023-11-28 13:20:04,789 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 528200 2023-11-28 13:20:05,268 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.23 vs. 
limit=15.0 2023-11-28 13:20:07,474 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3521306.6666666665, ans=0.1 2023-11-28 13:20:28,151 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3521440.0, ans=0.125 2023-11-28 13:20:36,849 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 11200, loss[loss=0.07021, simple_loss=0.08922, pruned_loss=0.01737, audio_tagging_loss=0.008226, over 14392.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.0886, pruned_loss=0.01208, audio_tagging_loss=0.008957, over 3042114.82 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 13:20:43,008 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3521506.6666666665, ans=0.125 2023-11-28 13:21:00,912 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.780e+01 8.925e+01 9.638e+01 1.050e+02 1.394e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-28 13:21:03,163 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 528250 2023-11-28 13:21:14,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3521706.6666666665, ans=10.0 2023-11-28 13:21:30,163 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.11 vs. limit=15.0 2023-11-28 13:21:35,062 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 11250, loss[loss=0.05706, simple_loss=0.08193, pruned_loss=0.008877, audio_tagging_loss=0.007216, over 15149.00 frames. ], tot_loss[loss=0.0649, simple_loss=0.08793, pruned_loss=0.01198, audio_tagging_loss=0.008961, over 3047927.63 frames. ], batch size: 60, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 13:21:39,902 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.51 vs. limit=15.0 2023-11-28 13:21:45,025 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3521840.0, ans=0.0 2023-11-28 13:22:00,244 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 528300 2023-11-28 13:22:28,859 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3522106.6666666665, ans=0.125 2023-11-28 13:22:30,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3522106.6666666665, ans=0.07 2023-11-28 13:22:33,040 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 11300, loss[loss=0.0592, simple_loss=0.07548, pruned_loss=0.01288, audio_tagging_loss=0.008577, over 14398.00 frames. ], tot_loss[loss=0.06494, simple_loss=0.08835, pruned_loss=0.01196, audio_tagging_loss=0.008812, over 3046736.49 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 13:22:37,045 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.21 vs. 
limit=15.0 2023-11-28 13:22:38,936 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3522173.3333333335, ans=0.07 2023-11-28 13:22:41,121 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3522173.3333333335, ans=0.125 2023-11-28 13:22:46,451 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3522240.0, ans=0.125 2023-11-28 13:22:53,178 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.99 vs. limit=6.0 2023-11-28 13:22:57,896 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.792e+01 8.854e+01 9.489e+01 1.005e+02 1.713e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-28 13:22:57,998 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 528350 2023-11-28 13:23:10,241 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3522373.3333333335, ans=0.125 2023-11-28 13:23:18,384 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3522440.0, ans=0.2 2023-11-28 13:23:19,502 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3522440.0, ans=0.0 2023-11-28 13:23:30,099 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 11350, loss[loss=0.06299, simple_loss=0.08485, pruned_loss=0.01135, audio_tagging_loss=0.009217, over 14851.00 frames. ], tot_loss[loss=0.0649, simple_loss=0.08839, pruned_loss=0.01203, audio_tagging_loss=0.008671, over 3048632.61 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 8.0 2023-11-28 13:23:39,612 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3522506.6666666665, ans=0.0 2023-11-28 13:23:43,073 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3522573.3333333335, ans=0.125 2023-11-28 13:23:48,062 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3522573.3333333335, ans=0.2 2023-11-28 13:23:50,784 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.88 vs. limit=15.0 2023-11-28 13:23:51,400 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3522573.3333333335, ans=0.0 2023-11-28 13:23:56,606 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 528400 2023-11-28 13:24:28,358 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 11400, loss[loss=0.07754, simple_loss=0.1124, pruned_loss=0.01526, audio_tagging_loss=0.006072, over 14877.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08916, pruned_loss=0.01226, audio_tagging_loss=0.008596, over 3038475.88 frames. 
], batch size: 55, lr: 1.53e-03, grad_scale: 8.0 2023-11-28 13:24:39,487 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3522906.6666666665, ans=0.125 2023-11-28 13:24:54,039 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.279e+01 9.209e+01 9.932e+01 1.057e+02 3.089e+02, threshold=1.986e+02, percent-clipped=1.0 2023-11-28 13:24:54,140 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 528450 2023-11-28 13:24:57,677 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3522973.3333333335, ans=0.1 2023-11-28 13:25:03,291 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 13:25:05,994 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3523040.0, ans=0.125 2023-11-28 13:25:27,090 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 11450, loss[loss=0.06784, simple_loss=0.1036, pruned_loss=0.01019, audio_tagging_loss=0.005866, over 14958.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08897, pruned_loss=0.01212, audio_tagging_loss=0.008519, over 3042515.27 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 8.0 2023-11-28 13:25:41,550 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3523240.0, ans=0.125 2023-11-28 13:25:50,115 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.79 vs. limit=15.0 2023-11-28 13:25:51,788 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 528500 2023-11-28 13:25:54,555 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.33 vs. limit=15.0 2023-11-28 13:26:06,262 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3523373.3333333335, ans=0.0 2023-11-28 13:26:07,265 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3523373.3333333335, ans=0.0 2023-11-28 13:26:12,220 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3523440.0, ans=0.125 2023-11-28 13:26:16,989 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.56 vs. limit=6.0 2023-11-28 13:26:24,138 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 11500, loss[loss=0.06396, simple_loss=0.08846, pruned_loss=0.01293, audio_tagging_loss=0.006811, over 14969.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.08878, pruned_loss=0.01214, audio_tagging_loss=0.008548, over 3043991.17 frames. 
], batch size: 58, lr: 1.53e-03, grad_scale: 8.0 2023-11-28 13:26:31,124 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3523506.6666666665, ans=0.04949747468305833 2023-11-28 13:26:50,285 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.439e+01 8.727e+01 9.422e+01 1.009e+02 1.518e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-28 13:26:50,403 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 528550 2023-11-28 13:26:57,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3523640.0, ans=0.2 2023-11-28 13:27:14,143 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3523773.3333333335, ans=0.125 2023-11-28 13:27:17,879 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3523773.3333333335, ans=0.125 2023-11-28 13:27:21,155 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3523840.0, ans=0.1 2023-11-28 13:27:22,071 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 11550, loss[loss=0.05881, simple_loss=0.07942, pruned_loss=0.008253, audio_tagging_loss=0.01084, over 15166.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08894, pruned_loss=0.0121, audio_tagging_loss=0.008588, over 3054405.81 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 8.0 2023-11-28 13:27:27,038 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.41 vs. limit=15.0 2023-11-28 13:27:45,974 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3523973.3333333335, ans=0.125 2023-11-28 13:27:47,986 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 528600 2023-11-28 13:28:03,238 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 13:28:21,093 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 11600, loss[loss=0.05682, simple_loss=0.07857, pruned_loss=0.01041, audio_tagging_loss=0.007124, over 15633.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.09013, pruned_loss=0.01227, audio_tagging_loss=0.008565, over 3052408.04 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 13:28:21,809 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.21 vs. 
limit=15.0 2023-11-28 13:28:30,156 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3524173.3333333335, ans=0.0 2023-11-28 13:28:37,930 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3524240.0, ans=0.1 2023-11-28 13:28:45,925 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.331e+01 9.016e+01 9.615e+01 1.039e+02 1.434e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-28 13:28:46,029 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 528650 2023-11-28 13:28:46,285 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3524306.6666666665, ans=0.125 2023-11-28 13:28:59,774 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=3524373.3333333335, ans=6.0 2023-11-28 13:29:08,609 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3524440.0, ans=0.125 2023-11-28 13:29:15,149 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3524440.0, ans=0.0 2023-11-28 13:29:18,257 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 11650, loss[loss=0.0721, simple_loss=0.09479, pruned_loss=0.01601, audio_tagging_loss=0.008697, over 15004.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08976, pruned_loss=0.01212, audio_tagging_loss=0.008588, over 3052254.97 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 13:29:20,718 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3524506.6666666665, ans=0.1 2023-11-28 13:29:30,104 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3524573.3333333335, ans=0.0 2023-11-28 13:29:33,288 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3524573.3333333335, ans=0.125 2023-11-28 13:29:38,816 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3524573.3333333335, ans=0.125 2023-11-28 13:29:43,069 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 528700 2023-11-28 13:29:47,918 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.20 vs. limit=12.0 2023-11-28 13:29:54,374 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3524706.6666666665, ans=0.0 2023-11-28 13:29:54,449 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3524706.6666666665, ans=0.1 2023-11-28 13:30:15,776 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 11700, loss[loss=0.05277, simple_loss=0.07476, pruned_loss=0.007194, audio_tagging_loss=0.008199, over 14156.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.0895, pruned_loss=0.01206, audio_tagging_loss=0.008682, over 3052740.37 frames. 
], batch size: 53, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 13:30:41,690 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.708e+01 8.706e+01 9.225e+01 1.001e+02 1.364e+02, threshold=1.845e+02, percent-clipped=0.0 2023-11-28 13:30:41,792 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 528750 2023-11-28 13:30:42,931 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3524973.3333333335, ans=0.125 2023-11-28 13:30:48,305 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3524973.3333333335, ans=0.0 2023-11-28 13:31:02,008 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3525106.6666666665, ans=0.0 2023-11-28 13:31:12,750 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 11750, loss[loss=0.05093, simple_loss=0.05803, pruned_loss=0.008282, audio_tagging_loss=0.01363, over 14626.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.0886, pruned_loss=0.01184, audio_tagging_loss=0.008768, over 3049285.16 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 13:31:13,097 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3525173.3333333335, ans=0.1 2023-11-28 13:31:19,986 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3525173.3333333335, ans=0.0 2023-11-28 13:31:20,071 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3525173.3333333335, ans=0.2 2023-11-28 13:31:27,826 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3525240.0, ans=0.2 2023-11-28 13:31:38,567 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 528800 2023-11-28 13:31:58,851 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-28 13:32:11,775 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 11800, loss[loss=0.06904, simple_loss=0.09877, pruned_loss=0.009193, audio_tagging_loss=0.01047, over 14937.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08951, pruned_loss=0.01204, audio_tagging_loss=0.008825, over 3043914.20 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 13:32:34,952 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.15 vs. limit=12.0 2023-11-28 13:32:35,502 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3525640.0, ans=0.125 2023-11-28 13:32:36,406 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.895e+01 9.052e+01 9.553e+01 1.035e+02 2.670e+02, threshold=1.911e+02, percent-clipped=1.0 2023-11-28 13:32:36,512 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 528850 2023-11-28 13:32:44,244 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3525640.0, ans=0.125 2023-11-28 13:32:56,044 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.73 vs. 
limit=15.0 2023-11-28 13:33:02,526 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3525773.3333333335, ans=0.125 2023-11-28 13:33:09,418 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 11850, loss[loss=0.06017, simple_loss=0.07972, pruned_loss=0.01104, audio_tagging_loss=0.009272, over 15971.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08966, pruned_loss=0.01199, audio_tagging_loss=0.008902, over 3053853.81 frames. ], batch size: 61, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 13:33:20,966 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.07 vs. limit=12.0 2023-11-28 13:33:35,065 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 528900 2023-11-28 13:33:48,547 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3526040.0, ans=0.1 2023-11-28 13:33:48,551 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3526040.0, ans=0.125 2023-11-28 13:33:59,044 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3526106.6666666665, ans=0.0 2023-11-28 13:34:02,307 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3526106.6666666665, ans=0.05 2023-11-28 13:34:06,364 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 11900, loss[loss=0.04231, simple_loss=0.05331, pruned_loss=0.006558, audio_tagging_loss=0.009095, over 15167.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08953, pruned_loss=0.01188, audio_tagging_loss=0.008933, over 3053464.73 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 13:34:16,627 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.97 vs. limit=22.5 2023-11-28 13:34:31,997 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.442e+01 8.911e+01 9.791e+01 1.051e+02 1.188e+02, threshold=1.958e+02, percent-clipped=0.0 2023-11-28 13:34:32,110 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 528950 2023-11-28 13:34:32,667 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.34 vs. limit=15.0 2023-11-28 13:34:39,437 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3526306.6666666665, ans=0.07 2023-11-28 13:34:44,112 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.61 vs. limit=15.0 2023-11-28 13:34:49,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3526373.3333333335, ans=0.125 2023-11-28 13:35:05,015 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 11950, loss[loss=0.06634, simple_loss=0.08404, pruned_loss=0.01127, audio_tagging_loss=0.01305, over 14906.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08931, pruned_loss=0.012, audio_tagging_loss=0.008922, over 3058038.93 frames. 
], batch size: 58, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 13:35:23,449 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3526573.3333333335, ans=0.0 2023-11-28 13:35:29,908 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 529000 2023-11-28 13:35:31,546 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3526640.0, ans=0.0 2023-11-28 13:35:31,570 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3526640.0, ans=0.0 2023-11-28 13:35:37,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3526640.0, ans=0.2 2023-11-28 13:35:44,628 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3526706.6666666665, ans=0.2 2023-11-28 13:35:57,364 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3526773.3333333335, ans=0.95 2023-11-28 13:36:02,071 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 12000, loss[loss=0.05244, simple_loss=0.07082, pruned_loss=0.008271, audio_tagging_loss=0.008757, over 15300.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08921, pruned_loss=0.01203, audio_tagging_loss=0.009028, over 3054868.53 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 13:36:02,073 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-28 13:36:37,276 INFO [train_asr.py:1267] (0/4) Epoch 44, validation: loss=0.05811, simple_loss=0.05058, pruned_loss=0.005337, audio_tagging_loss=0.02748, over 4681554.00 frames. 2023-11-28 13:36:37,277 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-28 13:36:37,863 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.11 vs. limit=15.0 2023-11-28 13:36:38,984 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.40 vs. limit=15.0 2023-11-28 13:36:47,072 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3526906.6666666665, ans=0.1 2023-11-28 13:36:51,482 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.61 vs. limit=6.0 2023-11-28 13:36:53,271 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 13:37:00,785 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 529050 2023-11-28 13:37:01,757 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.922e+01 8.972e+01 9.530e+01 1.015e+02 1.256e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-28 13:37:05,771 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/epoch-44.pt 2023-11-28 13:37:21,568 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 0, loss[loss=0.07515, simple_loss=0.09149, pruned_loss=0.008504, audio_tagging_loss=0.0209, over 15067.00 frames. ], tot_loss[loss=0.07515, simple_loss=0.09149, pruned_loss=0.008504, audio_tagging_loss=0.0209, over 15067.00 frames. 
], batch size: 55, lr: 1.51e-03, grad_scale: 32.0 2023-11-28 13:37:21,570 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-28 13:37:56,020 INFO [train_asr.py:1267] (0/4) Epoch 45, validation: loss=0.05764, simple_loss=0.05062, pruned_loss=0.005372, audio_tagging_loss=0.02696, over 4681554.00 frames. 2023-11-28 13:37:56,020 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-28 13:37:58,219 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.22 vs. limit=15.0 2023-11-28 13:38:03,725 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.06 vs. limit=22.5 2023-11-28 13:38:17,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3527080.0, ans=10.0 2023-11-28 13:38:20,319 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3527146.6666666665, ans=0.125 2023-11-28 13:38:21,418 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3527146.6666666665, ans=0.125 2023-11-28 13:38:23,683 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3527146.6666666665, ans=0.025 2023-11-28 13:38:45,117 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3527280.0, ans=0.125 2023-11-28 13:38:48,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3527280.0, ans=0.125 2023-11-28 13:38:49,241 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 529100 2023-11-28 13:38:53,507 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 50, loss[loss=0.07401, simple_loss=0.08345, pruned_loss=0.01492, audio_tagging_loss=0.01736, over 14826.00 frames. ], tot_loss[loss=0.07283, simple_loss=0.08715, pruned_loss=0.01215, audio_tagging_loss=0.0171, over 689851.86 frames. ], batch size: 55, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:38:57,004 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3527346.6666666665, ans=0.0 2023-11-28 13:39:06,317 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3527413.3333333335, ans=0.125 2023-11-28 13:39:20,584 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3527480.0, ans=0.125 2023-11-28 13:39:47,095 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 529150 2023-11-28 13:39:49,231 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.341e+01 9.943e+01 1.065e+02 1.140e+02 1.453e+02, threshold=2.129e+02, percent-clipped=0.0 2023-11-28 13:39:51,484 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 100, loss[loss=0.0858, simple_loss=0.1193, pruned_loss=0.01418, audio_tagging_loss=0.01196, over 16068.00 frames. ], tot_loss[loss=0.07212, simple_loss=0.08831, pruned_loss=0.01199, audio_tagging_loss=0.01597, over 1209432.08 frames. 
], batch size: 57, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:39:52,812 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3527680.0, ans=0.0 2023-11-28 13:40:02,204 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3527746.6666666665, ans=0.125 2023-11-28 13:40:11,411 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3527746.6666666665, ans=0.125 2023-11-28 13:40:11,413 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3527746.6666666665, ans=0.125 2023-11-28 13:40:11,616 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3527746.6666666665, ans=0.125 2023-11-28 13:40:24,472 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3527813.3333333335, ans=0.1 2023-11-28 13:40:25,693 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3527880.0, ans=0.125 2023-11-28 13:40:41,903 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3527946.6666666665, ans=0.125 2023-11-28 13:40:45,454 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 529200 2023-11-28 13:40:50,293 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 150, loss[loss=0.06535, simple_loss=0.09039, pruned_loss=0.006892, audio_tagging_loss=0.01326, over 16352.00 frames. ], tot_loss[loss=0.07125, simple_loss=0.08948, pruned_loss=0.01211, audio_tagging_loss=0.0144, over 1613953.00 frames. ], batch size: 59, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:40:51,700 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3528013.3333333335, ans=10.0 2023-11-28 13:41:18,391 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3528146.6666666665, ans=0.125 2023-11-28 13:41:19,560 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3528146.6666666665, ans=0.125 2023-11-28 13:41:30,206 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 13:41:37,356 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3528280.0, ans=0.1 2023-11-28 13:41:43,797 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 529250 2023-11-28 13:41:46,530 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.087e+01 8.992e+01 9.963e+01 1.064e+02 1.457e+02, threshold=1.993e+02, percent-clipped=0.0 2023-11-28 13:41:48,764 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 200, loss[loss=0.07152, simple_loss=0.09979, pruned_loss=0.01424, audio_tagging_loss=0.007386, over 15417.00 frames. ], tot_loss[loss=0.07009, simple_loss=0.09027, pruned_loss=0.01219, audio_tagging_loss=0.01276, over 1929449.85 frames. ], batch size: 56, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:41:53,913 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.65 vs. 
limit=6.0 2023-11-28 13:41:59,593 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3528413.3333333335, ans=0.125 2023-11-28 13:41:59,696 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3528413.3333333335, ans=0.125 2023-11-28 13:42:09,344 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3528413.3333333335, ans=0.0 2023-11-28 13:42:21,311 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3528480.0, ans=0.0 2023-11-28 13:42:26,931 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3528546.6666666665, ans=0.0 2023-11-28 13:42:38,341 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3528613.3333333335, ans=0.125 2023-11-28 13:42:39,528 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3528613.3333333335, ans=0.125 2023-11-28 13:42:40,784 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3528613.3333333335, ans=0.125 2023-11-28 13:42:41,619 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 529300 2023-11-28 13:42:46,540 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 250, loss[loss=0.06446, simple_loss=0.09179, pruned_loss=0.01044, audio_tagging_loss=0.008125, over 15475.00 frames. ], tot_loss[loss=0.06938, simple_loss=0.09125, pruned_loss=0.01227, audio_tagging_loss=0.01148, over 2171355.47 frames. ], batch size: 56, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:42:47,923 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3528680.0, ans=0.125 2023-11-28 13:42:51,092 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3528680.0, ans=0.1 2023-11-28 13:42:54,606 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3528680.0, ans=0.125 2023-11-28 13:42:55,912 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.83 vs. limit=15.0 2023-11-28 13:43:02,072 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3528746.6666666665, ans=0.035 2023-11-28 13:43:17,005 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3528813.3333333335, ans=0.125 2023-11-28 13:43:18,488 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.61 vs. limit=15.0 2023-11-28 13:43:30,485 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.71 vs. 
limit=8.0 2023-11-28 13:43:34,208 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3528946.6666666665, ans=0.2 2023-11-28 13:43:39,493 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 529350 2023-11-28 13:43:42,053 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.940e+01 8.918e+01 9.810e+01 1.066e+02 1.328e+02, threshold=1.962e+02, percent-clipped=0.0 2023-11-28 13:43:44,757 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 300, loss[loss=0.05447, simple_loss=0.07154, pruned_loss=0.008808, audio_tagging_loss=0.009889, over 15821.00 frames. ], tot_loss[loss=0.06872, simple_loss=0.09113, pruned_loss=0.01252, audio_tagging_loss=0.01064, over 2365696.04 frames. ], batch size: 60, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:43:52,658 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3529013.3333333335, ans=0.125 2023-11-28 13:43:53,812 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3529013.3333333335, ans=0.125 2023-11-28 13:43:55,944 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 13:43:57,245 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3529080.0, ans=0.0 2023-11-28 13:44:29,134 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3529213.3333333335, ans=0.125 2023-11-28 13:44:31,373 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3529280.0, ans=0.125 2023-11-28 13:44:33,562 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3529280.0, ans=0.1 2023-11-28 13:44:37,720 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 529400 2023-11-28 13:44:42,414 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 350, loss[loss=0.08105, simple_loss=0.1067, pruned_loss=0.02082, audio_tagging_loss=0.006866, over 14744.00 frames. ], tot_loss[loss=0.06852, simple_loss=0.09175, pruned_loss=0.01265, audio_tagging_loss=0.009997, over 2522749.30 frames. ], batch size: 52, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:44:54,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3529413.3333333335, ans=0.125 2023-11-28 13:45:03,807 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3529413.3333333335, ans=0.125 2023-11-28 13:45:04,938 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3529480.0, ans=0.0 2023-11-28 13:45:08,225 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3529480.0, ans=0.125 2023-11-28 13:45:13,381 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3529480.0, ans=0.125 2023-11-28 13:45:25,926 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.75 vs. 
limit=22.5 2023-11-28 13:45:35,906 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 529450 2023-11-28 13:45:38,646 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.965e+01 9.086e+01 9.699e+01 1.038e+02 1.395e+02, threshold=1.940e+02, percent-clipped=0.0 2023-11-28 13:45:40,912 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 400, loss[loss=0.06914, simple_loss=0.09216, pruned_loss=0.01338, audio_tagging_loss=0.009683, over 15113.00 frames. ], tot_loss[loss=0.06736, simple_loss=0.09086, pruned_loss=0.01235, audio_tagging_loss=0.009585, over 2641316.16 frames. ], batch size: 57, lr: 1.51e-03, grad_scale: 32.0 2023-11-28 13:45:45,406 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3529680.0, ans=0.0 2023-11-28 13:46:00,904 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3529746.6666666665, ans=0.1 2023-11-28 13:46:34,071 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 529500 2023-11-28 13:46:36,374 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3529946.6666666665, ans=0.125 2023-11-28 13:46:38,955 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 450, loss[loss=0.05562, simple_loss=0.07083, pruned_loss=0.01071, audio_tagging_loss=0.009495, over 15049.00 frames. ], tot_loss[loss=0.06674, simple_loss=0.09047, pruned_loss=0.01218, audio_tagging_loss=0.009327, over 2730787.65 frames. ], batch size: 56, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:47:09,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3530146.6666666665, ans=0.125 2023-11-28 13:47:32,237 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 529550 2023-11-28 13:47:35,461 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.205e+01 8.821e+01 9.442e+01 9.964e+01 1.327e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-28 13:47:36,657 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 500, loss[loss=0.07499, simple_loss=0.1024, pruned_loss=0.01505, audio_tagging_loss=0.008748, over 14957.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.09005, pruned_loss=0.01195, audio_tagging_loss=0.00924, over 2800534.57 frames. ], batch size: 54, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:47:37,189 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.13 vs. 
limit=15.0 2023-11-28 13:47:57,900 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3530413.3333333335, ans=0.0 2023-11-28 13:48:13,088 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3530546.6666666665, ans=0.125 2023-11-28 13:48:25,277 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 13:48:28,628 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3530613.3333333335, ans=0.0 2023-11-28 13:48:29,505 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 529600 2023-11-28 13:48:34,705 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 550, loss[loss=0.07413, simple_loss=0.1007, pruned_loss=0.01424, audio_tagging_loss=0.009551, over 15423.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.09035, pruned_loss=0.01191, audio_tagging_loss=0.00905, over 2861059.59 frames. ], batch size: 58, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:48:50,482 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3530746.6666666665, ans=0.1 2023-11-28 13:48:58,876 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3530813.3333333335, ans=0.2 2023-11-28 13:49:12,259 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3530880.0, ans=0.0 2023-11-28 13:49:23,437 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3530946.6666666665, ans=0.0 2023-11-28 13:49:28,791 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 529650 2023-11-28 13:49:31,977 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.823e+01 8.881e+01 9.298e+01 9.934e+01 2.506e+02, threshold=1.860e+02, percent-clipped=1.0 2023-11-28 13:49:33,539 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 600, loss[loss=0.05417, simple_loss=0.08098, pruned_loss=0.007349, audio_tagging_loss=0.006332, over 14768.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08924, pruned_loss=0.0118, audio_tagging_loss=0.009018, over 2896217.65 frames. 
], batch size: 57, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:49:39,723 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3531013.3333333335, ans=0.0 2023-11-28 13:49:59,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3531146.6666666665, ans=0.125 2023-11-28 13:50:03,687 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3531146.6666666665, ans=0.1 2023-11-28 13:50:05,803 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3531146.6666666665, ans=0.125 2023-11-28 13:50:18,124 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3531213.3333333335, ans=0.2 2023-11-28 13:50:22,976 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3531280.0, ans=0.1 2023-11-28 13:50:27,225 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 529700 2023-11-28 13:50:31,552 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 650, loss[loss=0.07751, simple_loss=0.1067, pruned_loss=0.0162, audio_tagging_loss=0.007951, over 14664.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08968, pruned_loss=0.0119, audio_tagging_loss=0.008913, over 2925935.35 frames. ], batch size: 55, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:50:33,938 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3531346.6666666665, ans=0.125 2023-11-28 13:50:37,224 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3531346.6666666665, ans=0.07 2023-11-28 13:50:38,416 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3531346.6666666665, ans=0.125 2023-11-28 13:50:57,227 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3531480.0, ans=0.0 2023-11-28 13:51:19,343 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3531613.3333333335, ans=0.1 2023-11-28 13:51:24,494 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 529750 2023-11-28 13:51:25,711 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 13:51:26,783 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3531613.3333333335, ans=0.0 2023-11-28 13:51:26,881 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3531613.3333333335, ans=0.0 2023-11-28 13:51:27,665 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.654e+01 9.012e+01 9.762e+01 1.029e+02 1.844e+02, threshold=1.952e+02, percent-clipped=0.0 2023-11-28 13:51:28,836 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 700, loss[loss=0.0656, simple_loss=0.08983, pruned_loss=0.01166, audio_tagging_loss=0.009031, over 15848.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08954, pruned_loss=0.01197, audio_tagging_loss=0.00889, over 2951553.08 frames. 
], batch size: 61, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:51:31,875 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3531680.0, ans=0.2 2023-11-28 13:51:33,283 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.86 vs. limit=10.0 2023-11-28 13:51:34,185 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3531680.0, ans=0.125 2023-11-28 13:51:39,147 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3531680.0, ans=0.125 2023-11-28 13:52:11,502 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=3531880.0, ans=6.0 2023-11-28 13:52:22,782 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 529800 2023-11-28 13:52:28,101 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 750, loss[loss=0.08381, simple_loss=0.1127, pruned_loss=0.01767, audio_tagging_loss=0.009809, over 15014.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.09043, pruned_loss=0.01227, audio_tagging_loss=0.008809, over 2968000.98 frames. ], batch size: 58, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:52:50,250 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3532146.6666666665, ans=0.1 2023-11-28 13:53:01,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3532213.3333333335, ans=0.0 2023-11-28 13:53:05,423 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3532213.3333333335, ans=0.0 2023-11-28 13:53:22,432 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 529850 2023-11-28 13:53:25,750 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.782e+01 9.191e+01 9.653e+01 1.030e+02 1.250e+02, threshold=1.931e+02, percent-clipped=0.0 2023-11-28 13:53:26,933 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 800, loss[loss=0.05467, simple_loss=0.0722, pruned_loss=0.007413, audio_tagging_loss=0.01116, over 15094.00 frames. ], tot_loss[loss=0.06701, simple_loss=0.0916, pruned_loss=0.01238, audio_tagging_loss=0.008834, over 2983165.55 frames. ], batch size: 55, lr: 1.51e-03, grad_scale: 32.0 2023-11-28 13:53:27,432 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.33 vs. 
limit=12.0 2023-11-28 13:53:30,374 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3532346.6666666665, ans=0.0 2023-11-28 13:53:38,275 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3532413.3333333335, ans=0.2 2023-11-28 13:54:02,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3532546.6666666665, ans=10.0 2023-11-28 13:54:17,934 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3532613.3333333335, ans=0.2 2023-11-28 13:54:18,051 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3532613.3333333335, ans=0.0 2023-11-28 13:54:20,031 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 529900 2023-11-28 13:54:24,379 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 850, loss[loss=0.09806, simple_loss=0.1217, pruned_loss=0.02746, audio_tagging_loss=0.009751, over 14895.00 frames. ], tot_loss[loss=0.067, simple_loss=0.0915, pruned_loss=0.01247, audio_tagging_loss=0.00878, over 2996379.47 frames. ], batch size: 55, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:54:30,747 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3532680.0, ans=0.1 2023-11-28 13:54:32,986 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3532680.0, ans=0.1 2023-11-28 13:54:36,205 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3532746.6666666665, ans=0.125 2023-11-28 13:54:42,411 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3532746.6666666665, ans=0.125 2023-11-28 13:54:45,024 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 13:54:46,200 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3532746.6666666665, ans=0.0 2023-11-28 13:55:17,968 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 529950 2023-11-28 13:55:22,293 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.009e+01 8.821e+01 9.434e+01 1.007e+02 1.194e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-28 13:55:22,320 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 900, loss[loss=0.07679, simple_loss=0.1094, pruned_loss=0.01418, audio_tagging_loss=0.00789, over 14952.00 frames. ], tot_loss[loss=0.06678, simple_loss=0.0912, pruned_loss=0.01241, audio_tagging_loss=0.008768, over 3004800.51 frames. ], batch size: 54, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:55:34,425 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.33 vs. limit=22.5 2023-11-28 13:55:52,235 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.65 vs. 
limit=15.0 2023-11-28 13:55:52,871 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3533146.6666666665, ans=0.0 2023-11-28 13:55:56,964 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3533213.3333333335, ans=0.0 2023-11-28 13:56:16,921 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 530000 2023-11-28 13:56:21,528 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 950, loss[loss=0.07064, simple_loss=0.09355, pruned_loss=0.0152, audio_tagging_loss=0.008671, over 14509.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.09137, pruned_loss=0.01231, audio_tagging_loss=0.008723, over 3016032.70 frames. ], batch size: 52, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:56:34,114 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3533413.3333333335, ans=0.125 2023-11-28 13:56:37,267 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3533413.3333333335, ans=0.1 2023-11-28 13:56:37,304 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3533413.3333333335, ans=0.125 2023-11-28 13:56:40,039 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3533413.3333333335, ans=0.125 2023-11-28 13:56:42,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3533413.3333333335, ans=0.125 2023-11-28 13:56:52,828 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.02 vs. limit=22.5 2023-11-28 13:57:14,561 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 530050 2023-11-28 13:57:18,395 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.41 vs. limit=15.0 2023-11-28 13:57:18,900 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.694e+01 8.954e+01 9.513e+01 1.020e+02 1.278e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-28 13:57:18,939 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 1000, loss[loss=0.06638, simple_loss=0.08456, pruned_loss=0.01452, audio_tagging_loss=0.009581, over 15843.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.09124, pruned_loss=0.01237, audio_tagging_loss=0.008592, over 3024841.35 frames. ], batch size: 58, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:57:29,721 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3533746.6666666665, ans=0.125 2023-11-28 13:57:45,906 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 13:57:48,598 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.43 vs. 
limit=10.0 2023-11-28 13:57:51,451 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3533813.3333333335, ans=0.1 2023-11-28 13:57:51,511 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 13:58:11,628 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 530100 2023-11-28 13:58:15,919 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 1050, loss[loss=0.04763, simple_loss=0.06063, pruned_loss=0.005264, audio_tagging_loss=0.01205, over 14944.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.09102, pruned_loss=0.0123, audio_tagging_loss=0.008556, over 3027003.39 frames. ], batch size: 58, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:58:19,959 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.67 vs. limit=10.0 2023-11-28 13:58:24,410 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3534013.3333333335, ans=0.0 2023-11-28 13:58:37,808 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3534080.0, ans=0.125 2023-11-28 13:58:57,317 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.50 vs. limit=12.0 2023-11-28 13:58:58,344 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.99 vs. limit=15.0 2023-11-28 13:59:00,341 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3534213.3333333335, ans=0.0 2023-11-28 13:59:08,585 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3534280.0, ans=0.125 2023-11-28 13:59:09,411 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 530150 2023-11-28 13:59:09,606 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3534280.0, ans=0.125 2023-11-28 13:59:14,281 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.353e+01 8.910e+01 9.787e+01 1.025e+02 1.500e+02, threshold=1.957e+02, percent-clipped=0.0 2023-11-28 13:59:14,307 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 1100, loss[loss=0.06264, simple_loss=0.09419, pruned_loss=0.007663, audio_tagging_loss=0.007884, over 15595.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08983, pruned_loss=0.0122, audio_tagging_loss=0.008541, over 3026755.74 frames. ], batch size: 59, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:59:15,246 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3534346.6666666665, ans=0.125 2023-11-28 13:59:17,306 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3534346.6666666665, ans=0.0 2023-11-28 13:59:19,362 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 13:59:19,565 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3534346.6666666665, ans=0.0 2023-11-28 13:59:19,691 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3534346.6666666665, ans=0.1 2023-11-28 13:59:24,105 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3534346.6666666665, ans=0.125 2023-11-28 13:59:27,491 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3534413.3333333335, ans=0.125 2023-11-28 13:59:43,801 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.89 vs. limit=6.0 2023-11-28 13:59:45,806 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3534480.0, ans=0.125 2023-11-28 13:59:52,895 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3534546.6666666665, ans=0.1 2023-11-28 13:59:58,070 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.72 vs. limit=15.0 2023-11-28 13:59:59,390 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.92 vs. limit=22.5 2023-11-28 14:00:06,098 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3534613.3333333335, ans=0.125 2023-11-28 14:00:08,095 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 530200 2023-11-28 14:00:12,827 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 1150, loss[loss=0.0668, simple_loss=0.09027, pruned_loss=0.01281, audio_tagging_loss=0.008853, over 15223.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.09053, pruned_loss=0.01231, audio_tagging_loss=0.008512, over 3030763.56 frames. 
], batch size: 55, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 14:00:21,544 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3534680.0, ans=0.0 2023-11-28 14:00:29,091 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3534746.6666666665, ans=0.125 2023-11-28 14:00:32,429 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3534746.6666666665, ans=0.0 2023-11-28 14:00:47,377 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3534880.0, ans=0.1 2023-11-28 14:00:51,739 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3534880.0, ans=0.0 2023-11-28 14:01:06,349 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 530250 2023-11-28 14:01:10,691 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.181e+01 9.000e+01 9.554e+01 1.019e+02 1.286e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-28 14:01:10,730 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 1200, loss[loss=0.08263, simple_loss=0.1035, pruned_loss=0.01982, audio_tagging_loss=0.01105, over 15164.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.09129, pruned_loss=0.01258, audio_tagging_loss=0.008431, over 3025567.09 frames. ], batch size: 57, lr: 1.51e-03, grad_scale: 32.0 2023-11-28 14:01:12,268 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3535013.3333333335, ans=0.125 2023-11-28 14:01:17,505 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3535013.3333333335, ans=0.125 2023-11-28 14:01:21,366 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3535080.0, ans=0.2 2023-11-28 14:01:29,489 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3535080.0, ans=0.035 2023-11-28 14:01:56,581 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.72 vs. limit=15.0 2023-11-28 14:02:04,439 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 530300 2023-11-28 14:02:09,381 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 1250, loss[loss=0.06999, simple_loss=0.08563, pruned_loss=0.01771, audio_tagging_loss=0.00946, over 14772.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.09077, pruned_loss=0.01243, audio_tagging_loss=0.008467, over 3027993.37 frames. ], batch size: 57, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 14:02:22,646 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.90 vs. 
limit=15.0 2023-11-28 14:02:26,797 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3535413.3333333335, ans=0.0 2023-11-28 14:02:31,171 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3535480.0, ans=0.2 2023-11-28 14:02:54,539 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3535613.3333333335, ans=0.125 2023-11-28 14:02:58,974 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3535613.3333333335, ans=0.0 2023-11-28 14:02:59,261 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.67 vs. limit=15.0 2023-11-28 14:03:01,486 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.12 vs. limit=15.0 2023-11-28 14:03:02,157 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 530350 2023-11-28 14:03:03,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3535613.3333333335, ans=0.2 2023-11-28 14:03:05,863 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.30 vs. limit=15.0 2023-11-28 14:03:07,353 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 1300, loss[loss=0.06871, simple_loss=0.0927, pruned_loss=0.01297, audio_tagging_loss=0.009391, over 14359.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.09038, pruned_loss=0.0123, audio_tagging_loss=0.008493, over 3022946.23 frames. ], batch size: 54, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 14:03:08,412 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.341e+01 8.419e+01 9.205e+01 1.020e+02 1.250e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-28 14:03:35,342 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3535813.3333333335, ans=0.025 2023-11-28 14:03:51,997 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3535880.0, ans=0.0 2023-11-28 14:04:01,219 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 530400 2023-11-28 14:04:01,361 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3535946.6666666665, ans=0.125 2023-11-28 14:04:05,971 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 1350, loss[loss=0.06832, simple_loss=0.08343, pruned_loss=0.01389, audio_tagging_loss=0.01272, over 15491.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.09002, pruned_loss=0.01237, audio_tagging_loss=0.008509, over 3029409.37 frames. ], batch size: 60, lr: 1.51e-03, grad_scale: 8.0 2023-11-28 14:04:20,313 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.07 vs. 
limit=15.0 2023-11-28 14:04:27,095 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3536080.0, ans=0.0 2023-11-28 14:04:29,234 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3536146.6666666665, ans=0.125 2023-11-28 14:04:50,256 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 14:04:59,527 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 530450 2023-11-28 14:05:03,922 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 1400, loss[loss=0.07485, simple_loss=0.0967, pruned_loss=0.01842, audio_tagging_loss=0.008082, over 14343.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08991, pruned_loss=0.01233, audio_tagging_loss=0.00863, over 3041988.46 frames. ], batch size: 53, lr: 1.51e-03, grad_scale: 8.0 2023-11-28 14:05:06,573 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.021e+01 8.943e+01 9.366e+01 9.966e+01 1.345e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-28 14:05:26,617 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.05 vs. limit=15.0 2023-11-28 14:05:42,965 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.39 vs. limit=15.0 2023-11-28 14:05:47,474 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3536546.6666666665, ans=0.0 2023-11-28 14:05:51,144 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3536613.3333333335, ans=0.125 2023-11-28 14:05:54,476 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3536613.3333333335, ans=0.04949747468305833 2023-11-28 14:05:56,656 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3536613.3333333335, ans=0.125 2023-11-28 14:05:57,466 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 530500 2023-11-28 14:06:01,796 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 1450, loss[loss=0.06646, simple_loss=0.09465, pruned_loss=0.0126, audio_tagging_loss=0.006526, over 15301.00 frames. ], tot_loss[loss=0.06681, simple_loss=0.09129, pruned_loss=0.01246, audio_tagging_loss=0.008704, over 3049854.93 frames. 
], batch size: 56, lr: 1.51e-03, grad_scale: 8.0 2023-11-28 14:06:02,090 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3536680.0, ans=0.0 2023-11-28 14:06:07,037 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3536680.0, ans=0.0 2023-11-28 14:06:10,377 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3536680.0, ans=0.1 2023-11-28 14:06:11,421 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3536680.0, ans=0.125 2023-11-28 14:06:23,027 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3536746.6666666665, ans=0.0 2023-11-28 14:06:24,138 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3536813.3333333335, ans=0.125 2023-11-28 14:06:28,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3536813.3333333335, ans=0.125 2023-11-28 14:06:55,251 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 530550 2023-11-28 14:07:00,269 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 1500, loss[loss=0.04515, simple_loss=0.05671, pruned_loss=0.008015, audio_tagging_loss=0.008774, over 14751.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.09062, pruned_loss=0.0123, audio_tagging_loss=0.008796, over 3051173.89 frames. ], batch size: 58, lr: 1.51e-03, grad_scale: 8.0 2023-11-28 14:07:02,233 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.37 vs. limit=10.0 2023-11-28 14:07:02,463 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.532e+01 9.037e+01 9.664e+01 1.030e+02 1.385e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-28 14:07:21,428 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3537080.0, ans=0.2 2023-11-28 14:07:27,142 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3537146.6666666665, ans=0.5 2023-11-28 14:07:39,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3537213.3333333335, ans=0.0 2023-11-28 14:07:43,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3537213.3333333335, ans=0.125 2023-11-28 14:07:43,826 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3537213.3333333335, ans=0.09899494936611666 2023-11-28 14:07:53,061 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=3537280.0, ans=10.0 2023-11-28 14:07:53,583 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 530600 2023-11-28 14:07:55,220 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3537280.0, ans=0.07 2023-11-28 14:07:58,711 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 1550, loss[loss=0.05866, simple_loss=0.08031, pruned_loss=0.01057, audio_tagging_loss=0.007937, over 14264.00 frames. 
], tot_loss[loss=0.06653, simple_loss=0.09098, pruned_loss=0.01224, audio_tagging_loss=0.008804, over 3059520.85 frames. ], batch size: 55, lr: 1.51e-03, grad_scale: 8.0 2023-11-28 14:08:00,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3537346.6666666665, ans=10.0 2023-11-28 14:08:37,507 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3537546.6666666665, ans=0.0 2023-11-28 14:08:46,265 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3537613.3333333335, ans=0.125 2023-11-28 14:08:51,558 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 530650 2023-11-28 14:08:52,793 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3537613.3333333335, ans=0.0 2023-11-28 14:08:55,985 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 1600, loss[loss=0.08646, simple_loss=0.1153, pruned_loss=0.01872, audio_tagging_loss=0.01008, over 14817.00 frames. ], tot_loss[loss=0.06698, simple_loss=0.0916, pruned_loss=0.01238, audio_tagging_loss=0.008798, over 3065350.14 frames. ], batch size: 55, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 14:08:58,167 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.016e+01 9.104e+01 9.583e+01 1.052e+02 1.503e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-28 14:08:59,647 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3537680.0, ans=0.125 2023-11-28 14:09:02,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3537680.0, ans=0.0 2023-11-28 14:09:09,115 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3537746.6666666665, ans=0.125 2023-11-28 14:09:24,056 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3537813.3333333335, ans=0.125 2023-11-28 14:09:48,757 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 530700 2023-11-28 14:09:53,813 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 1650, loss[loss=0.07176, simple_loss=0.1077, pruned_loss=0.01122, audio_tagging_loss=0.006681, over 15053.00 frames. ], tot_loss[loss=0.0666, simple_loss=0.09092, pruned_loss=0.01223, audio_tagging_loss=0.008901, over 3068697.41 frames. ], batch size: 54, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 14:10:16,931 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3538146.6666666665, ans=0.125 2023-11-28 14:10:31,888 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.37 vs. limit=15.0 2023-11-28 14:10:33,782 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3538213.3333333335, ans=0.0 2023-11-28 14:10:47,189 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 530750 2023-11-28 14:10:51,983 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 1700, loss[loss=0.06383, simple_loss=0.08738, pruned_loss=0.01193, audio_tagging_loss=0.00821, over 16128.00 frames. 
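
Each optim.py:476 line prints Clipping_scale and five grad-norm quartiles, which read as min/25%/50%/75%/max of recent gradient norms; throughout this section the reported threshold equals Clipping_scale times the median (for the batch 1600 entry above, 2.0 * 9.583e+01 = 1.917e+02), and percent-clipped is the share of norms above it. A sketch under that reading; the quartile interpretation is an assumption from the printed values:

    import torch

    def clipping_threshold(grad_norms: torch.Tensor,
                           clipping_scale: float = 2.0) -> float:
        # threshold = clipping_scale * median of recent gradient norms,
        # which reproduces every threshold printed in this section.
        return clipping_scale * grad_norms.quantile(0.5).item()

    def percent_clipped(grad_norms: torch.Tensor, threshold: float) -> float:
        # Fraction (in percent) of gradient norms above the threshold.
        return 100.0 * (grad_norms > threshold).float().mean().item()
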
], tot_loss[loss=0.06663, simple_loss=0.09065, pruned_loss=0.01239, audio_tagging_loss=0.008922, over 3065408.98 frames. ], batch size: 62, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 14:10:54,244 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.465e+01 8.956e+01 9.565e+01 1.008e+02 1.733e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-28 14:11:11,508 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3538413.3333333335, ans=0.125 2023-11-28 14:11:29,322 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 14:11:30,504 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3538546.6666666665, ans=0.2 2023-11-28 14:11:45,481 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 530800 2023-11-28 14:11:50,091 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 1750, loss[loss=0.05945, simple_loss=0.09044, pruned_loss=0.008537, audio_tagging_loss=0.005691, over 14322.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.09064, pruned_loss=0.0123, audio_tagging_loss=0.008893, over 3068323.07 frames. ], batch size: 53, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 14:11:52,945 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.07 vs. limit=15.0 2023-11-28 14:12:00,650 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.55 vs. limit=22.5 2023-11-28 14:12:18,155 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3538813.3333333335, ans=0.125 2023-11-28 14:12:19,113 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3538813.3333333335, ans=0.035 2023-11-28 14:12:23,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3538880.0, ans=0.1 2023-11-28 14:12:25,202 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3538880.0, ans=0.125 2023-11-28 14:12:31,163 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3538880.0, ans=0.125 2023-11-28 14:12:36,634 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3538946.6666666665, ans=0.0 2023-11-28 14:12:43,156 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 530850 2023-11-28 14:12:47,405 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 1800, loss[loss=0.06079, simple_loss=0.08726, pruned_loss=0.01095, audio_tagging_loss=0.00621, over 15136.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.09072, pruned_loss=0.01235, audio_tagging_loss=0.00874, over 3055052.49 frames. ], batch size: 55, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 14:12:50,241 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.232e+01 8.905e+01 9.378e+01 9.880e+01 1.265e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-28 14:13:07,340 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.03 vs. 
limit=10.0 2023-11-28 14:13:07,352 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.00 vs. limit=6.0 2023-11-28 14:13:09,896 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.44 vs. limit=15.0 2023-11-28 14:13:17,657 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3539146.6666666665, ans=0.125 2023-11-28 14:13:20,897 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3539146.6666666665, ans=0.0 2023-11-28 14:13:40,598 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3539280.0, ans=0.1 2023-11-28 14:13:41,505 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 530900 2023-11-28 14:13:46,525 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 1850, loss[loss=0.07797, simple_loss=0.1017, pruned_loss=0.01711, audio_tagging_loss=0.009987, over 16103.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.09037, pruned_loss=0.01222, audio_tagging_loss=0.008731, over 3056030.91 frames. ], batch size: 59, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 14:14:15,895 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.33 vs. limit=22.5 2023-11-28 14:14:22,602 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3539546.6666666665, ans=0.2 2023-11-28 14:14:22,661 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3539546.6666666665, ans=0.125 2023-11-28 14:14:40,760 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 530950 2023-11-28 14:14:42,026 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3539613.3333333335, ans=0.0 2023-11-28 14:14:45,118 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 1900, loss[loss=0.04979, simple_loss=0.06081, pruned_loss=0.009128, audio_tagging_loss=0.01026, over 14650.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.09063, pruned_loss=0.01224, audio_tagging_loss=0.008691, over 3053484.13 frames. ], batch size: 58, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 14:14:47,347 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.627e+01 8.623e+01 9.343e+01 1.003e+02 1.342e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-28 14:14:49,750 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3539680.0, ans=0.125 2023-11-28 14:15:12,719 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.29 vs. 
limit=15.0 2023-11-28 14:15:14,243 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3539813.3333333335, ans=0.125 2023-11-28 14:15:16,306 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3539813.3333333335, ans=0.0 2023-11-28 14:15:38,324 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 531000 2023-11-28 14:15:43,047 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 1950, loss[loss=0.07367, simple_loss=0.1006, pruned_loss=0.01664, audio_tagging_loss=0.006708, over 14743.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.09024, pruned_loss=0.01232, audio_tagging_loss=0.008552, over 3048914.12 frames. ], batch size: 54, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 14:15:49,468 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3540013.3333333335, ans=0.04949747468305833 2023-11-28 14:15:52,801 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3540013.3333333335, ans=0.125 2023-11-28 14:16:35,989 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 531050 2023-11-28 14:16:40,329 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 2000, loss[loss=0.07711, simple_loss=0.1082, pruned_loss=0.01481, audio_tagging_loss=0.008178, over 15328.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08894, pruned_loss=0.01223, audio_tagging_loss=0.008595, over 3047867.19 frames. ], batch size: 55, lr: 1.51e-03, grad_scale: 32.0 2023-11-28 14:16:42,527 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.094e+01 8.874e+01 9.480e+01 1.027e+02 1.449e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-28 14:17:07,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3540480.0, ans=0.125 2023-11-28 14:17:17,901 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3540546.6666666665, ans=0.0 2023-11-28 14:17:34,667 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 531100 2023-11-28 14:17:37,263 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.12 vs. limit=15.0 2023-11-28 14:17:39,033 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 2050, loss[loss=0.08695, simple_loss=0.1168, pruned_loss=0.02021, audio_tagging_loss=0.008353, over 16973.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08873, pruned_loss=0.01214, audio_tagging_loss=0.008637, over 3051538.50 frames. 
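
The grad_scale field in these entries moves in powers of two (16.0 at batch 1400, 32.0 at batch 2000 above, back to 16.0 at batch 2050 below, down to 8.0 by batch 2600), the signature of dynamic loss scaling under fp16: the scale doubles after a run of overflow-free steps and halves when a step produces inf/nan gradients. A generic sketch using torch's GradScaler for orientation only; icefall's scaler is its own implementation and may differ in detail, and model, loader, and optimizer here are stand-ins:

    from torch.cuda.amp import GradScaler, autocast

    def train_loop(model, loader, optimizer):
        # Doubles the scale after a run of finite steps (growth_factor),
        # halves it on overflow (backoff_factor) -- the 8/16/32 pattern above.
        scaler = GradScaler(init_scale=16.0, growth_factor=2.0,
                            backoff_factor=0.5)
        for batch in loader:
            optimizer.zero_grad()
            with autocast():
                loss = model(batch)
            scaler.scale(loss).backward()
            scaler.step(optimizer)  # skipped internally if grads overflowed
            scaler.update()         # the scale is adjusted here
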
], batch size: 61, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 14:17:41,578 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3540680.0, ans=0.1 2023-11-28 14:18:00,677 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3540813.3333333335, ans=0.125 2023-11-28 14:18:01,916 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3540813.3333333335, ans=0.125 2023-11-28 14:18:20,899 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3540880.0, ans=0.125 2023-11-28 14:18:31,773 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 531150 2023-11-28 14:18:33,418 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.81 vs. limit=22.5 2023-11-28 14:18:36,101 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 2100, loss[loss=0.06929, simple_loss=0.1012, pruned_loss=0.01257, audio_tagging_loss=0.006113, over 15132.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08959, pruned_loss=0.01226, audio_tagging_loss=0.008572, over 3054193.50 frames. ], batch size: 54, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 14:18:39,403 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.425e+01 8.857e+01 9.324e+01 1.026e+02 1.303e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-28 14:19:18,372 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3541213.3333333335, ans=0.0 2023-11-28 14:19:23,691 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3541280.0, ans=0.0 2023-11-28 14:19:27,637 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3541280.0, ans=0.125 2023-11-28 14:19:29,739 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 531200 2023-11-28 14:19:34,447 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 2150, loss[loss=0.06836, simple_loss=0.08792, pruned_loss=0.01481, audio_tagging_loss=0.009592, over 16166.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.08943, pruned_loss=0.01221, audio_tagging_loss=0.008628, over 3057663.23 frames. ], batch size: 60, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 14:19:41,970 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.71 vs. 
limit=15.0 2023-11-28 14:19:49,556 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3541413.3333333335, ans=0.125 2023-11-28 14:19:56,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3541413.3333333335, ans=0.0 2023-11-28 14:19:57,386 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3541480.0, ans=0.2 2023-11-28 14:19:58,357 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3541480.0, ans=0.0 2023-11-28 14:20:02,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3541480.0, ans=0.125 2023-11-28 14:20:05,512 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.29 vs. limit=15.0 2023-11-28 14:20:12,122 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 14:20:14,503 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3541546.6666666665, ans=0.0 2023-11-28 14:20:14,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3541546.6666666665, ans=0.05 2023-11-28 14:20:24,370 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3541613.3333333335, ans=0.1 2023-11-28 14:20:27,427 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.31 vs. limit=15.0 2023-11-28 14:20:28,021 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 531250 2023-11-28 14:20:29,376 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3541613.3333333335, ans=0.05 2023-11-28 14:20:32,780 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 2200, loss[loss=0.06672, simple_loss=0.09547, pruned_loss=0.01005, audio_tagging_loss=0.008941, over 15285.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08945, pruned_loss=0.01198, audio_tagging_loss=0.00866, over 3050714.30 frames. 
], batch size: 56, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 14:20:35,923 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3541680.0, ans=0.1 2023-11-28 14:20:36,789 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.579e+01 8.900e+01 9.511e+01 1.009e+02 1.221e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-28 14:21:20,343 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3541946.6666666665, ans=0.0 2023-11-28 14:21:26,738 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 531300 2023-11-28 14:21:29,072 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3541946.6666666665, ans=0.0 2023-11-28 14:21:30,209 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3542013.3333333335, ans=0.0 2023-11-28 14:21:31,077 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 2250, loss[loss=0.05388, simple_loss=0.07237, pruned_loss=0.007339, audio_tagging_loss=0.01036, over 15942.00 frames. ], tot_loss[loss=0.06503, simple_loss=0.0888, pruned_loss=0.01194, audio_tagging_loss=0.008696, over 3048869.60 frames. ], batch size: 59, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 14:21:38,156 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.21 vs. limit=22.5 2023-11-28 14:21:50,002 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.44 vs. limit=22.5 2023-11-28 14:21:50,759 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3542080.0, ans=0.125 2023-11-28 14:22:20,615 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3542280.0, ans=0.125 2023-11-28 14:22:24,241 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 531350 2023-11-28 14:22:27,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3542346.6666666665, ans=0.125 2023-11-28 14:22:28,605 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 2300, loss[loss=0.08011, simple_loss=0.106, pruned_loss=0.01817, audio_tagging_loss=0.008943, over 15241.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08899, pruned_loss=0.01202, audio_tagging_loss=0.008661, over 3047581.22 frames. 
], batch size: 56, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 14:22:31,831 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.858e+01 9.166e+01 9.728e+01 1.033e+02 1.405e+02, threshold=1.946e+02, percent-clipped=0.0 2023-11-28 14:22:50,226 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3542413.3333333335, ans=0.0 2023-11-28 14:23:01,016 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3542480.0, ans=0.1 2023-11-28 14:23:07,332 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-28 14:23:20,546 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3542613.3333333335, ans=0.125 2023-11-28 14:23:21,497 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 14:23:21,562 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 531400 2023-11-28 14:23:26,667 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 2350, loss[loss=0.07117, simple_loss=0.1017, pruned_loss=0.01124, audio_tagging_loss=0.009073, over 14262.00 frames. ], tot_loss[loss=0.06478, simple_loss=0.08804, pruned_loss=0.01195, audio_tagging_loss=0.008815, over 3047293.37 frames. ], batch size: 56, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 14:23:32,279 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.35 vs. limit=15.0 2023-11-28 14:23:47,085 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3542746.6666666665, ans=0.07 2023-11-28 14:24:10,338 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3542880.0, ans=0.125 2023-11-28 14:24:11,646 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.15 vs. limit=10.0 2023-11-28 14:24:14,124 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3542946.6666666665, ans=0.125 2023-11-28 14:24:20,487 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 531450 2023-11-28 14:24:20,747 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3542946.6666666665, ans=0.125 2023-11-28 14:24:24,563 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3543013.3333333335, ans=0.07 2023-11-28 14:24:25,490 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 2400, loss[loss=0.08449, simple_loss=0.1249, pruned_loss=0.01627, audio_tagging_loss=0.005763, over 15660.00 frames. ], tot_loss[loss=0.06493, simple_loss=0.08817, pruned_loss=0.01202, audio_tagging_loss=0.008825, over 3043720.84 frames. 
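
The "Exclude cut" warnings in this section all share the same arithmetic: a 1-second AudioSet clip yields 100 fbank frames, 23 frames after subsampling, while its dummy transcript tokenizes to 24 BPE pieces, leaving the transducer more symbols than frames to align them to. A sketch of the implied filter; the length formula mirrors the usual icefall convolutional front-end and reproduces 100 -> 23, but both it and the drop condition are reconstructions from the logged numbers rather than quotes of train_asr.py:

    def subsampled_num_frames(num_frames: int) -> int:
        # Two stride-2 stages, icefall-style: ((T - 7) // 2 + 1) // 2.
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # Drop cuts whose token count exceeds the subsampled frame count.
        return subsampled_num_frames(num_frames) >= num_tokens

    assert subsampled_num_frames(100) == 23
    assert not keep_cut(100, 24)  # the excluded 1-second cuts above
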
], batch size: 55, lr: 1.51e-03, grad_scale: 32.0 2023-11-28 14:24:28,802 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.833e+01 8.802e+01 9.417e+01 9.979e+01 1.299e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-28 14:24:31,308 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3543013.3333333335, ans=0.125 2023-11-28 14:24:44,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3543080.0, ans=0.1 2023-11-28 14:25:15,703 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.51 vs. limit=15.0 2023-11-28 14:25:18,430 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 531500 2023-11-28 14:25:22,470 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3543346.6666666665, ans=0.0 2023-11-28 14:25:23,329 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 2450, loss[loss=0.04693, simple_loss=0.06813, pruned_loss=0.003991, audio_tagging_loss=0.008879, over 15016.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08932, pruned_loss=0.01216, audio_tagging_loss=0.008845, over 3050853.73 frames. ], batch size: 57, lr: 1.51e-03, grad_scale: 32.0 2023-11-28 14:25:58,987 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.40 vs. limit=5.0 2023-11-28 14:26:06,936 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.08 vs. limit=15.0 2023-11-28 14:26:16,407 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 531550 2023-11-28 14:26:21,240 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 2500, loss[loss=0.06197, simple_loss=0.08479, pruned_loss=0.009557, audio_tagging_loss=0.01002, over 15543.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08879, pruned_loss=0.0122, audio_tagging_loss=0.008896, over 3052808.22 frames. 
], batch size: 59, lr: 1.51e-03, grad_scale: 32.0 2023-11-28 14:26:21,561 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3543680.0, ans=0.125 2023-11-28 14:26:25,021 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.666e+01 9.030e+01 9.693e+01 1.035e+02 1.388e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-28 14:26:25,349 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3543680.0, ans=0.125 2023-11-28 14:26:39,087 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3543746.6666666665, ans=0.125 2023-11-28 14:27:06,566 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3543946.6666666665, ans=0.125 2023-11-28 14:27:14,782 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 531600 2023-11-28 14:27:17,445 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3543946.6666666665, ans=0.0 2023-11-28 14:27:19,439 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 2550, loss[loss=0.06752, simple_loss=0.09908, pruned_loss=0.01045, audio_tagging_loss=0.007526, over 15147.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.08887, pruned_loss=0.01213, audio_tagging_loss=0.008836, over 3053671.94 frames. ], batch size: 58, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 14:27:31,233 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3544080.0, ans=0.125 2023-11-28 14:27:45,390 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3544146.6666666665, ans=0.05 2023-11-28 14:28:06,570 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3544280.0, ans=0.2 2023-11-28 14:28:11,563 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3544280.0, ans=0.2 2023-11-28 14:28:13,592 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 531650 2023-11-28 14:28:17,584 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3544346.6666666665, ans=0.0 2023-11-28 14:28:18,562 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 2600, loss[loss=0.071, simple_loss=0.09973, pruned_loss=0.01152, audio_tagging_loss=0.009621, over 16069.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08956, pruned_loss=0.01222, audio_tagging_loss=0.008671, over 3051365.34 frames. ], batch size: 61, lr: 1.50e-03, grad_scale: 8.0 2023-11-28 14:28:18,951 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3544346.6666666665, ans=0.04949747468305833 2023-11-28 14:28:24,066 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.316e+01 8.904e+01 9.542e+01 1.021e+02 1.415e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-28 14:28:27,907 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.58 vs. 
limit=15.0 2023-11-28 14:28:39,622 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3544413.3333333335, ans=0.0 2023-11-28 14:28:44,130 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3544480.0, ans=0.0 2023-11-28 14:29:02,976 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3544546.6666666665, ans=0.1 2023-11-28 14:29:10,587 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3544613.3333333335, ans=0.0 2023-11-28 14:29:11,504 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 531700 2023-11-28 14:29:16,004 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 2650, loss[loss=0.06326, simple_loss=0.08441, pruned_loss=0.014, audio_tagging_loss=0.007049, over 14861.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.0896, pruned_loss=0.01225, audio_tagging_loss=0.008626, over 3049212.81 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 8.0 2023-11-28 14:29:33,691 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3544746.6666666665, ans=0.125 2023-11-28 14:29:44,900 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.93 vs. limit=15.0 2023-11-28 14:29:59,357 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 14:30:10,249 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 531750 2023-11-28 14:30:11,933 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.76 vs. limit=22.5 2023-11-28 14:30:12,709 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3544946.6666666665, ans=0.0 2023-11-28 14:30:14,513 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 2700, loss[loss=0.06547, simple_loss=0.09235, pruned_loss=0.01115, audio_tagging_loss=0.008143, over 15425.00 frames. ], tot_loss[loss=0.06448, simple_loss=0.08789, pruned_loss=0.01191, audio_tagging_loss=0.008621, over 3046994.28 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 8.0 2023-11-28 14:30:19,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3545013.3333333335, ans=0.0 2023-11-28 14:30:19,912 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.102e+01 8.994e+01 9.441e+01 1.024e+02 1.210e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-28 14:30:52,309 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3545213.3333333335, ans=0.125 2023-11-28 14:30:52,779 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.04 vs. 
limit=15.0 2023-11-28 14:31:04,432 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3545280.0, ans=0.1 2023-11-28 14:31:07,518 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 531800 2023-11-28 14:31:12,501 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.88 vs. limit=15.0 2023-11-28 14:31:12,778 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 2750, loss[loss=0.0606, simple_loss=0.08522, pruned_loss=0.01168, audio_tagging_loss=0.006312, over 15253.00 frames. ], tot_loss[loss=0.06424, simple_loss=0.08745, pruned_loss=0.0119, audio_tagging_loss=0.00861, over 3041557.45 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 8.0 2023-11-28 14:31:35,201 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3545480.0, ans=0.05 2023-11-28 14:31:36,676 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3545480.0, ans=0.2 2023-11-28 14:31:50,904 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3545546.6666666665, ans=0.125 2023-11-28 14:31:52,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3545546.6666666665, ans=0.2 2023-11-28 14:32:01,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3545613.3333333335, ans=0.125 2023-11-28 14:32:01,117 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3545613.3333333335, ans=0.125 2023-11-28 14:32:05,264 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 14:32:06,454 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 531850 2023-11-28 14:32:10,869 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 2800, loss[loss=0.05697, simple_loss=0.08242, pruned_loss=0.009881, audio_tagging_loss=0.005875, over 15754.00 frames. ], tot_loss[loss=0.06418, simple_loss=0.08746, pruned_loss=0.01179, audio_tagging_loss=0.008664, over 3044264.69 frames. 
], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:32:11,215 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3545680.0, ans=0.125 2023-11-28 14:32:16,757 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.200e+01 8.903e+01 9.379e+01 1.017e+02 3.083e+02, threshold=1.876e+02, percent-clipped=1.0 2023-11-28 14:32:23,140 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3545746.6666666665, ans=0.125 2023-11-28 14:32:52,654 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3545880.0, ans=10.0 2023-11-28 14:33:02,032 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3545946.6666666665, ans=0.125 2023-11-28 14:33:04,490 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 531900 2023-11-28 14:33:08,788 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 2850, loss[loss=0.05146, simple_loss=0.07235, pruned_loss=0.006575, audio_tagging_loss=0.008709, over 14072.00 frames. ], tot_loss[loss=0.06465, simple_loss=0.0882, pruned_loss=0.01195, audio_tagging_loss=0.008599, over 3044692.37 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:33:12,451 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3546013.3333333335, ans=0.2 2023-11-28 14:33:13,398 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3546013.3333333335, ans=0.0 2023-11-28 14:33:18,220 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.97 vs. limit=22.5 2023-11-28 14:33:56,446 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3546280.0, ans=0.0 2023-11-28 14:34:00,842 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3546280.0, ans=0.07 2023-11-28 14:34:01,845 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 531950 2023-11-28 14:34:06,186 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 2900, loss[loss=0.07453, simple_loss=0.09962, pruned_loss=0.0159, audio_tagging_loss=0.008816, over 16000.00 frames. ], tot_loss[loss=0.06421, simple_loss=0.08749, pruned_loss=0.01187, audio_tagging_loss=0.008591, over 3040739.39 frames. 
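
Each scaling.py:213 line reports one scheduled hyperparameter: its module path, the global batch_count, and its current value ans (skip rates sitting at 0.0, balancer probs at 0.125, dropout at 0.1, and so on). These behave like piecewise-linear functions of batch_count that have long since reached their final values. A minimal sketch of such a schedule, with purely illustrative breakpoints; the real schedules live in the model code:

    import bisect

    def scheduled_float(batch_count: float,
                        points: list[tuple[float, float]]) -> float:
        # Piecewise-linear in batch_count, clamped at both ends; `points`
        # is a sorted list of (batch_count, value) breakpoints.
        xs = [x for x, _ in points]
        i = bisect.bisect_right(xs, batch_count)
        if i == 0:
            return points[0][1]
        if i == len(points):
            return points[-1][1]
        (x0, y0), (x1, y1) = points[i - 1], points[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # Hypothetical skip-rate schedule that decayed to 0.0 long ago:
    assert scheduled_float(3_544_346.0, [(0.0, 0.1), (20_000.0, 0.0)]) == 0.0
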
], batch size: 59, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:34:07,522 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3546346.6666666665, ans=0.1 2023-11-28 14:34:12,363 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.591e+01 8.752e+01 9.369e+01 1.016e+02 1.365e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-28 14:35:00,606 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 532000 2023-11-28 14:35:00,849 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3546613.3333333335, ans=0.0 2023-11-28 14:35:02,007 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-532000.pt 2023-11-28 14:35:07,386 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 2950, loss[loss=0.06983, simple_loss=0.09355, pruned_loss=0.01597, audio_tagging_loss=0.007088, over 15176.00 frames. ], tot_loss[loss=0.06479, simple_loss=0.08853, pruned_loss=0.01197, audio_tagging_loss=0.008562, over 3046239.14 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:35:08,867 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3546680.0, ans=0.125 2023-11-28 14:35:10,257 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.56 vs. limit=6.0 2023-11-28 14:35:26,225 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3546746.6666666665, ans=0.0 2023-11-28 14:36:01,157 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 532050 2023-11-28 14:36:06,000 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 3000, loss[loss=0.0572, simple_loss=0.06773, pruned_loss=0.01159, audio_tagging_loss=0.01175, over 15969.00 frames. ], tot_loss[loss=0.06531, simple_loss=0.08941, pruned_loss=0.01201, audio_tagging_loss=0.008588, over 3045766.76 frames. ], batch size: 63, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:36:06,003 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-28 14:36:41,303 INFO [train_asr.py:1267] (0/4) Epoch 45, validation: loss=0.05774, simple_loss=0.05054, pruned_loss=0.005299, audio_tagging_loss=0.02717, over 4681554.00 frames. 2023-11-28 14:36:41,304 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-28 14:36:46,875 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.215e+01 8.901e+01 9.475e+01 1.021e+02 1.271e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-28 14:36:52,130 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3547080.0, ans=0.0 2023-11-28 14:36:57,615 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3547080.0, ans=0.125 2023-11-28 14:37:27,013 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.34 vs. 
limit=15.0 2023-11-28 14:37:35,064 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 532100 2023-11-28 14:37:39,447 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 3050, loss[loss=0.06152, simple_loss=0.09008, pruned_loss=0.009555, audio_tagging_loss=0.006925, over 13339.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08939, pruned_loss=0.01197, audio_tagging_loss=0.008724, over 3042463.37 frames. ], batch size: 51, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:37:47,947 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3547346.6666666665, ans=0.125 2023-11-28 14:37:57,604 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.90 vs. limit=22.5 2023-11-28 14:38:15,456 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 14:38:33,663 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 532150 2023-11-28 14:38:38,088 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 3100, loss[loss=0.05896, simple_loss=0.08451, pruned_loss=0.009869, audio_tagging_loss=0.006836, over 15093.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.09002, pruned_loss=0.01204, audio_tagging_loss=0.008644, over 3046978.92 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:38:43,569 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.323e+01 8.815e+01 9.478e+01 1.004e+02 1.302e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-28 14:38:46,045 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3547680.0, ans=0.125 2023-11-28 14:38:58,661 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3547746.6666666665, ans=0.125 2023-11-28 14:38:58,852 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3547746.6666666665, ans=0.0 2023-11-28 14:39:04,686 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.58 vs. limit=15.0 2023-11-28 14:39:23,461 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3547946.6666666665, ans=0.125 2023-11-28 14:39:31,226 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 532200 2023-11-28 14:39:35,929 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 3150, loss[loss=0.08137, simple_loss=0.1093, pruned_loss=0.01768, audio_tagging_loss=0.009041, over 14537.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.09033, pruned_loss=0.01208, audio_tagging_loss=0.008674, over 3050816.52 frames. ], batch size: 54, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:39:43,325 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.65 vs. 
limit=22.5 2023-11-28 14:39:55,647 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3548080.0, ans=0.0 2023-11-28 14:40:38,820 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3548146.6666666665, ans=0.125 2023-11-28 14:41:11,771 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3548280.0, ans=0.1 2023-11-28 14:41:16,647 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 532250 2023-11-28 14:41:23,872 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 3200, loss[loss=0.08573, simple_loss=0.1098, pruned_loss=0.02212, audio_tagging_loss=0.008722, over 13755.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.09117, pruned_loss=0.0122, audio_tagging_loss=0.008717, over 3051965.79 frames. ], batch size: 52, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 14:41:33,615 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.062e+01 8.970e+01 9.466e+01 1.009e+02 1.247e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-28 14:42:41,863 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3548613.3333333335, ans=0.0 2023-11-28 14:42:43,780 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3548613.3333333335, ans=0.1 2023-11-28 14:42:53,759 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 532300 2023-11-28 14:42:54,376 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3548613.3333333335, ans=0.125 2023-11-28 14:42:55,668 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 14:43:00,784 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 3250, loss[loss=0.04487, simple_loss=0.05752, pruned_loss=0.00541, audio_tagging_loss=0.0107, over 14402.00 frames. ], tot_loss[loss=0.0666, simple_loss=0.09098, pruned_loss=0.01231, audio_tagging_loss=0.008799, over 3051690.16 frames. 
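
Two periodic events surface a few entries above: the checkpoint save to checkpoint-532000.pt lands exactly on a multiple of 4000 global batches (532000 % 4000 == 0), and the validation pass, with its separate "validation: loss=..." line, fires at in-epoch batch 3000. Both look like simple modulo triggers; the intervals are inferred from this stretch of log alone, so treat them as assumptions:

    def should_save_checkpoint(batch_idx_train: int, every_n: int = 4000) -> bool:
        # checkpoint-532000.pt above: 532000 % 4000 == 0.
        return batch_idx_train % every_n == 0

    def should_run_validation(batch_idx: int, interval: int = 3000) -> bool:
        # The validation loss above was computed at epoch-45 batch 3000.
        return batch_idx > 0 and batch_idx % interval == 0

    assert should_save_checkpoint(532000)
    assert should_run_validation(3000)
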
], batch size: 57, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 14:43:01,286 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3548680.0, ans=0.125 2023-11-28 14:43:06,838 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3548680.0, ans=0.0 2023-11-28 14:43:10,448 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3548680.0, ans=0.125 2023-11-28 14:43:14,453 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3548680.0, ans=0.125 2023-11-28 14:43:26,747 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3548746.6666666665, ans=0.1 2023-11-28 14:43:33,377 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3548746.6666666665, ans=0.125 2023-11-28 14:43:55,501 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3548880.0, ans=0.07 2023-11-28 14:44:05,320 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3548880.0, ans=0.1 2023-11-28 14:44:11,446 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3548880.0, ans=0.0 2023-11-28 14:44:27,732 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 532350 2023-11-28 14:44:35,334 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 3300, loss[loss=0.06778, simple_loss=0.1007, pruned_loss=0.011, audio_tagging_loss=0.006436, over 15818.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.09043, pruned_loss=0.01228, audio_tagging_loss=0.008932, over 3047219.89 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:44:49,663 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.697e+01 8.914e+01 9.371e+01 1.015e+02 1.378e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-28 14:44:51,735 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3549013.3333333335, ans=0.125 2023-11-28 14:45:40,827 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3549213.3333333335, ans=0.125 2023-11-28 14:45:42,750 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.38 vs. limit=15.0 2023-11-28 14:45:54,667 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3549280.0, ans=0.0 2023-11-28 14:45:55,663 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 532400 2023-11-28 14:46:02,290 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 3350, loss[loss=0.05072, simple_loss=0.07063, pruned_loss=0.006917, audio_tagging_loss=0.008494, over 15386.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.09015, pruned_loss=0.01222, audio_tagging_loss=0.008818, over 3044733.71 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:46:24,349 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.88 vs. 
limit=15.0 2023-11-28 14:47:18,298 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 532450 2023-11-28 14:47:25,193 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 3400, loss[loss=0.06394, simple_loss=0.09535, pruned_loss=0.01044, audio_tagging_loss=0.005826, over 15014.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.09046, pruned_loss=0.01228, audio_tagging_loss=0.008605, over 3042425.53 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:47:27,433 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.93 vs. limit=15.0 2023-11-28 14:47:35,372 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.293e+01 8.841e+01 9.503e+01 1.045e+02 1.895e+02, threshold=1.901e+02, percent-clipped=1.0 2023-11-28 14:47:56,502 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3549813.3333333335, ans=0.0 2023-11-28 14:47:59,445 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3549813.3333333335, ans=0.1 2023-11-28 14:48:15,847 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3549880.0, ans=0.125 2023-11-28 14:48:37,325 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 532500 2023-11-28 14:48:42,916 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 3450, loss[loss=0.05532, simple_loss=0.07055, pruned_loss=0.008708, audio_tagging_loss=0.01134, over 13624.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.09008, pruned_loss=0.01215, audio_tagging_loss=0.00855, over 3045512.00 frames. ], batch size: 53, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:49:19,435 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3550146.6666666665, ans=0.125 2023-11-28 14:49:20,663 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3550146.6666666665, ans=0.0 2023-11-28 14:49:37,913 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3550213.3333333335, ans=0.125 2023-11-28 14:49:42,093 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3550213.3333333335, ans=0.1 2023-11-28 14:49:53,871 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 532550 2023-11-28 14:49:56,889 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3550280.0, ans=0.0 2023-11-28 14:49:59,767 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 3500, loss[loss=0.06778, simple_loss=0.09352, pruned_loss=0.01122, audio_tagging_loss=0.009803, over 15740.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.09028, pruned_loss=0.01235, audio_tagging_loss=0.008646, over 3040872.22 frames. 
], batch size: 60, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:50:08,278 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.375e+01 8.938e+01 9.500e+01 1.028e+02 1.250e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-28 14:50:13,037 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3550413.3333333335, ans=0.0 2023-11-28 14:50:34,710 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.91 vs. limit=15.0 2023-11-28 14:50:39,352 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 14:50:43,620 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3550546.6666666665, ans=0.125 2023-11-28 14:50:47,367 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3550546.6666666665, ans=0.95 2023-11-28 14:51:07,252 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 532600 2023-11-28 14:51:13,087 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 3550, loss[loss=0.06426, simple_loss=0.09002, pruned_loss=0.01093, audio_tagging_loss=0.008318, over 15219.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.09028, pruned_loss=0.01243, audio_tagging_loss=0.008566, over 3037231.98 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:51:31,923 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3550746.6666666665, ans=0.1 2023-11-28 14:51:38,686 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3550746.6666666665, ans=0.125 2023-11-28 14:52:04,721 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3550880.0, ans=0.2 2023-11-28 14:52:18,812 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 532650 2023-11-28 14:52:24,472 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 3600, loss[loss=0.07467, simple_loss=0.1038, pruned_loss=0.01627, audio_tagging_loss=0.006506, over 14923.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08964, pruned_loss=0.01239, audio_tagging_loss=0.008473, over 3041671.22 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 14:52:32,701 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.393e+01 8.883e+01 9.624e+01 1.026e+02 1.265e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-28 14:52:45,696 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3551080.0, ans=0.125 2023-11-28 14:52:48,927 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.45 vs. 
limit=15.0 2023-11-28 14:52:49,935 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3551080.0, ans=0.05 2023-11-28 14:52:57,475 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.32 vs. limit=15.0 2023-11-28 14:53:02,132 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3551146.6666666665, ans=0.1 2023-11-28 14:53:15,649 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3551213.3333333335, ans=0.125 2023-11-28 14:53:29,899 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 532700 2023-11-28 14:53:30,002 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3551280.0, ans=0.0 2023-11-28 14:53:34,935 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 3650, loss[loss=0.08296, simple_loss=0.1155, pruned_loss=0.01698, audio_tagging_loss=0.008246, over 15123.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.09018, pruned_loss=0.01244, audio_tagging_loss=0.008464, over 3042519.78 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:53:36,723 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.31 vs. limit=12.0 2023-11-28 14:53:46,803 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3551346.6666666665, ans=0.2 2023-11-28 14:53:59,688 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3551413.3333333335, ans=0.2 2023-11-28 14:54:03,436 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3551480.0, ans=0.125 2023-11-28 14:54:04,050 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.76 vs. limit=22.5 2023-11-28 14:54:13,789 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3551480.0, ans=0.125 2023-11-28 14:54:17,879 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.07 vs. limit=6.0 2023-11-28 14:54:32,307 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 14:54:39,925 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 532750 2023-11-28 14:54:41,575 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3551613.3333333335, ans=0.125 2023-11-28 14:54:43,358 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3551613.3333333335, ans=0.0 2023-11-28 14:54:45,632 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 3700, loss[loss=0.06104, simple_loss=0.08676, pruned_loss=0.009993, audio_tagging_loss=0.007667, over 14239.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.0899, pruned_loss=0.01236, audio_tagging_loss=0.008436, over 3046864.10 frames. 
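
Note on the scaling.py:1022 Whitening lines: each names an activation (split into num_groups channel groups), a measured metric, and a limit; the penalty that keeps activations close to white presumably engages only when the metric exceeds the limit, which is why most records read like "metric=3.91 vs. limit=15.0" with no further effect. One plausible definition of such a metric, equal to 1.0 for perfectly whitened features and growing as the covariance degenerates (an illustration, not necessarily the exact statistic scaling.py computes):

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
        # Sketch: ratio of mean squared eigenvalue to squared mean eigenvalue
        # of the per-group feature covariance; 1.0 iff all eigenvalues are equal.
        n, c = x.shape                        # (num_frames, num_channels)
        x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)
        x = x - x.mean(dim=1, keepdim=True)
        cov = x.transpose(1, 2) @ x / n       # (num_groups, d, d)
        d = cov.shape[-1]
        mean_sq_eig = (cov * cov).sum(dim=(1, 2)) / d              # trace(C @ C) / d
        mean_eig = torch.diagonal(cov, dim1=1, dim2=2).mean(dim=1)  # trace(C) / d
        return (mean_sq_eig / mean_eig ** 2).mean()

    x = torch.randn(2000, 512)                 # near-white random features
    print(whitening_metric(x, num_groups=1))   # -> near 1.0; correlated features
                                               #    score higher, toward the limits
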
], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:54:55,107 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.500e+01 8.996e+01 9.675e+01 1.033e+02 1.277e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-28 14:55:10,787 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3551746.6666666665, ans=10.0 2023-11-28 14:55:16,668 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3551813.3333333335, ans=0.07 2023-11-28 14:55:47,927 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 532800 2023-11-28 14:55:48,793 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn1.whiten.whitening_limit, batch_count=3551946.6666666665, ans=22.5 2023-11-28 14:55:52,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3552013.3333333335, ans=0.125 2023-11-28 14:55:53,207 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 3750, loss[loss=0.07043, simple_loss=0.09645, pruned_loss=0.01198, audio_tagging_loss=0.01022, over 14787.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08956, pruned_loss=0.0123, audio_tagging_loss=0.008526, over 3051051.57 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:55:59,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3552013.3333333335, ans=0.0 2023-11-28 14:56:40,615 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 14:56:54,267 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 532850 2023-11-28 14:56:58,870 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 3800, loss[loss=0.07357, simple_loss=0.107, pruned_loss=0.01237, audio_tagging_loss=0.007694, over 15166.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.09073, pruned_loss=0.01244, audio_tagging_loss=0.008614, over 3049190.88 frames. 
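
Note on the scaling.py:213 records: values such as dropout_p, skip_rate, and the balancer probabilities are not fixed hyperparameters; each is a ScheduledFloat whose current value (the printed "ans") is a function of the global batch_count. A sketch consistent with the printed (batch_count, ans) pairs, assuming piecewise-linear interpolation between schedule knots (the class name matches the log; the body and the example knots are assumptions):

    class ScheduledFloat:
        # Sketch: a float hyperparameter that anneals piecewise-linearly
        # with the training batch count.
        def __init__(self, *points):
            self.points = sorted(points)   # (batch_count, value) pairs

        def value_at(self, batch_count: float) -> float:
            p = self.points
            if batch_count <= p[0][0]:
                return p[0][1]
            if batch_count >= p[-1][0]:
                return p[-1][1]
            for (x0, y0), (x1, y1) in zip(p, p[1:]):
                if x0 <= batch_count <= x1:
                    return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # e.g. a dropout that decays from 0.3 to 0.1 over the first 20k batches,
    # then stays flat -- at batch_count=3550213.33 it would print ans=0.1:
    dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
    print(dropout_p.value_at(3550213.3333333335))  # -> 0.1
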
], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:57:01,548 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 14:57:06,275 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3552346.6666666665, ans=0.2 2023-11-28 14:57:07,285 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.454e+01 9.078e+01 9.747e+01 1.050e+02 1.632e+02, threshold=1.949e+02, percent-clipped=0.0 2023-11-28 14:57:34,919 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3552546.6666666665, ans=0.125 2023-11-28 14:57:54,222 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3552613.3333333335, ans=0.2 2023-11-28 14:57:55,369 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3552613.3333333335, ans=0.125 2023-11-28 14:57:56,438 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 532900 2023-11-28 14:57:58,488 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.40 vs. limit=6.0 2023-11-28 14:58:01,915 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 3850, loss[loss=0.0718, simple_loss=0.1019, pruned_loss=0.0143, audio_tagging_loss=0.006557, over 16018.00 frames. ], tot_loss[loss=0.06662, simple_loss=0.09118, pruned_loss=0.01247, audio_tagging_loss=0.008561, over 3045427.53 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:58:03,459 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 14:58:05,849 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3552680.0, ans=10.0 2023-11-28 14:58:19,188 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.92 vs. limit=22.5 2023-11-28 14:58:26,099 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.56 vs. limit=12.0 2023-11-28 14:58:30,426 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3552813.3333333335, ans=0.0 2023-11-28 14:58:54,841 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3552946.6666666665, ans=0.2 2023-11-28 14:58:59,555 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 532950 2023-11-28 14:59:04,034 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 3900, loss[loss=0.07149, simple_loss=0.1044, pruned_loss=0.013, audio_tagging_loss=0.006299, over 14902.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.09046, pruned_loss=0.01228, audio_tagging_loss=0.008647, over 3040662.25 frames. 
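
Note on the train_asr.py:1235 records: loss[...] is the current batch, and its printed total is consistent with a weighted sum of the three components, loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss; e.g. for batch 3850 above, 0.5 * 0.1019 + 0.0143 + 0.006557 = 0.0718 to logging precision, and batch 3900 checks out the same way. A sketch with the weights written as hypothetical keyword arguments:

    def combine_losses(simple_loss, pruned_loss, audio_tagging_loss,
                       simple_loss_scale=0.5, audio_tagging_loss_scale=1.0):
        # Sketch: the weighted sum the log records appear to report as `loss`.
        return (simple_loss_scale * simple_loss
                + pruned_loss
                + audio_tagging_loss_scale * audio_tagging_loss)

    # reproduces the batch 3850 record above:
    print(combine_losses(0.1019, 0.0143, 0.006557))  # -> 0.071807 ~ 0.0718
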
], batch size: 54, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:59:05,656 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3553013.3333333335, ans=0.125 2023-11-28 14:59:12,233 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.538e+01 8.774e+01 9.422e+01 1.004e+02 1.392e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-28 14:59:13,736 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3553013.3333333335, ans=0.125 2023-11-28 14:59:16,300 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.21 vs. limit=15.0 2023-11-28 14:59:21,155 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3553080.0, ans=10.0 2023-11-28 14:59:32,685 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3553146.6666666665, ans=0.0 2023-11-28 14:59:39,098 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.54 vs. limit=15.0 2023-11-28 14:59:44,333 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.23 vs. limit=10.0 2023-11-28 14:59:52,539 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.43 vs. limit=15.0 2023-11-28 14:59:53,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3553280.0, ans=0.125 2023-11-28 14:59:54,382 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3553280.0, ans=0.125 2023-11-28 14:59:57,551 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3553280.0, ans=0.125 2023-11-28 14:59:57,756 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3553280.0, ans=0.2 2023-11-28 14:59:59,782 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 533000 2023-11-28 15:00:01,496 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3553280.0, ans=0.0 2023-11-28 15:00:05,556 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 3950, loss[loss=0.07461, simple_loss=0.09745, pruned_loss=0.01709, audio_tagging_loss=0.008798, over 16363.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.08994, pruned_loss=0.01221, audio_tagging_loss=0.008608, over 3041545.81 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:00:16,419 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.17 vs. 
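
Note on tot_loss[...]: unlike loss[...], it is a frames-weighted running average rather than a cumulative mean; its frame count hovers near 3.0e6 even as hundreds of ~15k-frame batches go by, which suggests old batches are exponentially down-weighted. A sketch under that assumption (the decay constant is chosen to match the observed ~3.0e6-frame window, not taken from the code):

    class FrameWeightedAverage:
        # Sketch: frame-weighted running average like the tot_loss[...] fields.
        def __init__(self, decay=0.995):
            self.decay = decay     # assumption: effective window ~ 1/(1-decay) batches
            self.loss_sum = 0.0
            self.frame_sum = 0.0

        def update(self, batch_loss, num_frames):
            self.loss_sum = self.decay * self.loss_sum + batch_loss * num_frames
            self.frame_sum = self.decay * self.frame_sum + num_frames

        @property
        def value(self):
            return self.loss_sum / max(self.frame_sum, 1.0)

With ~15k frames per batch, 1/(1-0.995) = 200 batches gives an effective window of about 3.0e6 frames, matching the "over 3040662.25 frames" figures above.
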
limit=10.0 2023-11-28 15:00:24,574 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3553413.3333333335, ans=0.1 2023-11-28 15:00:27,907 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3553480.0, ans=0.0 2023-11-28 15:00:37,350 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3553480.0, ans=0.025 2023-11-28 15:00:39,747 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3553546.6666666665, ans=0.125 2023-11-28 15:01:00,296 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 533050 2023-11-28 15:01:04,062 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3553680.0, ans=0.2 2023-11-28 15:01:04,946 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 4000, loss[loss=0.07989, simple_loss=0.1051, pruned_loss=0.01696, audio_tagging_loss=0.0104, over 14305.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.09049, pruned_loss=0.01241, audio_tagging_loss=0.008652, over 3046320.76 frames. ], batch size: 54, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:01:11,441 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3553680.0, ans=0.125 2023-11-28 15:01:13,354 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.093e+01 9.021e+01 9.610e+01 1.042e+02 1.658e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-28 15:01:25,756 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3553746.6666666665, ans=0.2 2023-11-28 15:02:00,814 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 533100 2023-11-28 15:02:05,831 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 4050, loss[loss=0.06556, simple_loss=0.07639, pruned_loss=0.01643, audio_tagging_loss=0.01093, over 15244.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.08979, pruned_loss=0.0123, audio_tagging_loss=0.008816, over 3049529.81 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:02:07,202 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3554013.3333333335, ans=0.07 2023-11-28 15:02:10,544 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 15:02:20,545 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 15:02:25,514 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.19 vs. 
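
Note on grad_scale: it flips between 16.0 and 32.0 across records (16.0 at batch 3950, 32.0 at batches 4000-4050, back to 16.0 by batch 4100). That is the usual dynamic loss scaling of fp16 training: the scaler doubles the scale after a run of overflow-free steps and halves it when a step produces inf/nan gradients. A generic PyTorch sketch of the pattern (not this project's training loop):

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=16.0, growth_factor=2.0,
                                       backoff_factor=0.5, growth_interval=2000)

    def train_step(model, batch, optimizer, criterion):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = criterion(model(batch["inputs"]), batch["targets"])
        scaler.scale(loss).backward()
        scaler.step(optimizer)   # skipped internally if grads overflowed
        scaler.update()          # doubles/halves the scale as needed
        return loss.detach(), scaler.get_scale()  # the grad_scale in the log
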
limit=15.0 2023-11-28 15:02:38,428 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 15:02:53,940 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.63 vs. limit=15.0 2023-11-28 15:03:01,396 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 533150 2023-11-28 15:03:05,611 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.03 vs. limit=8.0 2023-11-28 15:03:06,494 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 4100, loss[loss=0.06898, simple_loss=0.09599, pruned_loss=0.01208, audio_tagging_loss=0.008907, over 14748.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.08893, pruned_loss=0.0121, audio_tagging_loss=0.008894, over 3042071.82 frames. ], batch size: 54, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:03:09,425 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.06 vs. limit=15.0 2023-11-28 15:03:16,445 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.261e+01 8.818e+01 9.455e+01 1.012e+02 1.204e+02, threshold=1.891e+02, percent-clipped=0.0 2023-11-28 15:03:28,709 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3554413.3333333335, ans=0.125 2023-11-28 15:03:36,250 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3554480.0, ans=0.125 2023-11-28 15:03:44,283 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3554546.6666666665, ans=0.125 2023-11-28 15:03:47,714 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3554546.6666666665, ans=0.125 2023-11-28 15:04:01,652 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 533200 2023-11-28 15:04:06,994 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 4150, loss[loss=0.05815, simple_loss=0.08287, pruned_loss=0.009626, audio_tagging_loss=0.007088, over 15710.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.08908, pruned_loss=0.01205, audio_tagging_loss=0.008733, over 3044152.77 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:04:39,650 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3554813.3333333335, ans=0.125 2023-11-28 15:04:42,292 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.90 vs. limit=15.0 2023-11-28 15:04:43,068 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3554880.0, ans=0.1 2023-11-28 15:04:52,377 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
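
Note on the model.py:807 records: "Freeze_encoder: False; Current batch idx: ..." is printed every 50 batches, and the flag suggests the forward pass can optionally run the encoder without gradients (e.g. for the first steps of fine-tuning). A hedged sketch; the parameter names echo the training options but the body is an assumption:

    import logging
    import torch

    def forward_encoder(model, features, batch_idx, freeze_encoder=False,
                        freeze_encoder_steps=-1, log_interval=50):
        # Sketch: optionally freeze the encoder, and log status periodically.
        freeze = freeze_encoder or (0 <= batch_idx < freeze_encoder_steps)
        if batch_idx % log_interval == 0:
            logging.info(f"Freeze_encoder: {freeze}; Current batch idx: {batch_idx}")
        if freeze:
            with torch.no_grad():
                return model.encoder(features)
        return model.encoder(features)
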
Number of tokens: 24 2023-11-28 15:04:56,420 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3554946.6666666665, ans=0.0 2023-11-28 15:04:57,517 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3554946.6666666665, ans=0.1 2023-11-28 15:05:01,840 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 533250 2023-11-28 15:05:04,162 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3554946.6666666665, ans=0.0 2023-11-28 15:05:06,749 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 4200, loss[loss=0.05133, simple_loss=0.06814, pruned_loss=0.006988, audio_tagging_loss=0.01027, over 14612.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08902, pruned_loss=0.01204, audio_tagging_loss=0.008631, over 3033841.57 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:05:08,071 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3555013.3333333335, ans=0.0 2023-11-28 15:05:14,995 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3555013.3333333335, ans=0.0 2023-11-28 15:05:15,743 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.383e+01 8.572e+01 9.503e+01 1.029e+02 1.274e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-28 15:05:15,986 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3555013.3333333335, ans=0.125 2023-11-28 15:05:21,893 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3555080.0, ans=0.125 2023-11-28 15:05:48,212 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3555213.3333333335, ans=0.1 2023-11-28 15:06:00,947 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 533300 2023-11-28 15:06:02,304 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3555280.0, ans=0.125 2023-11-28 15:06:05,324 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 4250, loss[loss=0.07152, simple_loss=0.09703, pruned_loss=0.01438, audio_tagging_loss=0.008619, over 15449.00 frames. ], tot_loss[loss=0.0652, simple_loss=0.0891, pruned_loss=0.01217, audio_tagging_loss=0.008481, over 3042633.40 frames. ], batch size: 61, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:06:18,476 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.80 vs. 
limit=15.0 2023-11-28 15:06:25,064 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3555413.3333333335, ans=0.125 2023-11-28 15:06:29,075 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3555480.0, ans=0.05 2023-11-28 15:06:31,236 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 15:06:47,456 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3555546.6666666665, ans=0.125 2023-11-28 15:06:50,721 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3555546.6666666665, ans=0.1 2023-11-28 15:07:00,064 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 533350 2023-11-28 15:07:03,660 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3555680.0, ans=0.0 2023-11-28 15:07:04,453 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 4300, loss[loss=0.08248, simple_loss=0.1159, pruned_loss=0.01979, audio_tagging_loss=0.00477, over 15695.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.09042, pruned_loss=0.01217, audio_tagging_loss=0.008436, over 3051019.27 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:07:04,655 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3555680.0, ans=0.04949747468305833 2023-11-28 15:07:13,806 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.627e+01 9.132e+01 9.735e+01 1.057e+02 1.337e+02, threshold=1.947e+02, percent-clipped=0.0 2023-11-28 15:07:32,513 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3555813.3333333335, ans=0.125 2023-11-28 15:07:36,064 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3555813.3333333335, ans=10.0 2023-11-28 15:07:44,606 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3555880.0, ans=0.125 2023-11-28 15:07:58,896 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 533400 2023-11-28 15:08:03,642 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 4350, loss[loss=0.08322, simple_loss=0.115, pruned_loss=0.01872, audio_tagging_loss=0.00699, over 15800.00 frames. ], tot_loss[loss=0.06688, simple_loss=0.09204, pruned_loss=0.01256, audio_tagging_loss=0.0083, over 3056957.54 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:08:38,308 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3556213.3333333335, ans=0.125 2023-11-28 15:08:57,888 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 533450 2023-11-28 15:09:02,400 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 4400, loss[loss=0.06332, simple_loss=0.08781, pruned_loss=0.0112, audio_tagging_loss=0.008217, over 15550.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.09098, pruned_loss=0.01263, audio_tagging_loss=0.008388, over 3061895.22 frames. 
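
Note on the balancer entries: many ScheduledFloat names end in balancer.prob, min_positive, max_positive, min_abs, or max_abs (ans=0.125, 0.05, 0.5, and 10.0 in nearby records). They parameterize activation balancers: modules that, on a random fraction prob of batches, nudge per-channel statistics back inside a target range via a gradient-side penalty without changing the forward output. A simplified sketch of the constraint check only (the real module acts on gradients; names and thresholds here are illustrative):

    import torch

    def balancer_violations(x, min_positive=0.05, max_positive=0.95,
                            min_abs=0.5, max_abs=10.0):
        # Sketch: which channels break the balancer's target statistics?
        # x: (num_frames, num_channels)
        frac_pos = (x > 0).float().mean(dim=0)   # per-channel fraction positive
        mean_abs = x.abs().mean(dim=0)           # per-channel mean magnitude
        return {
            "too_negative": frac_pos < min_positive,
            "too_positive": frac_pos > max_positive,
            "too_small": mean_abs < min_abs,
            "too_large": mean_abs > max_abs,
        }

During training, a correction would be applied only to offending channels, and only on the fraction prob (~0.125 in these records) of batches.
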
], batch size: 58, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:09:12,180 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.392e+01 8.943e+01 9.727e+01 1.045e+02 1.586e+02, threshold=1.945e+02, percent-clipped=0.0 2023-11-28 15:09:14,739 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3556413.3333333335, ans=0.125 2023-11-28 15:09:14,802 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3556413.3333333335, ans=0.2 2023-11-28 15:09:42,328 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3556546.6666666665, ans=0.125 2023-11-28 15:09:42,971 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0 2023-11-28 15:09:57,506 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=3556613.3333333335, ans=10.0 2023-11-28 15:10:02,204 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 533500 2023-11-28 15:10:06,975 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 4450, loss[loss=0.07201, simple_loss=0.1015, pruned_loss=0.01501, audio_tagging_loss=0.006248, over 15292.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08955, pruned_loss=0.01231, audio_tagging_loss=0.00852, over 3063566.83 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:10:11,072 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3556680.0, ans=0.1 2023-11-28 15:10:12,243 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff2.min_abs, batch_count=3556680.0, ans=0.1 2023-11-28 15:10:32,157 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3556813.3333333335, ans=0.0 2023-11-28 15:10:37,210 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3556813.3333333335, ans=0.125 2023-11-28 15:10:43,181 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3556813.3333333335, ans=0.09899494936611666 2023-11-28 15:11:06,062 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 533550 2023-11-28 15:11:10,938 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 4500, loss[loss=0.05234, simple_loss=0.07349, pruned_loss=0.007439, audio_tagging_loss=0.008153, over 14464.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.0891, pruned_loss=0.01228, audio_tagging_loss=0.008568, over 3060729.76 frames. 
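
Note on the *_skip_rate and bypass entries (attention_skip_rate ans=0.0, bypass.skip_rate ans~0.099, bypass.scale_min ans=0.2 above): these point at stochastic-depth-style training, where a sublayer's contribution is skipped with a small scheduled probability and each layer's output is blended with its input through a learned bypass scale floored at scale_min. A sketch of both mechanisms (function names are hypothetical):

    import torch

    def apply_sublayer(x, sublayer, skip_rate=0.1, training=True):
        # Sketch: stochastic-depth skip of one sublayer with probability skip_rate.
        if training and torch.rand(()) < skip_rate:
            return x                   # skip the sublayer entirely this batch
        return x + sublayer(x)         # normal residual path

    def bypass(x_in, x_out, scale, scale_min=0.2):
        # Sketch: blend a layer's output with its input; scale is learned,
        # clamped to [scale_min, 1.0] as the scale_min entries suggest.
        s = scale.clamp(min=scale_min, max=1.0)
        return x_in + s * (x_out - x_in)
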
], batch size: 57, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:11:13,329 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3557013.3333333335, ans=0.125 2023-11-28 15:11:17,768 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3557013.3333333335, ans=0.2 2023-11-28 15:11:22,374 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.702e+01 8.792e+01 9.317e+01 1.000e+02 1.287e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-28 15:11:26,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3557080.0, ans=0.125 2023-11-28 15:11:28,724 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3557080.0, ans=0.5 2023-11-28 15:12:00,768 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.44 vs. limit=15.0 2023-11-28 15:12:06,477 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3557280.0, ans=0.04949747468305833 2023-11-28 15:12:10,884 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 533600 2023-11-28 15:12:15,961 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 4550, loss[loss=0.07247, simple_loss=0.1009, pruned_loss=0.01314, audio_tagging_loss=0.008877, over 17335.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08972, pruned_loss=0.01227, audio_tagging_loss=0.008554, over 3058294.52 frames. ], batch size: 62, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:12:26,697 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 15:12:37,697 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=3557413.3333333335, ans=0.5 2023-11-28 15:13:04,802 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3557546.6666666665, ans=0.1 2023-11-28 15:13:06,979 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 15:13:14,950 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 533650 2023-11-28 15:13:19,719 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 4600, loss[loss=0.0571, simple_loss=0.07469, pruned_loss=0.009677, audio_tagging_loss=0.01007, over 15711.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08912, pruned_loss=0.01217, audio_tagging_loss=0.008588, over 3047947.54 frames. 
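
Note on the scaling.py:1118 WithLoss records: they attach an auxiliary loss to the named self-attention weights and report its running sum; loss-sum=0.000e+00 throughout this excerpt, i.e. the penalty is currently inactive. A sketch of the attach-and-log pattern, passing the tensor through unchanged while routing the auxiliary term into the backward pass (an illustration of the idea, not the exact module):

    import logging
    import torch

    class WithLoss(torch.autograd.Function):
        # Sketch: identity on x in the forward pass, but aux_loss receives
        # gradient 1.0 in backward, as if it had been added to the objective.
        @staticmethod
        def forward(ctx, x, aux_loss, name):
            logging.info(f"WithLoss: name={name}, loss-sum={float(aux_loss):.3e}")
            ctx.save_for_backward(torch.ones_like(aux_loss))
            return x

        @staticmethod
        def backward(ctx, grad_output):
            (ones,) = ctx.saved_tensors
            return grad_output, ones, None   # None for the name argument

    # usage inside an attention module:
    # attn = WithLoss.apply(attn, penalty, "encoder....self_attn_weights")
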
], batch size: 59, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:13:29,183 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.370e+01 8.928e+01 9.397e+01 1.022e+02 1.415e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-28 15:13:50,934 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3557813.3333333335, ans=0.125 2023-11-28 15:13:55,889 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3557813.3333333335, ans=0.125 2023-11-28 15:14:13,117 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3557946.6666666665, ans=0.2 2023-11-28 15:14:14,682 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.75 vs. limit=15.0 2023-11-28 15:14:16,722 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3557946.6666666665, ans=0.125 2023-11-28 15:14:17,701 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 533700 2023-11-28 15:14:20,800 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3557946.6666666665, ans=0.125 2023-11-28 15:14:22,803 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 4650, loss[loss=0.0597, simple_loss=0.0798, pruned_loss=0.01186, audio_tagging_loss=0.007938, over 14148.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08907, pruned_loss=0.01216, audio_tagging_loss=0.008647, over 3040256.25 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:15:00,366 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3558213.3333333335, ans=0.0 2023-11-28 15:15:11,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3558213.3333333335, ans=0.125 2023-11-28 15:15:13,199 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3558280.0, ans=0.0 2023-11-28 15:15:21,878 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 533750 2023-11-28 15:15:27,159 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 4700, loss[loss=0.06979, simple_loss=0.0945, pruned_loss=0.01401, audio_tagging_loss=0.008533, over 15573.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.08961, pruned_loss=0.01231, audio_tagging_loss=0.008699, over 3038972.88 frames. 
], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:15:38,953 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.846e+01 8.852e+01 9.419e+01 1.023e+02 1.642e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-28 15:15:49,195 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3558413.3333333335, ans=0.07 2023-11-28 15:15:55,240 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3558480.0, ans=0.125 2023-11-28 15:16:15,965 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 15:16:26,103 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 533800 2023-11-28 15:16:31,089 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 4750, loss[loss=0.06346, simple_loss=0.08926, pruned_loss=0.01079, audio_tagging_loss=0.008043, over 16497.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08916, pruned_loss=0.01231, audio_tagging_loss=0.00875, over 3037712.89 frames. ], batch size: 61, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:16:35,004 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3558680.0, ans=0.125 2023-11-28 15:16:55,380 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3558813.3333333335, ans=0.0 2023-11-28 15:17:06,163 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3558813.3333333335, ans=0.125 2023-11-28 15:17:21,270 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.79 vs. limit=15.0 2023-11-28 15:17:29,159 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 533850 2023-11-28 15:17:34,702 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 4800, loss[loss=0.06414, simple_loss=0.0926, pruned_loss=0.00978, audio_tagging_loss=0.008063, over 14230.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.08994, pruned_loss=0.01244, audio_tagging_loss=0.008844, over 3043896.41 frames. ], batch size: 54, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:17:45,884 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.525e+01 9.221e+01 9.663e+01 1.040e+02 1.234e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-28 15:17:47,859 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.19 vs. limit=10.0 2023-11-28 15:17:50,065 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.97 vs. limit=15.0 2023-11-28 15:17:58,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3559146.6666666665, ans=0.0 2023-11-28 15:18:18,682 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3559213.3333333335, ans=0.0 2023-11-28 15:18:31,404 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 533900 2023-11-28 15:18:36,131 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 4850, loss[loss=0.07139, simple_loss=0.09753, pruned_loss=0.01514, audio_tagging_loss=0.007489, over 15791.00 frames. 
], tot_loss[loss=0.06633, simple_loss=0.08975, pruned_loss=0.01258, audio_tagging_loss=0.008883, over 3042379.49 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:18:38,936 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3559346.6666666665, ans=0.2 2023-11-28 15:18:57,948 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.01 vs. limit=12.0 2023-11-28 15:19:12,108 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3559546.6666666665, ans=0.09899494936611666 2023-11-28 15:19:33,314 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 533950 2023-11-28 15:19:38,613 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 4900, loss[loss=0.06258, simple_loss=0.0779, pruned_loss=0.01126, audio_tagging_loss=0.01237, over 14277.00 frames. ], tot_loss[loss=0.06676, simple_loss=0.09064, pruned_loss=0.01262, audio_tagging_loss=0.00882, over 3040005.54 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:19:49,248 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.527e+01 9.026e+01 9.693e+01 1.038e+02 1.259e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-28 15:19:54,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3559746.6666666665, ans=0.0 2023-11-28 15:20:13,242 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3559813.3333333335, ans=0.0 2023-11-28 15:20:29,154 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.83 vs. limit=10.0 2023-11-28 15:20:35,436 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 534000 2023-11-28 15:20:40,396 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 4950, loss[loss=0.05323, simple_loss=0.06821, pruned_loss=0.007872, audio_tagging_loss=0.01126, over 13977.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08955, pruned_loss=0.01231, audio_tagging_loss=0.008678, over 3041445.30 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:20:53,622 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3560080.0, ans=0.2 2023-11-28 15:20:56,387 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.64 vs. limit=15.0 2023-11-28 15:21:18,256 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3560213.3333333335, ans=0.0 2023-11-28 15:21:35,229 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=3560280.0, ans=0.95 2023-11-28 15:21:37,530 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 534050 2023-11-28 15:21:38,806 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3560280.0, ans=0.125 2023-11-28 15:21:42,808 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 5000, loss[loss=0.06439, simple_loss=0.0803, pruned_loss=0.01216, audio_tagging_loss=0.01208, over 15147.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08885, pruned_loss=0.0122, audio_tagging_loss=0.008649, over 3042887.66 frames. 
], batch size: 56, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:21:43,037 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3560346.6666666665, ans=0.1 2023-11-28 15:21:47,788 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3560346.6666666665, ans=0.09899494936611666 2023-11-28 15:21:52,515 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3560346.6666666665, ans=0.5 2023-11-28 15:21:53,309 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.494e+01 8.823e+01 9.586e+01 1.031e+02 1.320e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-28 15:22:13,994 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3560480.0, ans=0.07 2023-11-28 15:22:21,640 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3560546.6666666665, ans=0.125 2023-11-28 15:22:25,218 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3560546.6666666665, ans=0.0 2023-11-28 15:22:30,483 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3560546.6666666665, ans=0.0 2023-11-28 15:22:39,794 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 534100 2023-11-28 15:22:41,007 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3560613.3333333335, ans=0.125 2023-11-28 15:22:45,191 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 5050, loss[loss=0.05417, simple_loss=0.06768, pruned_loss=0.01199, audio_tagging_loss=0.00833, over 15000.00 frames. ], tot_loss[loss=0.06472, simple_loss=0.08806, pruned_loss=0.01204, audio_tagging_loss=0.008649, over 3039756.76 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:23:03,248 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.86 vs. limit=15.0 2023-11-28 15:23:08,086 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3560813.3333333335, ans=0.125 2023-11-28 15:23:17,421 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.25 vs. limit=15.0 2023-11-28 15:23:31,531 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3560880.0, ans=0.125 2023-11-28 15:23:32,639 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.17 vs. limit=12.0 2023-11-28 15:23:41,446 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 534150 2023-11-28 15:23:41,449 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3560946.6666666665, ans=0.125 2023-11-28 15:23:46,151 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 5100, loss[loss=0.04297, simple_loss=0.04266, pruned_loss=0.006818, audio_tagging_loss=0.01482, over 14091.00 frames. ], tot_loss[loss=0.06446, simple_loss=0.08777, pruned_loss=0.01195, audio_tagging_loss=0.008627, over 3040673.92 frames. 
], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:23:58,684 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.559e+01 9.022e+01 9.569e+01 1.030e+02 1.259e+02, threshold=1.914e+02, percent-clipped=0.0 2023-11-28 15:24:13,044 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.86 vs. limit=15.0 2023-11-28 15:24:14,431 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.36 vs. limit=22.5 2023-11-28 15:24:16,637 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.76 vs. limit=15.0 2023-11-28 15:24:23,757 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3561213.3333333335, ans=0.125 2023-11-28 15:24:28,356 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3561213.3333333335, ans=0.05 2023-11-28 15:24:30,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3561213.3333333335, ans=0.0 2023-11-28 15:24:32,350 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.93 vs. limit=15.0 2023-11-28 15:24:34,678 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.12 vs. limit=15.0 2023-11-28 15:24:43,926 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 534200 2023-11-28 15:24:48,758 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 5150, loss[loss=0.05931, simple_loss=0.08638, pruned_loss=0.008743, audio_tagging_loss=0.00738, over 14956.00 frames. ], tot_loss[loss=0.06472, simple_loss=0.08816, pruned_loss=0.01201, audio_tagging_loss=0.008623, over 3038826.15 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:24:53,298 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3561346.6666666665, ans=0.125 2023-11-28 15:24:57,728 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3561346.6666666665, ans=0.0 2023-11-28 15:25:08,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3561413.3333333335, ans=0.125 2023-11-28 15:25:11,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3561413.3333333335, ans=0.125 2023-11-28 15:25:14,497 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3561480.0, ans=0.09899494936611666 2023-11-28 15:25:32,087 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.35 vs. limit=22.5 2023-11-28 15:25:46,254 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 534250 2023-11-28 15:25:50,877 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 5200, loss[loss=0.08288, simple_loss=0.121, pruned_loss=0.01779, audio_tagging_loss=0.004574, over 15556.00 frames. 
], tot_loss[loss=0.06505, simple_loss=0.08871, pruned_loss=0.01208, audio_tagging_loss=0.008621, over 3034850.90 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:26:03,809 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.286e+01 8.561e+01 9.249e+01 1.010e+02 1.176e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-28 15:26:04,200 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3561746.6666666665, ans=0.125 2023-11-28 15:26:13,655 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3561746.6666666665, ans=0.0 2023-11-28 15:26:23,732 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3561813.3333333335, ans=0.125 2023-11-28 15:26:32,411 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3561880.0, ans=0.05 2023-11-28 15:26:48,812 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 534300 2023-11-28 15:26:51,381 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3561946.6666666665, ans=0.125 2023-11-28 15:26:53,359 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 5250, loss[loss=0.0651, simple_loss=0.09062, pruned_loss=0.01187, audio_tagging_loss=0.007921, over 14758.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.08885, pruned_loss=0.01211, audio_tagging_loss=0.008502, over 3042102.42 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:27:09,781 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.85 vs. limit=22.5 2023-11-28 15:27:16,191 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3562080.0, ans=0.5 2023-11-28 15:27:30,787 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3562213.3333333335, ans=0.125 2023-11-28 15:27:44,400 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3562280.0, ans=0.0 2023-11-28 15:27:50,532 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 534350 2023-11-28 15:27:55,105 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 5300, loss[loss=0.09519, simple_loss=0.1375, pruned_loss=0.01974, audio_tagging_loss=0.006685, over 15330.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.09005, pruned_loss=0.01237, audio_tagging_loss=0.008489, over 3048788.98 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:27:56,709 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3562346.6666666665, ans=0.125 2023-11-28 15:28:06,928 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.06 vs. 
limit=15.0 2023-11-28 15:28:07,530 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.489e+01 8.987e+01 9.599e+01 1.032e+02 1.281e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-28 15:28:11,345 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3562413.3333333335, ans=0.125 2023-11-28 15:28:39,309 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3562546.6666666665, ans=0.125 2023-11-28 15:28:52,896 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 534400 2023-11-28 15:28:55,703 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3562613.3333333335, ans=0.125 2023-11-28 15:28:57,894 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 5350, loss[loss=0.07991, simple_loss=0.1085, pruned_loss=0.01743, audio_tagging_loss=0.008255, over 15846.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.09086, pruned_loss=0.01265, audio_tagging_loss=0.008558, over 3041102.38 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:29:01,809 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3562680.0, ans=0.2 2023-11-28 15:29:07,537 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.08 vs. limit=10.0 2023-11-28 15:29:47,974 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3562946.6666666665, ans=0.0 2023-11-28 15:29:55,303 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 534450 2023-11-28 15:29:59,985 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 5400, loss[loss=0.06136, simple_loss=0.09146, pruned_loss=0.008609, audio_tagging_loss=0.007027, over 15585.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.09058, pruned_loss=0.01256, audio_tagging_loss=0.0086, over 3036903.61 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:30:04,334 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.02 vs. limit=15.0 2023-11-28 15:30:14,154 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.799e+01 8.988e+01 9.532e+01 1.029e+02 1.170e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-28 15:30:15,713 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3563080.0, ans=0.04949747468305833 2023-11-28 15:30:24,785 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.87 vs. 
limit=15.0 2023-11-28 15:30:49,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3563280.0, ans=0.1 2023-11-28 15:30:53,485 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3563280.0, ans=0.125 2023-11-28 15:30:57,471 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 534500 2023-11-28 15:30:57,693 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3563280.0, ans=0.0 2023-11-28 15:31:02,834 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 5450, loss[loss=0.07237, simple_loss=0.105, pruned_loss=0.01379, audio_tagging_loss=0.006051, over 16155.00 frames. ], tot_loss[loss=0.06682, simple_loss=0.09141, pruned_loss=0.01259, audio_tagging_loss=0.008526, over 3036569.59 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:31:20,408 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3563413.3333333335, ans=0.1 2023-11-28 15:31:37,626 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3563480.0, ans=0.1 2023-11-28 15:31:40,016 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3563546.6666666665, ans=0.04949747468305833 2023-11-28 15:31:42,913 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.12 vs. limit=15.0 2023-11-28 15:32:00,250 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 534550 2023-11-28 15:32:04,878 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 5500, loss[loss=0.06108, simple_loss=0.08286, pruned_loss=0.01056, audio_tagging_loss=0.009091, over 15811.00 frames. ], tot_loss[loss=0.06668, simple_loss=0.09095, pruned_loss=0.01262, audio_tagging_loss=0.00859, over 3035830.09 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:32:18,322 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.730e+01 8.919e+01 9.709e+01 1.036e+02 2.693e+02, threshold=1.942e+02, percent-clipped=1.0 2023-11-28 15:32:26,833 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3563746.6666666665, ans=0.125 2023-11-28 15:32:30,202 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3563813.3333333335, ans=0.0 2023-11-28 15:32:33,218 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3563813.3333333335, ans=0.0 2023-11-28 15:32:54,357 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3563946.6666666665, ans=0.125 2023-11-28 15:33:01,951 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 534600 2023-11-28 15:33:06,944 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 5550, loss[loss=0.06588, simple_loss=0.08428, pruned_loss=0.01166, audio_tagging_loss=0.01208, over 15047.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.09043, pruned_loss=0.01237, audio_tagging_loss=0.008656, over 3038413.59 frames. 
], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:33:35,473 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=3564146.6666666665, ans=15.0 2023-11-28 15:33:52,178 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3564213.3333333335, ans=0.0 2023-11-28 15:33:55,688 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 15:33:59,377 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3564280.0, ans=10.0 2023-11-28 15:34:03,826 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 534650 2023-11-28 15:34:09,198 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 5600, loss[loss=0.07511, simple_loss=0.1066, pruned_loss=0.01516, audio_tagging_loss=0.006636, over 16666.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.09024, pruned_loss=0.01234, audio_tagging_loss=0.008744, over 3038799.81 frames. ], batch size: 62, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:34:14,626 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3564346.6666666665, ans=0.0 2023-11-28 15:34:20,200 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3564346.6666666665, ans=0.125 2023-11-28 15:34:23,317 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.833e+01 8.970e+01 9.641e+01 1.032e+02 1.547e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-28 15:34:23,544 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3564413.3333333335, ans=0.125 2023-11-28 15:34:29,499 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3564413.3333333335, ans=0.2 2023-11-28 15:34:29,513 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3564413.3333333335, ans=0.125 2023-11-28 15:34:56,214 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 15:34:56,728 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.83 vs. 
limit=15.0 2023-11-28 15:34:57,628 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3564613.3333333335, ans=0.125 2023-11-28 15:34:58,837 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3564613.3333333335, ans=0.125 2023-11-28 15:35:01,812 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3564613.3333333335, ans=0.1 2023-11-28 15:35:06,990 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 534700 2023-11-28 15:35:11,622 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 5650, loss[loss=0.06139, simple_loss=0.08611, pruned_loss=0.01002, audio_tagging_loss=0.00831, over 15129.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.08969, pruned_loss=0.01228, audio_tagging_loss=0.00889, over 3046207.36 frames. ], batch size: 60, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:35:37,765 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3564813.3333333335, ans=0.125 2023-11-28 15:35:41,894 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 15:35:50,249 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.14 vs. limit=6.0 2023-11-28 15:35:58,119 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.79 vs. limit=15.0 2023-11-28 15:36:09,212 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 534750 2023-11-28 15:36:13,952 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 5700, loss[loss=0.04922, simple_loss=0.07336, pruned_loss=0.005771, audio_tagging_loss=0.006766, over 15549.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08891, pruned_loss=0.01213, audio_tagging_loss=0.008913, over 3038988.68 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:36:14,542 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.63 vs. 
limit=22.5 2023-11-28 15:36:27,632 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3565080.0, ans=0.125 2023-11-28 15:36:27,655 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3565080.0, ans=0.1 2023-11-28 15:36:28,546 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.604e+01 8.736e+01 9.435e+01 1.001e+02 1.261e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-28 15:36:35,784 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3565080.0, ans=0.125 2023-11-28 15:36:41,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3565146.6666666665, ans=0.0 2023-11-28 15:36:42,885 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3565146.6666666665, ans=0.2 2023-11-28 15:36:44,108 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3565146.6666666665, ans=0.125 2023-11-28 15:36:47,480 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3565146.6666666665, ans=0.125 2023-11-28 15:36:52,171 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3565213.3333333335, ans=0.125 2023-11-28 15:37:11,493 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 534800 2023-11-28 15:37:16,415 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 5750, loss[loss=0.07191, simple_loss=0.09863, pruned_loss=0.01426, audio_tagging_loss=0.008331, over 15581.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.08917, pruned_loss=0.01217, audio_tagging_loss=0.008793, over 3044920.67 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:37:33,548 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3565413.3333333335, ans=0.035 2023-11-28 15:37:48,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3565480.0, ans=0.0 2023-11-28 15:38:14,275 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 534850 2023-11-28 15:38:14,532 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3565613.3333333335, ans=0.1 2023-11-28 15:38:17,089 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3565613.3333333335, ans=0.0 2023-11-28 15:38:20,329 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 5800, loss[loss=0.06713, simple_loss=0.08771, pruned_loss=0.01411, audio_tagging_loss=0.009174, over 14598.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.08969, pruned_loss=0.01227, audio_tagging_loss=0.008636, over 3042087.65 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:38:20,936 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.26 vs. 
limit=10.0 2023-11-28 15:38:24,125 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3565680.0, ans=0.125 2023-11-28 15:38:30,216 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3565680.0, ans=0.2 2023-11-28 15:38:34,418 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3565746.6666666665, ans=0.0 2023-11-28 15:38:35,160 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.486e+01 8.604e+01 9.339e+01 1.000e+02 1.373e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-28 15:38:37,096 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.92 vs. limit=15.0 2023-11-28 15:39:00,634 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3565880.0, ans=0.0 2023-11-28 15:39:07,763 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3565880.0, ans=0.125 2023-11-28 15:39:17,070 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 534900 2023-11-28 15:39:18,574 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3565946.6666666665, ans=0.125 2023-11-28 15:39:22,241 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 5850, loss[loss=0.06521, simple_loss=0.09441, pruned_loss=0.0103, audio_tagging_loss=0.007699, over 15691.00 frames. ], tot_loss[loss=0.06644, simple_loss=0.09078, pruned_loss=0.01246, audio_tagging_loss=0.008585, over 3045421.48 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:39:25,072 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3566013.3333333335, ans=0.125 2023-11-28 15:39:27,432 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3566013.3333333335, ans=0.0 2023-11-28 15:39:52,126 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3566146.6666666665, ans=0.125 2023-11-28 15:39:57,545 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3566146.6666666665, ans=0.0 2023-11-28 15:40:04,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3566213.3333333335, ans=0.0 2023-11-28 15:40:10,007 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3566213.3333333335, ans=0.125 2023-11-28 15:40:11,268 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3566280.0, ans=0.0 2023-11-28 15:40:15,929 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3566280.0, ans=0.04949747468305833 2023-11-28 15:40:19,320 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 534950 2023-11-28 15:40:24,068 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 5900, loss[loss=0.05742, simple_loss=0.07294, pruned_loss=0.01263, audio_tagging_loss=0.008319, over 12669.00 frames. 
], tot_loss[loss=0.06662, simple_loss=0.09105, pruned_loss=0.0126, audio_tagging_loss=0.008497, over 3045809.85 frames. ], batch size: 50, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:40:31,874 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3566346.6666666665, ans=0.1 2023-11-28 15:40:39,391 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.038e+01 9.170e+01 9.658e+01 1.028e+02 1.325e+02, threshold=1.932e+02, percent-clipped=0.0 2023-11-28 15:40:42,475 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.51 vs. limit=22.5 2023-11-28 15:40:43,181 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3566413.3333333335, ans=0.0 2023-11-28 15:40:58,931 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3566480.0, ans=0.09899494936611666 2023-11-28 15:41:21,377 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 535000 2023-11-28 15:41:26,963 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 5950, loss[loss=0.06801, simple_loss=0.09037, pruned_loss=0.01284, audio_tagging_loss=0.009985, over 15611.00 frames. ], tot_loss[loss=0.06693, simple_loss=0.09154, pruned_loss=0.01267, audio_tagging_loss=0.008487, over 3048454.76 frames. ], batch size: 62, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:41:28,888 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.53 vs. limit=15.0 2023-11-28 15:41:39,703 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3566746.6666666665, ans=0.125 2023-11-28 15:42:11,592 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3566880.0, ans=0.125 2023-11-28 15:42:18,157 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3566946.6666666665, ans=0.2 2023-11-28 15:42:19,332 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3566946.6666666665, ans=0.125 2023-11-28 15:42:24,419 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 535050 2023-11-28 15:42:29,567 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 6000, loss[loss=0.07635, simple_loss=0.1041, pruned_loss=0.01655, audio_tagging_loss=0.007763, over 14450.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.0909, pruned_loss=0.01244, audio_tagging_loss=0.008447, over 3049388.39 frames. 
], batch size: 53, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:42:29,570 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-28 15:42:49,773 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.0878, 4.5890, 5.1865, 4.7961], device='cuda:0') 2023-11-28 15:43:02,710 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.6107, 3.6757, 4.0034, 3.4374], device='cuda:0') 2023-11-28 15:43:06,150 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.4468, 5.3811, 5.2969, 5.1007], device='cuda:0') 2023-11-28 15:43:07,314 INFO [train_asr.py:1267] (0/4) Epoch 45, validation: loss=0.05761, simple_loss=0.05049, pruned_loss=0.005188, audio_tagging_loss=0.02718, over 4681554.00 frames. 2023-11-28 15:43:07,315 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-28 15:43:07,511 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3567013.3333333335, ans=0.125 2023-11-28 15:43:19,393 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3567080.0, ans=10.0 2023-11-28 15:43:22,487 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.130e+01 8.761e+01 9.402e+01 1.021e+02 1.330e+02, threshold=1.880e+02, percent-clipped=0.0 2023-11-28 15:43:36,883 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3567146.6666666665, ans=0.0 2023-11-28 15:43:47,333 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3567213.3333333335, ans=0.125 2023-11-28 15:43:54,063 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 15:44:04,975 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 535100 2023-11-28 15:44:09,598 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 6050, loss[loss=0.0676, simple_loss=0.08935, pruned_loss=0.01218, audio_tagging_loss=0.01074, over 15775.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.09049, pruned_loss=0.01243, audio_tagging_loss=0.008532, over 3045490.70 frames. ], batch size: 60, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:44:37,557 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.40 vs. 
limit=6.0 2023-11-28 15:44:52,634 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3567546.6666666665, ans=0.0 2023-11-28 15:44:52,649 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3567546.6666666665, ans=0.125 2023-11-28 15:45:07,131 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 535150 2023-11-28 15:45:12,329 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 6100, loss[loss=0.04364, simple_loss=0.05463, pruned_loss=0.00383, audio_tagging_loss=0.01249, over 15992.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.09057, pruned_loss=0.01236, audio_tagging_loss=0.00846, over 3048953.27 frames. ], batch size: 63, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:45:27,622 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.258e+01 8.963e+01 9.572e+01 1.025e+02 1.368e+02, threshold=1.914e+02, percent-clipped=0.0 2023-11-28 15:45:57,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3567880.0, ans=0.125 2023-11-28 15:46:01,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3567946.6666666665, ans=0.125 2023-11-28 15:46:09,261 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 535200 2023-11-28 15:46:14,259 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 6150, loss[loss=0.08164, simple_loss=0.1186, pruned_loss=0.0137, audio_tagging_loss=0.008643, over 15283.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.09075, pruned_loss=0.01235, audio_tagging_loss=0.008563, over 3047910.90 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:46:21,966 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3568013.3333333335, ans=0.125 2023-11-28 15:46:48,031 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.38 vs. limit=15.0 2023-11-28 15:47:01,199 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.05 vs. limit=15.0 2023-11-28 15:47:11,704 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 535250 2023-11-28 15:47:17,061 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 6200, loss[loss=0.06988, simple_loss=0.09161, pruned_loss=0.01482, audio_tagging_loss=0.009261, over 15303.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.08972, pruned_loss=0.0123, audio_tagging_loss=0.00866, over 3052773.07 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:47:23,404 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.32 vs. 
limit=15.0 2023-11-28 15:47:25,377 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3568346.6666666665, ans=0.025 2023-11-28 15:47:32,723 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3568413.3333333335, ans=0.125 2023-11-28 15:47:33,549 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.897e+01 8.984e+01 9.633e+01 1.042e+02 1.273e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-28 15:47:40,920 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3568480.0, ans=0.0 2023-11-28 15:47:51,054 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.33 vs. limit=10.0 2023-11-28 15:47:53,446 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3568546.6666666665, ans=0.125 2023-11-28 15:48:08,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3568613.3333333335, ans=0.125 2023-11-28 15:48:14,178 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 535300 2023-11-28 15:48:19,445 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 6250, loss[loss=0.06762, simple_loss=0.09753, pruned_loss=0.01082, audio_tagging_loss=0.008032, over 15629.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08962, pruned_loss=0.0123, audio_tagging_loss=0.008655, over 3050992.18 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:48:27,839 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.14 vs. limit=15.0 2023-11-28 15:48:28,621 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3568680.0, ans=0.125 2023-11-28 15:48:45,746 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3568813.3333333335, ans=0.1 2023-11-28 15:48:50,001 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3568813.3333333335, ans=0.1 2023-11-28 15:49:05,210 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3568880.0, ans=0.5 2023-11-28 15:49:07,023 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3568880.0, ans=0.125 2023-11-28 15:49:16,853 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 535350 2023-11-28 15:49:21,434 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 6300, loss[loss=0.08375, simple_loss=0.1184, pruned_loss=0.01951, audio_tagging_loss=0.00506, over 16848.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.09007, pruned_loss=0.01227, audio_tagging_loss=0.008746, over 3059280.04 frames. ], batch size: 60, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:49:34,608 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.24 vs. 
limit=22.5 2023-11-28 15:49:38,101 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.480e+01 8.816e+01 9.438e+01 1.010e+02 1.307e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-28 15:49:54,345 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3569146.6666666665, ans=0.125 2023-11-28 15:49:54,948 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.29 vs. limit=6.0 2023-11-28 15:49:56,741 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3569146.6666666665, ans=0.125 2023-11-28 15:50:19,278 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 535400 2023-11-28 15:50:24,774 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 6350, loss[loss=0.07944, simple_loss=0.12, pruned_loss=0.01271, audio_tagging_loss=0.006715, over 15341.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.08993, pruned_loss=0.01215, audio_tagging_loss=0.008828, over 3055451.28 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:50:36,738 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3569413.3333333335, ans=0.125 2023-11-28 15:50:56,192 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.14 vs. limit=15.0 2023-11-28 15:50:57,148 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3569480.0, ans=0.07 2023-11-28 15:51:14,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3569613.3333333335, ans=0.125 2023-11-28 15:51:21,866 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 535450 2023-11-28 15:51:26,584 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 6400, loss[loss=0.05598, simple_loss=0.07655, pruned_loss=0.008441, audio_tagging_loss=0.009266, over 15110.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.08957, pruned_loss=0.01218, audio_tagging_loss=0.008934, over 3051392.09 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:51:43,575 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.073e+01 8.811e+01 9.428e+01 1.008e+02 1.163e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-28 15:51:47,888 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.68 vs. limit=6.0 2023-11-28 15:52:17,515 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.58 vs. limit=10.0 2023-11-28 15:52:24,873 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 535500 2023-11-28 15:52:25,328 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.95 vs. limit=22.5 2023-11-28 15:52:30,156 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 6450, loss[loss=0.05258, simple_loss=0.05931, pruned_loss=0.008987, audio_tagging_loss=0.01394, over 14258.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.08966, pruned_loss=0.01204, audio_tagging_loss=0.009021, over 3044432.67 frames. 
], batch size: 58, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:52:32,977 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3570013.3333333335, ans=0.125 2023-11-28 15:52:35,357 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3570013.3333333335, ans=0.0 2023-11-28 15:52:48,396 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3570080.0, ans=0.1 2023-11-28 15:52:56,778 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3570146.6666666665, ans=0.04949747468305833 2023-11-28 15:53:00,309 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3570146.6666666665, ans=0.1 2023-11-28 15:53:02,418 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3570146.6666666665, ans=0.125 2023-11-28 15:53:23,576 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.87 vs. limit=15.0 2023-11-28 15:53:28,184 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 535550 2023-11-28 15:53:32,795 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 6500, loss[loss=0.06806, simple_loss=0.08679, pruned_loss=0.01253, audio_tagging_loss=0.01214, over 15083.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08961, pruned_loss=0.01192, audio_tagging_loss=0.008909, over 3045680.75 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:53:37,348 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3570346.6666666665, ans=0.0 2023-11-28 15:53:48,893 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.726e+01 8.856e+01 9.321e+01 9.995e+01 1.237e+02, threshold=1.864e+02, percent-clipped=0.0 2023-11-28 15:53:50,405 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3570413.3333333335, ans=0.125 2023-11-28 15:53:53,223 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.64 vs. limit=22.5 2023-11-28 15:54:11,887 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3570546.6666666665, ans=0.125 2023-11-28 15:54:16,039 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3570546.6666666665, ans=0.0 2023-11-28 15:54:30,575 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 535600 2023-11-28 15:54:32,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3570613.3333333335, ans=0.2 2023-11-28 15:54:35,597 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 6550, loss[loss=0.05606, simple_loss=0.07541, pruned_loss=0.009347, audio_tagging_loss=0.009012, over 15249.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08977, pruned_loss=0.01205, audio_tagging_loss=0.008692, over 3037771.13 frames. 
], batch size: 59, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:54:39,615 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3570680.0, ans=0.0 2023-11-28 15:54:42,504 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3570680.0, ans=0.125 2023-11-28 15:54:43,716 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3570680.0, ans=0.0 2023-11-28 15:54:47,190 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3570746.6666666665, ans=0.0 2023-11-28 15:54:49,493 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3570746.6666666665, ans=0.125 2023-11-28 15:55:05,125 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3570813.3333333335, ans=0.1 2023-11-28 15:55:33,438 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 535650 2023-11-28 15:55:38,041 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 6600, loss[loss=0.06714, simple_loss=0.09004, pruned_loss=0.01327, audio_tagging_loss=0.008851, over 13499.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.08876, pruned_loss=0.01203, audio_tagging_loss=0.008612, over 3037910.45 frames. ], batch size: 53, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:55:39,656 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3571013.3333333335, ans=0.125 2023-11-28 15:55:48,367 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3571013.3333333335, ans=0.125 2023-11-28 15:55:55,576 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.560e+01 8.913e+01 9.644e+01 1.048e+02 1.369e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-28 15:56:01,131 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3571080.0, ans=0.0 2023-11-28 15:56:04,842 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.16 vs. limit=15.0 2023-11-28 15:56:20,868 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3571213.3333333335, ans=0.0 2023-11-28 15:56:32,687 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3571280.0, ans=0.125 2023-11-28 15:56:34,914 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 535700 2023-11-28 15:56:40,932 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 6650, loss[loss=0.05142, simple_loss=0.06423, pruned_loss=0.007451, audio_tagging_loss=0.01185, over 14942.00 frames. ], tot_loss[loss=0.06496, simple_loss=0.08901, pruned_loss=0.01193, audio_tagging_loss=0.008522, over 3038413.31 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:56:45,780 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.43 vs. 
limit=15.0 2023-11-28 15:56:48,914 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3571346.6666666665, ans=0.125 2023-11-28 15:56:56,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3571413.3333333335, ans=0.0 2023-11-28 15:57:09,410 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3571480.0, ans=0.0 2023-11-28 15:57:38,131 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 535750 2023-11-28 15:57:42,800 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 6700, loss[loss=0.08127, simple_loss=0.1117, pruned_loss=0.01651, audio_tagging_loss=0.008916, over 15327.00 frames. ], tot_loss[loss=0.06446, simple_loss=0.08816, pruned_loss=0.01182, audio_tagging_loss=0.00856, over 3034871.61 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:57:46,421 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3571680.0, ans=0.125 2023-11-28 15:57:59,993 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.409e+01 8.880e+01 9.649e+01 1.029e+02 1.368e+02, threshold=1.930e+02, percent-clipped=0.0 2023-11-28 15:58:01,632 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3571746.6666666665, ans=0.125 2023-11-28 15:58:37,171 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3571946.6666666665, ans=0.07 2023-11-28 15:58:39,995 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 535800 2023-11-28 15:58:45,039 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 6750, loss[loss=0.07713, simple_loss=0.1052, pruned_loss=0.01758, audio_tagging_loss=0.006936, over 15834.00 frames. ], tot_loss[loss=0.06418, simple_loss=0.08769, pruned_loss=0.01177, audio_tagging_loss=0.008562, over 3035082.16 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:58:59,654 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.65 vs. limit=22.5 2023-11-28 15:59:26,808 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.72 vs. limit=15.0 2023-11-28 15:59:36,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3572280.0, ans=0.0 2023-11-28 15:59:36,347 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 15:59:42,173 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 535850 2023-11-28 15:59:47,499 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 6800, loss[loss=0.07214, simple_loss=0.1048, pruned_loss=0.01187, audio_tagging_loss=0.007884, over 15058.00 frames. ], tot_loss[loss=0.06432, simple_loss=0.08771, pruned_loss=0.01185, audio_tagging_loss=0.008615, over 3036038.96 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:59:54,919 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.54 vs. 
limit=15.0 2023-11-28 15:59:56,806 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3572346.6666666665, ans=0.125 2023-11-28 16:00:04,702 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.844e+01 9.079e+01 9.608e+01 1.020e+02 1.284e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-28 16:00:08,543 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3572413.3333333335, ans=10.0 2023-11-28 16:00:15,183 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3572480.0, ans=0.2 2023-11-28 16:00:24,597 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3572546.6666666665, ans=0.125 2023-11-28 16:00:39,371 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3572613.3333333335, ans=10.0 2023-11-28 16:00:45,837 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 535900 2023-11-28 16:00:50,601 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 6850, loss[loss=0.07928, simple_loss=0.1052, pruned_loss=0.01946, audio_tagging_loss=0.007197, over 14932.00 frames. ], tot_loss[loss=0.06445, simple_loss=0.08763, pruned_loss=0.01198, audio_tagging_loss=0.008656, over 3034552.41 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 16:00:59,188 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3572680.0, ans=0.2 2023-11-28 16:01:14,437 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.14 vs. limit=6.0 2023-11-28 16:01:31,801 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3572880.0, ans=0.07 2023-11-28 16:01:46,886 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 535950 2023-11-28 16:01:47,161 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3572946.6666666665, ans=0.0 2023-11-28 16:01:47,196 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3572946.6666666665, ans=0.125 2023-11-28 16:01:50,694 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.30 vs. limit=22.5 2023-11-28 16:01:52,236 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 6900, loss[loss=0.07606, simple_loss=0.1056, pruned_loss=0.01363, audio_tagging_loss=0.009611, over 14720.00 frames. ], tot_loss[loss=0.06424, simple_loss=0.08779, pruned_loss=0.0118, audio_tagging_loss=0.008554, over 3038585.98 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 16:01:56,123 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3573013.3333333335, ans=0.1 2023-11-28 16:02:10,314 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.161e+01 8.778e+01 9.481e+01 1.016e+02 1.477e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-28 16:02:12,331 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.03 vs. 
limit=22.5 2023-11-28 16:02:12,394 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=14.61 vs. limit=15.0 2023-11-28 16:02:26,620 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.39 vs. limit=6.0 2023-11-28 16:02:36,142 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3573213.3333333335, ans=0.0 2023-11-28 16:02:36,190 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3573213.3333333335, ans=0.125 2023-11-28 16:02:44,185 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3573280.0, ans=0.125 2023-11-28 16:02:45,235 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 16:02:51,115 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 536000 2023-11-28 16:02:52,561 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-536000.pt 2023-11-28 16:02:58,690 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 6950, loss[loss=0.06691, simple_loss=0.09554, pruned_loss=0.01079, audio_tagging_loss=0.008355, over 16517.00 frames. ], tot_loss[loss=0.0639, simple_loss=0.0875, pruned_loss=0.01165, audio_tagging_loss=0.0085, over 3044067.50 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 16:03:14,266 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.85 vs. limit=6.0 2023-11-28 16:03:33,138 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 16:03:36,698 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3573546.6666666665, ans=0.125 2023-11-28 16:03:41,466 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3573546.6666666665, ans=0.2 2023-11-28 16:03:44,928 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3573546.6666666665, ans=0.125 2023-11-28 16:03:51,514 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3573613.3333333335, ans=0.2 2023-11-28 16:03:52,020 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.21 vs. limit=12.0 2023-11-28 16:03:56,628 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 536050 2023-11-28 16:04:01,260 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 7000, loss[loss=0.07867, simple_loss=0.1146, pruned_loss=0.01529, audio_tagging_loss=0.006073, over 14770.00 frames. 
], tot_loss[loss=0.06364, simple_loss=0.08686, pruned_loss=0.01158, audio_tagging_loss=0.008623, over 3037567.98 frames. ], batch size: 54, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 16:04:03,843 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.66 vs. limit=15.0 2023-11-28 16:04:05,849 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 16:04:18,073 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3573746.6666666665, ans=0.0 2023-11-28 16:04:18,999 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.426e+01 8.731e+01 9.457e+01 1.025e+02 1.203e+02, threshold=1.891e+02, percent-clipped=0.0 2023-11-28 16:04:31,040 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3573813.3333333335, ans=0.125 2023-11-28 16:04:38,993 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3573880.0, ans=0.125 2023-11-28 16:04:49,058 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3573880.0, ans=0.04949747468305833 2023-11-28 16:04:55,319 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3573946.6666666665, ans=0.0 2023-11-28 16:04:58,810 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 536100 2023-11-28 16:05:03,924 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 7050, loss[loss=0.06315, simple_loss=0.08855, pruned_loss=0.008727, audio_tagging_loss=0.01014, over 16027.00 frames. ], tot_loss[loss=0.06478, simple_loss=0.08862, pruned_loss=0.01185, audio_tagging_loss=0.008621, over 3038373.65 frames. ], batch size: 61, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 16:05:11,953 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.33 vs. limit=22.5 2023-11-28 16:05:16,414 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3574080.0, ans=0.0 2023-11-28 16:05:36,493 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3574146.6666666665, ans=0.04949747468305833 2023-11-28 16:05:44,464 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.56 vs. limit=15.0 2023-11-28 16:05:59,847 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3574280.0, ans=0.125 2023-11-28 16:06:01,081 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 536150 2023-11-28 16:06:05,782 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 7100, loss[loss=0.06713, simple_loss=0.09361, pruned_loss=0.01082, audio_tagging_loss=0.009511, over 16759.00 frames. ], tot_loss[loss=0.06474, simple_loss=0.08847, pruned_loss=0.01184, audio_tagging_loss=0.008666, over 3040502.31 frames. 
], batch size: 62, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 16:06:17,440 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3574413.3333333335, ans=0.2 2023-11-28 16:06:21,862 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3574413.3333333335, ans=0.125 2023-11-28 16:06:23,800 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.873e+01 8.973e+01 9.710e+01 1.043e+02 1.342e+02, threshold=1.942e+02, percent-clipped=0.0 2023-11-28 16:06:33,225 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3574480.0, ans=0.125 2023-11-28 16:06:41,110 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3574480.0, ans=10.0 2023-11-28 16:06:58,892 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.96 vs. limit=15.0 2023-11-28 16:07:04,288 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 536200 2023-11-28 16:07:06,439 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.27 vs. limit=15.0 2023-11-28 16:07:09,609 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 7150, loss[loss=0.07522, simple_loss=0.1028, pruned_loss=0.01674, audio_tagging_loss=0.007049, over 16031.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.08879, pruned_loss=0.01185, audio_tagging_loss=0.0087, over 3039336.29 frames. ], batch size: 62, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 16:07:15,928 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3574680.0, ans=0.0 2023-11-28 16:07:30,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3574746.6666666665, ans=0.0 2023-11-28 16:07:34,187 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.91 vs. limit=15.0 2023-11-28 16:07:56,657 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.26 vs. limit=15.0 2023-11-28 16:07:56,679 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.81 vs. limit=15.0 2023-11-28 16:08:07,105 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 536250 2023-11-28 16:08:12,291 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 7200, loss[loss=0.08269, simple_loss=0.1088, pruned_loss=0.01712, audio_tagging_loss=0.01115, over 14245.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08875, pruned_loss=0.01195, audio_tagging_loss=0.008801, over 3043513.43 frames. 
], batch size: 53, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 16:08:29,535 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.588e+01 8.876e+01 9.486e+01 1.031e+02 1.370e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-28 16:08:38,911 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3575146.6666666665, ans=0.2 2023-11-28 16:08:44,212 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3575146.6666666665, ans=0.125 2023-11-28 16:08:44,333 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.27 vs. limit=15.0 2023-11-28 16:08:51,411 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3575213.3333333335, ans=0.125 2023-11-28 16:08:54,961 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3575213.3333333335, ans=0.1 2023-11-28 16:08:55,936 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3575213.3333333335, ans=0.025 2023-11-28 16:08:57,882 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3575213.3333333335, ans=0.0 2023-11-28 16:09:10,371 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 536300 2023-11-28 16:09:15,011 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 7250, loss[loss=0.04592, simple_loss=0.05755, pruned_loss=0.006368, audio_tagging_loss=0.01078, over 14495.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.08884, pruned_loss=0.01194, audio_tagging_loss=0.008857, over 3037977.16 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 16:09:41,764 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=3575480.0, ans=15.0 2023-11-28 16:10:11,495 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3575613.3333333335, ans=0.1 2023-11-28 16:10:12,498 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 536350 2023-11-28 16:10:17,740 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 7300, loss[loss=0.06141, simple_loss=0.07767, pruned_loss=0.01167, audio_tagging_loss=0.0109, over 13883.00 frames. ], tot_loss[loss=0.06523, simple_loss=0.08897, pruned_loss=0.01196, audio_tagging_loss=0.008787, over 3032933.46 frames. 
], batch size: 53, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 16:10:30,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3575746.6666666665, ans=0.0 2023-11-28 16:10:34,079 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.469e+01 8.874e+01 9.514e+01 1.009e+02 1.390e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-28 16:10:37,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3575746.6666666665, ans=0.2 2023-11-28 16:10:43,469 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3575813.3333333335, ans=0.125 2023-11-28 16:10:46,054 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3575813.3333333335, ans=0.0 2023-11-28 16:11:09,721 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3575946.6666666665, ans=0.125 2023-11-28 16:11:13,388 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3575946.6666666665, ans=0.125 2023-11-28 16:11:14,352 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 536400 2023-11-28 16:11:17,330 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3575946.6666666665, ans=0.125 2023-11-28 16:11:19,294 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 7350, loss[loss=0.07195, simple_loss=0.1006, pruned_loss=0.01456, audio_tagging_loss=0.007105, over 15648.00 frames. ], tot_loss[loss=0.06478, simple_loss=0.08876, pruned_loss=0.01179, audio_tagging_loss=0.008613, over 3038267.76 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 16:11:32,688 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.96 vs. limit=12.0 2023-11-28 16:11:46,227 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3576146.6666666665, ans=0.125 2023-11-28 16:11:48,602 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3576146.6666666665, ans=0.125 2023-11-28 16:12:00,941 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3576213.3333333335, ans=0.0 2023-11-28 16:12:03,151 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3576213.3333333335, ans=0.125 2023-11-28 16:12:06,920 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.71 vs. limit=15.0 2023-11-28 16:12:09,014 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3576280.0, ans=0.0 2023-11-28 16:12:17,296 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 536450 2023-11-28 16:12:22,056 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 7400, loss[loss=0.06079, simple_loss=0.08499, pruned_loss=0.008967, audio_tagging_loss=0.009327, over 14919.00 frames. ], tot_loss[loss=0.065, simple_loss=0.08911, pruned_loss=0.01197, audio_tagging_loss=0.00848, over 3034959.65 frames. 
], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:12:29,213 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3576346.6666666665, ans=0.0 2023-11-28 16:12:41,033 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=3576413.3333333335, ans=0.025 2023-11-28 16:12:41,969 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.528e+01 8.849e+01 9.470e+01 1.002e+02 1.238e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-28 16:13:19,342 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 536500 2023-11-28 16:13:23,941 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 7450, loss[loss=0.0656, simple_loss=0.09232, pruned_loss=0.0128, audio_tagging_loss=0.006637, over 15673.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08964, pruned_loss=0.01212, audio_tagging_loss=0.008437, over 3045013.28 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 8.0 2023-11-28 16:13:36,745 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3576746.6666666665, ans=0.05 2023-11-28 16:13:45,157 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3576746.6666666665, ans=0.125 2023-11-28 16:13:46,278 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3576746.6666666665, ans=0.0 2023-11-28 16:13:46,669 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.28 vs. limit=10.0 2023-11-28 16:14:08,891 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.36 vs. limit=6.0 2023-11-28 16:14:12,160 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.71 vs. limit=10.0 2023-11-28 16:14:21,979 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 536550 2023-11-28 16:14:26,693 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 7500, loss[loss=0.07216, simple_loss=0.1029, pruned_loss=0.0136, audio_tagging_loss=0.007106, over 16415.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08962, pruned_loss=0.01226, audio_tagging_loss=0.008385, over 3041301.81 frames. 
], batch size: 61, lr: 1.50e-03, grad_scale: 8.0 2023-11-28 16:14:33,028 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3577013.3333333335, ans=0.1 2023-11-28 16:14:37,408 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3577013.3333333335, ans=0.125 2023-11-28 16:14:42,811 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3577080.0, ans=0.0 2023-11-28 16:14:47,285 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.773e+01 8.969e+01 9.602e+01 1.048e+02 1.798e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-28 16:14:47,553 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3577080.0, ans=0.125 2023-11-28 16:14:47,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3577080.0, ans=0.125 2023-11-28 16:14:50,036 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3577080.0, ans=0.2 2023-11-28 16:15:24,947 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 536600 2023-11-28 16:15:27,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3577280.0, ans=0.1 2023-11-28 16:15:29,954 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 7550, loss[loss=0.07206, simple_loss=0.09799, pruned_loss=0.01476, audio_tagging_loss=0.008305, over 14863.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.08944, pruned_loss=0.01224, audio_tagging_loss=0.008498, over 3038385.74 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 8.0 2023-11-28 16:15:32,529 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.93 vs. limit=15.0 2023-11-28 16:16:00,651 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.62 vs. limit=6.0 2023-11-28 16:16:08,426 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3577546.6666666665, ans=0.2 2023-11-28 16:16:27,847 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 536650 2023-11-28 16:16:29,389 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.86 vs. limit=15.0 2023-11-28 16:16:32,326 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 7600, loss[loss=0.05864, simple_loss=0.08544, pruned_loss=0.007893, audio_tagging_loss=0.008028, over 15925.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08932, pruned_loss=0.01222, audio_tagging_loss=0.008552, over 3045330.64 frames. 
], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:16:32,718 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3577680.0, ans=0.2 2023-11-28 16:16:39,300 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3577680.0, ans=0.125 2023-11-28 16:16:51,860 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.705e+01 8.772e+01 9.466e+01 1.004e+02 1.199e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-28 16:16:52,222 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3577746.6666666665, ans=0.125 2023-11-28 16:16:53,205 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3577746.6666666665, ans=0.0 2023-11-28 16:17:02,680 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.73 vs. limit=15.0 2023-11-28 16:17:09,498 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.64 vs. limit=12.0 2023-11-28 16:17:30,320 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 536700 2023-11-28 16:17:34,911 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 7650, loss[loss=0.0586, simple_loss=0.0809, pruned_loss=0.01025, audio_tagging_loss=0.007906, over 15340.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08912, pruned_loss=0.0122, audio_tagging_loss=0.008628, over 3035488.29 frames. ], batch size: 60, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:18:04,547 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff3.min_abs, batch_count=3578146.6666666665, ans=0.2 2023-11-28 16:18:06,956 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff2.min_abs, batch_count=3578146.6666666665, ans=0.1 2023-11-28 16:18:14,338 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3578213.3333333335, ans=0.125 2023-11-28 16:18:32,213 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 536750 2023-11-28 16:18:37,477 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 7700, loss[loss=0.06008, simple_loss=0.08016, pruned_loss=0.009459, audio_tagging_loss=0.01054, over 15452.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08883, pruned_loss=0.01219, audio_tagging_loss=0.008687, over 3034757.71 frames. ], batch size: 61, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:18:42,799 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.40 vs. limit=15.0 2023-11-28 16:18:43,781 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3578346.6666666665, ans=0.125 2023-11-28 16:18:54,795 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.53 vs. 
limit=15.0 2023-11-28 16:18:57,460 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.127e+01 9.045e+01 9.681e+01 1.026e+02 1.409e+02, threshold=1.936e+02, percent-clipped=0.0 2023-11-28 16:19:33,914 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 536800 2023-11-28 16:19:34,838 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3578613.3333333335, ans=0.2 2023-11-28 16:19:39,991 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 7750, loss[loss=0.09039, simple_loss=0.1239, pruned_loss=0.0208, audio_tagging_loss=0.007639, over 16057.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08927, pruned_loss=0.01226, audio_tagging_loss=0.008653, over 3033792.00 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:19:48,433 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.89 vs. limit=15.0 2023-11-28 16:19:51,868 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.67 vs. limit=15.0 2023-11-28 16:20:03,925 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.26 vs. limit=6.0 2023-11-28 16:20:11,265 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3578813.3333333335, ans=0.1 2023-11-28 16:20:35,716 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3578946.6666666665, ans=0.95 2023-11-28 16:20:37,919 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 536850 2023-11-28 16:20:42,585 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 7800, loss[loss=0.0756, simple_loss=0.09929, pruned_loss=0.01701, audio_tagging_loss=0.008946, over 15395.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08898, pruned_loss=0.01224, audio_tagging_loss=0.008645, over 3033540.24 frames. 
], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:20:57,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3579080.0, ans=0.025 2023-11-28 16:21:00,039 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3579080.0, ans=0.0 2023-11-28 16:21:01,045 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3579080.0, ans=0.0 2023-11-28 16:21:01,218 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3579080.0, ans=0.1 2023-11-28 16:21:01,942 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.300e+01 9.025e+01 9.590e+01 1.021e+02 1.203e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-28 16:21:09,532 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3579146.6666666665, ans=0.0 2023-11-28 16:21:38,665 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 536900 2023-11-28 16:21:38,761 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3579280.0, ans=0.1 2023-11-28 16:21:44,000 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 7850, loss[loss=0.05768, simple_loss=0.07583, pruned_loss=0.01065, audio_tagging_loss=0.009114, over 14921.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08898, pruned_loss=0.01217, audio_tagging_loss=0.008591, over 3039862.28 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:21:46,609 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3579346.6666666665, ans=0.125 2023-11-28 16:22:21,458 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3579546.6666666665, ans=0.125 2023-11-28 16:22:34,158 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.16 vs. limit=10.0 2023-11-28 16:22:40,579 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 536950 2023-11-28 16:22:45,265 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 7900, loss[loss=0.05186, simple_loss=0.07561, pruned_loss=0.004266, audio_tagging_loss=0.009792, over 15754.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.08894, pruned_loss=0.01209, audio_tagging_loss=0.008656, over 3048120.66 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:23:05,964 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.056e+01 9.028e+01 9.515e+01 1.023e+02 1.434e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-28 16:23:12,230 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3579813.3333333335, ans=0.04949747468305833 2023-11-28 16:23:13,949 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.49 vs. 
limit=6.0 2023-11-28 16:23:34,641 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3579946.6666666665, ans=0.125 2023-11-28 16:23:38,905 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3579946.6666666665, ans=0.2 2023-11-28 16:23:41,010 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3579946.6666666665, ans=0.125 2023-11-28 16:23:43,819 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 537000 2023-11-28 16:23:48,701 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 7950, loss[loss=0.05206, simple_loss=0.05914, pruned_loss=0.008834, audio_tagging_loss=0.01366, over 14941.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.0888, pruned_loss=0.01207, audio_tagging_loss=0.008737, over 3043783.79 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:24:07,214 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 16:24:10,778 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3580080.0, ans=0.2 2023-11-28 16:24:12,366 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.81 vs. limit=15.0 2023-11-28 16:24:16,677 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3580146.6666666665, ans=0.035 2023-11-28 16:24:23,151 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3580146.6666666665, ans=0.0 2023-11-28 16:24:23,219 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3580146.6666666665, ans=0.0 2023-11-28 16:24:38,937 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.34 vs. limit=12.0 2023-11-28 16:24:43,316 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3580280.0, ans=0.0 2023-11-28 16:24:45,631 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 537050 2023-11-28 16:24:47,272 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.58 vs. limit=15.0 2023-11-28 16:24:50,111 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 8000, loss[loss=0.0825, simple_loss=0.1192, pruned_loss=0.01706, audio_tagging_loss=0.005819, over 14877.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08849, pruned_loss=0.01205, audio_tagging_loss=0.008864, over 3038887.29 frames. 
], batch size: 55, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 16:24:50,427 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3580346.6666666665, ans=0.125 2023-11-28 16:24:55,837 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3580346.6666666665, ans=0.125 2023-11-28 16:25:05,025 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3580413.3333333335, ans=0.125 2023-11-28 16:25:11,270 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.288e+01 8.789e+01 9.588e+01 1.026e+02 1.289e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-28 16:25:24,680 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3580480.0, ans=0.2 2023-11-28 16:25:47,522 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 537100 2023-11-28 16:25:51,467 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3580680.0, ans=0.125 2023-11-28 16:25:52,289 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 8050, loss[loss=0.06565, simple_loss=0.09125, pruned_loss=0.01234, audio_tagging_loss=0.007685, over 14245.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08844, pruned_loss=0.01203, audio_tagging_loss=0.008875, over 3035291.57 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:25:52,574 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3580680.0, ans=0.0 2023-11-28 16:26:04,915 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3580746.6666666665, ans=0.125 2023-11-28 16:26:48,925 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 537150 2023-11-28 16:26:54,652 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 8100, loss[loss=0.08317, simple_loss=0.1136, pruned_loss=0.01872, audio_tagging_loss=0.007655, over 14691.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08859, pruned_loss=0.01206, audio_tagging_loss=0.008881, over 3038340.02 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 8.0 2023-11-28 16:27:02,214 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.74 vs. limit=12.0 2023-11-28 16:27:03,318 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.84 vs. 
limit=22.5 2023-11-28 16:27:06,597 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3581080.0, ans=0.125 2023-11-28 16:27:16,956 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3581080.0, ans=0.0 2023-11-28 16:27:18,042 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.231e+01 8.911e+01 9.512e+01 1.026e+02 1.565e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-28 16:27:21,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3581146.6666666665, ans=0.0 2023-11-28 16:27:54,007 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 537200 2023-11-28 16:27:56,914 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3581280.0, ans=0.125 2023-11-28 16:27:58,917 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 8150, loss[loss=0.07002, simple_loss=0.1052, pruned_loss=0.009446, audio_tagging_loss=0.007958, over 15995.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.0886, pruned_loss=0.01204, audio_tagging_loss=0.008779, over 3045501.62 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 8.0 2023-11-28 16:28:36,389 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3581546.6666666665, ans=0.125 2023-11-28 16:28:49,370 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.63 vs. limit=15.0 2023-11-28 16:28:56,599 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 537250 2023-11-28 16:29:01,142 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 8200, loss[loss=0.05924, simple_loss=0.08149, pruned_loss=0.01151, audio_tagging_loss=0.006985, over 14358.00 frames. ], tot_loss[loss=0.06507, simple_loss=0.08879, pruned_loss=0.01206, audio_tagging_loss=0.008616, over 3042387.59 frames. ], batch size: 54, lr: 1.50e-03, grad_scale: 8.0 2023-11-28 16:29:05,740 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 16:29:23,387 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.462e+01 9.010e+01 9.552e+01 1.037e+02 1.390e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-28 16:29:24,777 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.38 vs. limit=15.0 2023-11-28 16:29:37,441 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3581880.0, ans=0.0 2023-11-28 16:29:58,260 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 537300 2023-11-28 16:30:02,806 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 8250, loss[loss=0.06867, simple_loss=0.08591, pruned_loss=0.01353, audio_tagging_loss=0.01218, over 15131.00 frames. 
], tot_loss[loss=0.06498, simple_loss=0.08869, pruned_loss=0.012, audio_tagging_loss=0.008631, over 3038820.04 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 8.0 2023-11-28 16:30:10,702 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3582013.3333333335, ans=0.0 2023-11-28 16:30:10,712 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3582013.3333333335, ans=0.125 2023-11-28 16:30:24,240 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3582080.0, ans=0.125 2023-11-28 16:30:24,978 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.15 vs. limit=15.0 2023-11-28 16:30:36,946 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3582146.6666666665, ans=0.0 2023-11-28 16:30:52,660 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3582280.0, ans=0.95 2023-11-28 16:30:56,130 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3582280.0, ans=0.0 2023-11-28 16:31:00,873 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 537350 2023-11-28 16:31:06,270 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 8300, loss[loss=0.0556, simple_loss=0.07363, pruned_loss=0.01031, audio_tagging_loss=0.008471, over 15480.00 frames. ], tot_loss[loss=0.06494, simple_loss=0.08891, pruned_loss=0.01192, audio_tagging_loss=0.008567, over 3043848.02 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 8.0 2023-11-28 16:31:17,893 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3582413.3333333335, ans=0.125 2023-11-28 16:31:19,137 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3582413.3333333335, ans=0.125 2023-11-28 16:31:28,112 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.613e+01 8.841e+01 9.438e+01 1.013e+02 1.279e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-28 16:31:36,558 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.97 vs. limit=15.0 2023-11-28 16:31:48,824 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3582546.6666666665, ans=0.1 2023-11-28 16:31:49,292 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.60 vs. 
limit=15.0 2023-11-28 16:31:56,553 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3582613.3333333335, ans=0.1 2023-11-28 16:32:03,997 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 537400 2023-11-28 16:32:05,604 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3582613.3333333335, ans=0.125 2023-11-28 16:32:08,972 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 8350, loss[loss=0.06998, simple_loss=0.08995, pruned_loss=0.01287, audio_tagging_loss=0.01213, over 16327.00 frames. ], tot_loss[loss=0.06511, simple_loss=0.08907, pruned_loss=0.012, audio_tagging_loss=0.008569, over 3048368.29 frames. ], batch size: 60, lr: 1.50e-03, grad_scale: 8.0 2023-11-28 16:32:15,201 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3582680.0, ans=0.07 2023-11-28 16:32:19,382 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3582680.0, ans=0.5 2023-11-28 16:32:20,340 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3582746.6666666665, ans=0.0 2023-11-28 16:32:33,055 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3582813.3333333335, ans=10.0 2023-11-28 16:32:34,536 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.80 vs. limit=22.5 2023-11-28 16:32:50,412 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.94 vs. limit=15.0 2023-11-28 16:32:51,250 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3582880.0, ans=0.0 2023-11-28 16:32:52,863 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.86 vs. limit=15.0 2023-11-28 16:32:59,359 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3582946.6666666665, ans=0.0 2023-11-28 16:33:06,824 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 537450 2023-11-28 16:33:11,433 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 8400, loss[loss=0.07018, simple_loss=0.09931, pruned_loss=0.01288, audio_tagging_loss=0.007643, over 16497.00 frames. ], tot_loss[loss=0.06511, simple_loss=0.08919, pruned_loss=0.01195, audio_tagging_loss=0.008565, over 3044858.16 frames. 
], batch size: 62, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:33:22,902 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3583080.0, ans=0.09899494936611666 2023-11-28 16:33:34,436 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.574e+01 8.936e+01 9.656e+01 1.030e+02 3.353e+02, threshold=1.931e+02, percent-clipped=1.0 2023-11-28 16:33:40,056 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3583146.6666666665, ans=0.125 2023-11-28 16:33:55,074 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3583213.3333333335, ans=0.0 2023-11-28 16:34:07,996 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.78 vs. limit=6.0 2023-11-28 16:34:09,878 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 537500 2023-11-28 16:34:14,420 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 8450, loss[loss=0.06971, simple_loss=0.103, pruned_loss=0.009961, audio_tagging_loss=0.00823, over 16052.00 frames. ], tot_loss[loss=0.06589, simple_loss=0.0903, pruned_loss=0.01221, audio_tagging_loss=0.008535, over 3052949.41 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:34:36,541 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.78 vs. limit=15.0 2023-11-28 16:34:39,698 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3583480.0, ans=0.0 2023-11-28 16:34:41,559 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 16:35:13,091 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 537550 2023-11-28 16:35:13,363 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3583613.3333333335, ans=0.2 2023-11-28 16:35:17,718 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 8500, loss[loss=0.07169, simple_loss=0.1008, pruned_loss=0.01288, audio_tagging_loss=0.008407, over 16194.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08991, pruned_loss=0.01212, audio_tagging_loss=0.008552, over 3050955.78 frames. 
], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:35:40,020 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.468e+01 8.912e+01 9.575e+01 1.015e+02 1.303e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-28 16:35:43,760 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3583813.3333333335, ans=0.125 2023-11-28 16:35:43,818 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3583813.3333333335, ans=0.125 2023-11-28 16:35:51,863 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3583813.3333333335, ans=0.0 2023-11-28 16:35:58,316 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3583880.0, ans=0.0 2023-11-28 16:36:06,200 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3583946.6666666665, ans=0.1 2023-11-28 16:36:12,220 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3583946.6666666665, ans=0.0 2023-11-28 16:36:14,185 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 537600 2023-11-28 16:36:19,717 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 8550, loss[loss=0.06488, simple_loss=0.08812, pruned_loss=0.01269, audio_tagging_loss=0.008132, over 15629.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08923, pruned_loss=0.01208, audio_tagging_loss=0.008571, over 3043118.76 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:36:48,736 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=3584146.6666666665, ans=0.5 2023-11-28 16:36:52,563 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.75 vs. limit=15.0 2023-11-28 16:36:57,053 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3584213.3333333335, ans=0.125 2023-11-28 16:37:17,075 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 537650 2023-11-28 16:37:20,014 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.71 vs. limit=12.0 2023-11-28 16:37:21,646 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 8600, loss[loss=0.08254, simple_loss=0.1144, pruned_loss=0.01689, audio_tagging_loss=0.008455, over 16162.00 frames. ], tot_loss[loss=0.06531, simple_loss=0.08899, pruned_loss=0.01213, audio_tagging_loss=0.008689, over 3040797.47 frames. 
], batch size: 59, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:37:44,161 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.628e+01 8.790e+01 9.576e+01 1.011e+02 1.183e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-28 16:37:45,795 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3584480.0, ans=10.0 2023-11-28 16:37:49,191 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3584480.0, ans=0.2 2023-11-28 16:37:51,153 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3584480.0, ans=0.125 2023-11-28 16:38:14,074 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3584613.3333333335, ans=0.0 2023-11-28 16:38:18,645 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 537700 2023-11-28 16:38:23,787 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 8650, loss[loss=0.07763, simple_loss=0.1048, pruned_loss=0.01405, audio_tagging_loss=0.01119, over 15561.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.09011, pruned_loss=0.01224, audio_tagging_loss=0.008752, over 3040013.33 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:39:02,592 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3584880.0, ans=0.0 2023-11-28 16:39:20,459 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3584946.6666666665, ans=0.0 2023-11-28 16:39:21,483 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 537750 2023-11-28 16:39:26,609 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 8700, loss[loss=0.05133, simple_loss=0.06981, pruned_loss=0.009518, audio_tagging_loss=0.006905, over 15237.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.09113, pruned_loss=0.01237, audio_tagging_loss=0.008726, over 3036116.28 frames. ], batch size: 60, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:39:39,619 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.99 vs. limit=10.0 2023-11-28 16:39:48,561 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.457e+01 9.145e+01 9.850e+01 1.054e+02 1.476e+02, threshold=1.970e+02, percent-clipped=0.0 2023-11-28 16:40:13,351 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3585213.3333333335, ans=0.1 2023-11-28 16:40:24,572 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 537800 2023-11-28 16:40:28,485 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3585346.6666666665, ans=0.125 2023-11-28 16:40:29,361 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 8750, loss[loss=0.05924, simple_loss=0.07243, pruned_loss=0.0145, audio_tagging_loss=0.008524, over 14528.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.0904, pruned_loss=0.01226, audio_tagging_loss=0.008776, over 3035019.36 frames. 
], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:40:32,008 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3585346.6666666665, ans=0.1 2023-11-28 16:40:34,405 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3585346.6666666665, ans=0.125 2023-11-28 16:40:54,856 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3585480.0, ans=0.04949747468305833 2023-11-28 16:41:02,713 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.66 vs. limit=15.0 2023-11-28 16:41:12,552 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3585546.6666666665, ans=0.5 2023-11-28 16:41:26,412 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 537850 2023-11-28 16:41:31,154 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 8800, loss[loss=0.06162, simple_loss=0.08966, pruned_loss=0.009731, audio_tagging_loss=0.007062, over 15450.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.09042, pruned_loss=0.01236, audio_tagging_loss=0.00883, over 3031621.47 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 16:41:35,545 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3585680.0, ans=0.0 2023-11-28 16:41:54,340 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.505e+01 9.010e+01 9.598e+01 1.030e+02 1.176e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-28 16:42:11,853 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3585880.0, ans=0.125 2023-11-28 16:42:28,851 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 537900 2023-11-28 16:42:29,056 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3585946.6666666665, ans=0.125 2023-11-28 16:42:34,101 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 8850, loss[loss=0.07651, simple_loss=0.1055, pruned_loss=0.01561, audio_tagging_loss=0.008133, over 15912.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.08985, pruned_loss=0.01216, audio_tagging_loss=0.008956, over 3032147.58 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 16:42:39,050 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.57 vs. limit=6.0 2023-11-28 16:42:50,879 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 16:43:02,053 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3586146.6666666665, ans=0.0 2023-11-28 16:43:25,797 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3586280.0, ans=0.125 2023-11-28 16:43:28,063 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3586280.0, ans=0.125 2023-11-28 16:43:31,367 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 537950 2023-11-28 16:43:36,690 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 8900, loss[loss=0.0622, simple_loss=0.08858, pruned_loss=0.009334, audio_tagging_loss=0.00857, over 16495.00 frames. ], tot_loss[loss=0.06675, simple_loss=0.09118, pruned_loss=0.01244, audio_tagging_loss=0.008724, over 3040458.50 frames. ], batch size: 62, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 16:43:45,310 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3586346.6666666665, ans=0.125 2023-11-28 16:43:59,146 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.555e+01 9.009e+01 9.604e+01 1.041e+02 1.260e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-28 16:44:08,717 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3586480.0, ans=0.125 2023-11-28 16:44:10,582 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3586480.0, ans=0.0 2023-11-28 16:44:19,709 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3586546.6666666665, ans=0.0 2023-11-28 16:44:21,076 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.83 vs. limit=6.0 2023-11-28 16:44:33,977 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 538000 2023-11-28 16:44:34,171 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3586613.3333333335, ans=0.125 2023-11-28 16:44:39,016 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 8950, loss[loss=0.06631, simple_loss=0.08981, pruned_loss=0.01303, audio_tagging_loss=0.00837, over 15051.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.09024, pruned_loss=0.01234, audio_tagging_loss=0.008702, over 3046352.10 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:44:40,599 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3586680.0, ans=0.1 2023-11-28 16:44:43,467 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3586680.0, ans=0.125 2023-11-28 16:44:44,766 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3586680.0, ans=0.1 2023-11-28 16:44:54,383 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3586746.6666666665, ans=0.125 2023-11-28 16:45:27,533 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.63 vs. 
limit=15.0 2023-11-28 16:45:31,434 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3586946.6666666665, ans=0.0 2023-11-28 16:45:37,251 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 538050 2023-11-28 16:45:41,946 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 9000, loss[loss=0.07465, simple_loss=0.1018, pruned_loss=0.01475, audio_tagging_loss=0.009008, over 16655.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.08977, pruned_loss=0.01225, audio_tagging_loss=0.008664, over 3050752.18 frames. ], batch size: 63, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:45:41,950 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-28 16:46:23,745 INFO [train_asr.py:1267] (0/4) Epoch 45, validation: loss=0.05837, simple_loss=0.05051, pruned_loss=0.005241, audio_tagging_loss=0.02788, over 4681554.00 frames. 2023-11-28 16:46:23,745 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-28 16:46:35,973 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3587080.0, ans=0.125 2023-11-28 16:46:37,049 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3587080.0, ans=0.125 2023-11-28 16:46:42,213 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3587080.0, ans=0.125 2023-11-28 16:46:46,757 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.194e+01 8.988e+01 9.549e+01 1.029e+02 1.340e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-28 16:46:56,168 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3587146.6666666665, ans=0.125 2023-11-28 16:47:00,705 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3587213.3333333335, ans=0.125 2023-11-28 16:47:21,006 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 538100 2023-11-28 16:47:21,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3587280.0, ans=0.0 2023-11-28 16:47:26,288 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 9050, loss[loss=0.06806, simple_loss=0.09653, pruned_loss=0.01191, audio_tagging_loss=0.007881, over 16729.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.09027, pruned_loss=0.01231, audio_tagging_loss=0.00853, over 3059446.35 frames. ], batch size: 60, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:47:26,595 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3587346.6666666665, ans=0.2 2023-11-28 16:47:31,735 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.28 vs. limit=15.0 2023-11-28 16:47:35,323 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.90 vs. limit=15.0 2023-11-28 16:47:54,376 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.79 vs. 
limit=15.0 2023-11-28 16:47:58,237 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3587480.0, ans=0.0 2023-11-28 16:48:02,813 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3587546.6666666665, ans=0.125 2023-11-28 16:48:23,512 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 538150 2023-11-28 16:48:26,260 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.75 vs. limit=10.0 2023-11-28 16:48:28,178 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 9100, loss[loss=0.06676, simple_loss=0.08537, pruned_loss=0.01613, audio_tagging_loss=0.00795, over 15361.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08904, pruned_loss=0.01205, audio_tagging_loss=0.008567, over 3057251.97 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:48:29,663 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3587680.0, ans=0.0 2023-11-28 16:48:36,170 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3587680.0, ans=0.125 2023-11-28 16:48:43,050 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.59 vs. limit=15.0 2023-11-28 16:48:43,957 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3587746.6666666665, ans=0.125 2023-11-28 16:48:52,573 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.406e+01 8.776e+01 9.450e+01 1.014e+02 1.425e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-28 16:48:57,928 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.23 vs. limit=22.5 2023-11-28 16:49:13,163 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.42 vs. limit=6.0 2023-11-28 16:49:26,133 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 538200 2023-11-28 16:49:30,980 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 9150, loss[loss=0.06376, simple_loss=0.09085, pruned_loss=0.01178, audio_tagging_loss=0.006558, over 15579.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.09009, pruned_loss=0.01221, audio_tagging_loss=0.008457, over 3055857.31 frames. 
], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:49:52,678 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3588080.0, ans=0.125 2023-11-28 16:49:57,385 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3588146.6666666665, ans=0.125 2023-11-28 16:50:04,745 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3588146.6666666665, ans=0.125 2023-11-28 16:50:20,847 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=3588280.0, ans=0.02 2023-11-28 16:50:28,785 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 538250 2023-11-28 16:50:30,101 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3588280.0, ans=0.125 2023-11-28 16:50:32,484 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3588346.6666666665, ans=0.125 2023-11-28 16:50:33,287 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 9200, loss[loss=0.06496, simple_loss=0.08473, pruned_loss=0.01219, audio_tagging_loss=0.01041, over 13903.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.09017, pruned_loss=0.01217, audio_tagging_loss=0.008534, over 3053049.62 frames. ], batch size: 53, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 16:50:44,745 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3588413.3333333335, ans=0.0 2023-11-28 16:50:56,597 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.755e+01 9.083e+01 9.538e+01 1.018e+02 1.192e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-28 16:51:29,984 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 538300 2023-11-28 16:51:35,096 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 9250, loss[loss=0.07544, simple_loss=0.09845, pruned_loss=0.01663, audio_tagging_loss=0.009588, over 15292.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.08998, pruned_loss=0.01216, audio_tagging_loss=0.008562, over 3057330.44 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 16:51:35,510 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3588680.0, ans=0.0 2023-11-28 16:51:36,615 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3588680.0, ans=0.125 2023-11-28 16:52:34,311 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 538350 2023-11-28 16:52:39,057 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 9300, loss[loss=0.0594, simple_loss=0.08017, pruned_loss=0.008205, audio_tagging_loss=0.01111, over 14793.00 frames. ], tot_loss[loss=0.06496, simple_loss=0.08883, pruned_loss=0.01193, audio_tagging_loss=0.008618, over 3055743.17 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 16:53:03,666 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.801e+01 8.822e+01 9.388e+01 1.037e+02 1.623e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-28 16:53:30,580 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.28 vs. 
limit=15.0 2023-11-28 16:53:37,235 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 538400 2023-11-28 16:53:42,858 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 9350, loss[loss=0.077, simple_loss=0.1084, pruned_loss=0.01468, audio_tagging_loss=0.008109, over 15687.00 frames. ], tot_loss[loss=0.06488, simple_loss=0.08882, pruned_loss=0.0119, audio_tagging_loss=0.008571, over 3057450.19 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:53:44,396 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3589346.6666666665, ans=0.0 2023-11-28 16:53:48,766 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3589346.6666666665, ans=0.125 2023-11-28 16:53:50,115 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3589346.6666666665, ans=0.2 2023-11-28 16:53:58,436 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3589413.3333333335, ans=0.1 2023-11-28 16:54:02,121 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3589413.3333333335, ans=0.1 2023-11-28 16:54:07,774 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.30 vs. limit=22.5 2023-11-28 16:54:24,779 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.38 vs. limit=15.0 2023-11-28 16:54:39,462 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3589613.3333333335, ans=0.125 2023-11-28 16:54:40,532 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 538450 2023-11-28 16:54:45,216 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 9400, loss[loss=0.07274, simple_loss=0.1088, pruned_loss=0.01201, audio_tagging_loss=0.006307, over 15625.00 frames. ], tot_loss[loss=0.06482, simple_loss=0.08856, pruned_loss=0.01188, audio_tagging_loss=0.008659, over 3058729.47 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 8.0 2023-11-28 16:54:51,504 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3589680.0, ans=0.0 2023-11-28 16:54:53,487 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.04 vs. limit=12.0 2023-11-28 16:55:11,217 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.693e+01 8.781e+01 9.524e+01 1.025e+02 1.257e+02, threshold=1.905e+02, percent-clipped=0.0 2023-11-28 16:55:17,906 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3589813.3333333335, ans=0.125 2023-11-28 16:55:42,569 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 538500 2023-11-28 16:55:47,197 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 9450, loss[loss=0.05681, simple_loss=0.07727, pruned_loss=0.00955, audio_tagging_loss=0.008624, over 15305.00 frames. ], tot_loss[loss=0.06465, simple_loss=0.08811, pruned_loss=0.01177, audio_tagging_loss=0.008829, over 3056756.36 frames. 
], batch size: 58, lr: 1.50e-03, grad_scale: 8.0 2023-11-28 16:55:49,555 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 16:56:18,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3590146.6666666665, ans=0.0 2023-11-28 16:56:29,286 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3590213.3333333335, ans=0.125 2023-11-28 16:56:45,200 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 538550 2023-11-28 16:56:49,800 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 9500, loss[loss=0.05429, simple_loss=0.07229, pruned_loss=0.008109, audio_tagging_loss=0.01003, over 14887.00 frames. ], tot_loss[loss=0.06467, simple_loss=0.08819, pruned_loss=0.01172, audio_tagging_loss=0.008861, over 3057251.93 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 8.0 2023-11-28 16:56:52,384 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3590346.6666666665, ans=0.125 2023-11-28 16:57:11,323 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.95 vs. limit=6.0 2023-11-28 16:57:16,067 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.050e+01 9.072e+01 9.708e+01 1.059e+02 2.012e+02, threshold=1.942e+02, percent-clipped=1.0 2023-11-28 16:57:28,811 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3590546.6666666665, ans=0.125 2023-11-28 16:57:37,060 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3590546.6666666665, ans=0.0 2023-11-28 16:57:47,687 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 538600 2023-11-28 16:57:52,598 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 9550, loss[loss=0.05539, simple_loss=0.07595, pruned_loss=0.006469, audio_tagging_loss=0.01095, over 14357.00 frames. ], tot_loss[loss=0.06447, simple_loss=0.08755, pruned_loss=0.01178, audio_tagging_loss=0.008907, over 3051034.36 frames. 
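The Exclude-cut WARNING at the start of this stretch is the trainer's length sanity check: the 1-second AudioSet clip has 100 feature frames, only 23 of which survive the frontend's ~4x subsampling, and a transducer alignment cannot emit 24 BPE tokens from 23 frames. A minimal sketch of such a filter; the subsampling arithmetic matches the logged 100 -> 23, but the function itself is an assumption, not the repo's code:

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # Frames remaining after the convolutional frontend: ((100 - 7) // 2 + 1) // 2 == 23.
        t = ((num_frames - 7) // 2 + 1) // 2
        # An RNN-T alignment needs at least one frame per emitted token.
        return t >= num_tokens

    keep_cut(100, 24)  # False -> excluded, exactly as the WARNING reports

These clips carry the dummy transcript on purpose (AudioSet has no text), so dropping them costs the ASR objective nothing.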
], batch size: 55, lr: 1.50e-03, grad_scale: 8.0 2023-11-28 16:57:56,425 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3590680.0, ans=0.125 2023-11-28 16:58:05,520 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=3590746.6666666665, ans=10.0 2023-11-28 16:58:08,815 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3590746.6666666665, ans=0.125 2023-11-28 16:58:12,193 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3590746.6666666665, ans=0.125 2023-11-28 16:58:16,498 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3590813.3333333335, ans=0.125 2023-11-28 16:58:24,739 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3590813.3333333335, ans=0.0 2023-11-28 16:58:38,119 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3590880.0, ans=0.125 2023-11-28 16:58:50,220 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 538650 2023-11-28 16:58:55,037 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 9600, loss[loss=0.06713, simple_loss=0.0963, pruned_loss=0.01064, audio_tagging_loss=0.008341, over 15720.00 frames. ], tot_loss[loss=0.06426, simple_loss=0.08725, pruned_loss=0.01169, audio_tagging_loss=0.008936, over 3039527.46 frames. ], batch size: 61, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:58:55,850 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.72 vs. limit=15.0 2023-11-28 16:58:55,884 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.65 vs. limit=15.0 2023-11-28 16:59:21,718 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.897e+01 9.047e+01 9.589e+01 1.034e+02 1.302e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-28 16:59:30,948 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3591146.6666666665, ans=0.0 2023-11-28 16:59:53,245 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 538700 2023-11-28 16:59:53,350 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3591280.0, ans=0.04949747468305833 2023-11-28 16:59:56,022 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3591280.0, ans=0.125 2023-11-28 16:59:57,065 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3591346.6666666665, ans=0.0 2023-11-28 16:59:57,917 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 9650, loss[loss=0.05226, simple_loss=0.06521, pruned_loss=0.007025, audio_tagging_loss=0.01263, over 14116.00 frames. ], tot_loss[loss=0.06436, simple_loss=0.08733, pruned_loss=0.01176, audio_tagging_loss=0.008936, over 3034187.97 frames. 
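Each optim.py Clipping_scale record summarizes the gradient norms seen over a recent window: the five numbers read as the 0th/25th/50th/75th/100th percentiles, the threshold tracks clipping_scale times the median (2.0 x 9.434e+01 ~ 1.887e+02 in the record above), and percent-clipped is how often a batch's norm exceeded it. A hedged reconstruction of that bookkeeping, not the optimizer's actual code:

    import torch

    def clipping_stats(grad_norms: torch.Tensor, clipping_scale: float = 2.0):
        # grad_norms: 1-D tensor of per-batch gradient norms from the recent window.
        q = torch.quantile(grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * q[2]  # 2.0 * median, matching the logged thresholds
        percent_clipped = 100.0 * (grad_norms > threshold).float().mean()
        return q, threshold, percent_clipped

Norms above the threshold are presumably scaled down rather than discarded, so an occasional percent-clipped=1.0 reads as expected noise rather than divergence.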
], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 17:00:09,843 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3591413.3333333335, ans=0.125 2023-11-28 17:00:22,121 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3591480.0, ans=0.2 2023-11-28 17:00:51,372 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3591613.3333333335, ans=0.0 2023-11-28 17:00:53,659 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3591613.3333333335, ans=0.125 2023-11-28 17:00:54,588 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 538750 2023-11-28 17:00:59,819 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 9700, loss[loss=0.05293, simple_loss=0.07647, pruned_loss=0.006611, audio_tagging_loss=0.008089, over 15492.00 frames. ], tot_loss[loss=0.06411, simple_loss=0.08728, pruned_loss=0.01169, audio_tagging_loss=0.008782, over 3036227.13 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 17:01:04,262 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3591680.0, ans=0.2 2023-11-28 17:01:08,926 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3591680.0, ans=0.1 2023-11-28 17:01:10,603 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.76 vs. limit=15.0 2023-11-28 17:01:22,580 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3591746.6666666665, ans=0.2 2023-11-28 17:01:26,474 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.891e+01 8.979e+01 9.434e+01 1.003e+02 1.570e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-28 17:01:26,801 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3591813.3333333335, ans=0.0 2023-11-28 17:01:49,410 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3591946.6666666665, ans=0.125 2023-11-28 17:01:55,296 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3591946.6666666665, ans=0.125 2023-11-28 17:01:57,471 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 538800 2023-11-28 17:01:59,163 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3591946.6666666665, ans=0.0 2023-11-28 17:02:02,765 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 9750, loss[loss=0.06611, simple_loss=0.09741, pruned_loss=0.008811, audio_tagging_loss=0.0086, over 15434.00 frames. ], tot_loss[loss=0.06376, simple_loss=0.08711, pruned_loss=0.01149, audio_tagging_loss=0.008721, over 3045100.06 frames. 
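The scaling.py ScheduledFloat records expose module hyper-parameters (balancer probabilities, skip rates, dropout_p, min/max bounds) that are functions of the global batch_count rather than constants; ans is the value in effect. In spirit they are piecewise-linear ramps between breakpoints, something like the sketch below, where the breakpoints are invented for illustration:

    def scheduled_float(batch_count: float, schedule: list[tuple[float, float]]) -> float:
        # schedule: sorted (batch_count, value) breakpoints, interpolated linearly
        # and clamped at both ends, e.g. [(0.0, 0.3), (4000.0, 0.125)].
        b0, v0 = schedule[0]
        if batch_count <= b0:
            return v0
        for b1, v1 in schedule[1:]:
            if batch_count <= b1:
                return v0 + (batch_count - b0) / (b1 - b0) * (v1 - v0)
            b0, v0 = b1, v1
        return v0

    scheduled_float(3_588_413.0, [(0.0, 0.3), (4000.0, 0.125)])  # 0.125: long past the ramp

At batch_count ~3.59e6 every schedule here is far past its last breakpoint, which is why the same ans values (0.125, 0.1, 0.2, ...) repeat verbatim.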
], batch size: 54, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:02:05,317 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3592013.3333333335, ans=0.125 2023-11-28 17:02:06,910 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.15 vs. limit=15.0 2023-11-28 17:02:15,643 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.94 vs. limit=6.0 2023-11-28 17:02:27,371 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3592146.6666666665, ans=0.07 2023-11-28 17:02:27,604 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.72 vs. limit=12.0 2023-11-28 17:02:32,754 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3592146.6666666665, ans=0.125 2023-11-28 17:02:36,167 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3592146.6666666665, ans=0.0 2023-11-28 17:02:39,377 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3592213.3333333335, ans=0.1 2023-11-28 17:02:59,594 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 538850 2023-11-28 17:03:04,996 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 9800, loss[loss=0.03968, simple_loss=0.05387, pruned_loss=0.003591, audio_tagging_loss=0.009155, over 15439.00 frames. ], tot_loss[loss=0.06466, simple_loss=0.08843, pruned_loss=0.01172, audio_tagging_loss=0.008715, over 3043791.11 frames. ], batch size: 60, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:03:20,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3592413.3333333335, ans=0.125 2023-11-28 17:03:22,585 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3592413.3333333335, ans=0.0 2023-11-28 17:03:31,262 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.538e+01 8.992e+01 9.640e+01 1.016e+02 2.169e+02, threshold=1.928e+02, percent-clipped=1.0 2023-11-28 17:03:47,548 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3592546.6666666665, ans=0.1 2023-11-28 17:04:02,282 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 538900 2023-11-28 17:04:04,700 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 17:04:07,526 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 9850, loss[loss=0.05741, simple_loss=0.07455, pruned_loss=0.0112, audio_tagging_loss=0.008929, over 15491.00 frames. ], tot_loss[loss=0.06509, simple_loss=0.08925, pruned_loss=0.0118, audio_tagging_loss=0.00866, over 3049859.03 frames. 
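grad_scale in the Epoch/batch headers is the fp16 loss scale, and it wanders between 8.0 and 32.0 over this stretch. That is ordinary dynamic loss scaling: the scaler halves the scale whenever a step overflows to inf/nan and doubles it again after a long enough run of clean steps. Assuming the stock torch.cuda.amp.GradScaler, configured here with illustrative values:

    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=16.0,       # multiplies the loss before backward()
        backoff_factor=0.5,    # 16.0 -> 8.0 after an inf/nan step
        growth_factor=2.0,     # 16.0 -> 32.0 after enough clean steps
        growth_interval=2000,  # clean steps required before growing
    )

So the 32.0 -> 16.0 -> 8.0 dips earlier in the epoch mark overflowed steps that were skipped, not a change in the learning rate.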
], batch size: 57, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:04:26,042 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.48 vs. limit=12.0 2023-11-28 17:04:27,546 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.13 vs. limit=22.5 2023-11-28 17:04:28,207 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3592746.6666666665, ans=0.95 2023-11-28 17:04:31,862 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3592813.3333333335, ans=0.125 2023-11-28 17:04:51,447 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3592880.0, ans=0.1 2023-11-28 17:05:04,037 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 17:05:05,082 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 538950 2023-11-28 17:05:10,977 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 9900, loss[loss=0.07474, simple_loss=0.106, pruned_loss=0.01388, audio_tagging_loss=0.007844, over 14395.00 frames. ], tot_loss[loss=0.066, simple_loss=0.09068, pruned_loss=0.01206, audio_tagging_loss=0.008597, over 3051205.34 frames. ], batch size: 53, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:05:26,376 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3593080.0, ans=0.2 2023-11-28 17:05:36,807 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.649e+01 9.160e+01 9.894e+01 1.082e+02 1.663e+02, threshold=1.979e+02, percent-clipped=0.0 2023-11-28 17:05:47,416 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3593213.3333333335, ans=0.2 2023-11-28 17:05:52,120 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3593213.3333333335, ans=0.2 2023-11-28 17:06:08,656 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 539000 2023-11-28 17:06:14,249 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 9950, loss[loss=0.04608, simple_loss=0.05377, pruned_loss=0.007784, audio_tagging_loss=0.01141, over 14705.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.09038, pruned_loss=0.01205, audio_tagging_loss=0.008622, over 3048736.63 frames. ], batch size: 56, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:06:46,633 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3593480.0, ans=0.125 2023-11-28 17:07:03,939 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3593613.3333333335, ans=0.5 2023-11-28 17:07:10,882 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 539050 2023-11-28 17:07:11,505 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.15 vs. limit=15.0 2023-11-28 17:07:15,475 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 10000, loss[loss=0.06299, simple_loss=0.07767, pruned_loss=0.0148, audio_tagging_loss=0.009354, over 15790.00 frames. 
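The scaling.py Whitening records (e.g. metric=18.13 vs. limit=22.5 above) track how far a module's activations are from having a white, well-conditioned covariance; the whitening penalty only engages when the metric crosses its limit. A rough reconstruction of such a metric, equal to 1.0 for perfectly white features and growing with correlation or scale imbalance; the exact formula in scaling.py may differ:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int) -> torch.Tensor:
        # x: (num_frames, num_channels); channels are split into num_groups groups,
        # mirroring the num_groups/num_channels fields in the log records.
        num_frames, num_channels = x.shape
        cpg = num_channels // num_groups
        xg = x.reshape(num_frames, num_groups, cpg).transpose(0, 1)  # (groups, frames, cpg)
        covar = xg.transpose(1, 2) @ xg / num_frames                 # per-group covariance
        mean_diag = covar.diagonal(dim1=1, dim2=2).mean()
        per_group = (covar ** 2).sum(dim=(1, 2)) / cpg               # == mean_diag**2 if covar = c*I
        return per_group.mean() / (mean_diag ** 2 + 1e-20)

Most records here sit comfortably under their limits, so these lines read as diagnostics rather than warnings.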
], tot_loss[loss=0.06514, simple_loss=0.08936, pruned_loss=0.01183, audio_tagging_loss=0.008633, over 3047973.78 frames. ], batch size: 59, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:07:17,117 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3593680.0, ans=0.1 2023-11-28 17:07:38,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3593746.6666666665, ans=0.125 2023-11-28 17:07:41,217 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 17:07:42,051 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.583e+01 8.648e+01 9.149e+01 9.983e+01 1.212e+02, threshold=1.830e+02, percent-clipped=0.0 2023-11-28 17:08:00,329 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.87 vs. limit=8.0 2023-11-28 17:08:13,330 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 539100 2023-11-28 17:08:18,075 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 10050, loss[loss=0.07712, simple_loss=0.1073, pruned_loss=0.01471, audio_tagging_loss=0.008768, over 15195.00 frames. ], tot_loss[loss=0.06481, simple_loss=0.0887, pruned_loss=0.01183, audio_tagging_loss=0.008632, over 3050388.55 frames. ], batch size: 58, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:08:24,244 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3594013.3333333335, ans=0.0 2023-11-28 17:08:26,635 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3594013.3333333335, ans=0.0 2023-11-28 17:08:31,379 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3594080.0, ans=0.0 2023-11-28 17:08:38,080 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3594080.0, ans=0.0 2023-11-28 17:08:55,143 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3594213.3333333335, ans=0.025 2023-11-28 17:09:16,374 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 539150 2023-11-28 17:09:21,022 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 10100, loss[loss=0.05487, simple_loss=0.07496, pruned_loss=0.008987, audio_tagging_loss=0.008401, over 14652.00 frames. ], tot_loss[loss=0.06511, simple_loss=0.08904, pruned_loss=0.01188, audio_tagging_loss=0.008704, over 3058101.56 frames. 
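Each train_asr.py record prints two loss groups: loss[...] for the current batch (~14-16k frames) and tot_loss[...] over roughly 3.05e6 frames, i.e. about 200 batches' worth. That is consistent with an exponentially decayed accumulator whose time constant is the reset interval; a sketch of the bookkeeping under that assumption:

    def update_tot(tot: dict, cur: dict, reset_interval: int = 200) -> dict:
        # Decayed sums of frame-weighted losses and of frame counts. The steady-state
        # frame count is ~reset_interval batches' worth, hence the ~3.05M above.
        decay = 1.0 - 1.0 / reset_interval
        return {k: tot.get(k, 0.0) * decay + v for k, v in cur.items()}

The printed tot_loss values are then each decayed loss sum divided by the decayed frame count, which is why they move slowly while the per-batch loss jumps around.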
], batch size: 54, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:09:26,637 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3594346.6666666665, ans=0.0 2023-11-28 17:09:48,551 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.194e+01 8.910e+01 9.610e+01 1.020e+02 1.223e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-28 17:09:51,064 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3594480.0, ans=0.035 2023-11-28 17:09:53,385 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3594480.0, ans=0.1 2023-11-28 17:09:58,116 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.29 vs. limit=15.0 2023-11-28 17:10:16,265 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 17:10:18,790 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 539200 2023-11-28 17:10:23,878 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 10150, loss[loss=0.05735, simple_loss=0.08006, pruned_loss=0.009083, audio_tagging_loss=0.008243, over 14239.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08996, pruned_loss=0.01209, audio_tagging_loss=0.008656, over 3056677.45 frames. ], batch size: 54, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:10:42,549 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3594746.6666666665, ans=0.125 2023-11-28 17:10:44,371 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3594746.6666666665, ans=0.0 2023-11-28 17:10:53,250 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3594813.3333333335, ans=0.125 2023-11-28 17:10:57,592 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 17:11:05,173 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.30 vs. limit=10.0 2023-11-28 17:11:21,534 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 539250 2023-11-28 17:11:24,309 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3594946.6666666665, ans=0.07 2023-11-28 17:11:26,362 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 10200, loss[loss=0.05356, simple_loss=0.07292, pruned_loss=0.007076, audio_tagging_loss=0.01003, over 14988.00 frames. 
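The learning rate slips from 1.50e-03 to 1.49e-03 across these records, consistent with an Eden-style schedule that decays smoothly in both the step and the epoch dimension. With base_lr = 0.045, lr_batches = 7500 and lr_epochs = 3.5 taken as assumptions about this run's configuration:

    # lr = base_lr * ((step/lr_batches)^2 + 1)^-0.25 * ((epoch/lr_epochs)^2 + 1)^-0.25
    base_lr, lr_batches, lr_epochs = 0.045, 7500.0, 3.5
    step, epoch = 539_000.0, 45.0  # approximate position in training
    lr = (base_lr
          * ((step / lr_batches) ** 2 + 1.0) ** -0.25
          * ((epoch / lr_epochs) ** 2 + 1.0) ** -0.25)
    print(f"{lr:.2e}")  # ~1.48e-03, in line with the logged lr: 1.49e-03

The small residual gap is plausibly bookkeeping (fractional epochs, a reference-duration correction), but both the magnitude and the slow drift match.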
], tot_loss[loss=0.06581, simple_loss=0.0902, pruned_loss=0.01203, audio_tagging_loss=0.008674, over 3064050.79 frames. ], batch size: 57, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:11:28,061 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3595013.3333333335, ans=0.2 2023-11-28 17:11:51,230 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3595146.6666666665, ans=0.0 2023-11-28 17:11:53,845 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.43 vs. limit=15.0 2023-11-28 17:11:54,332 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.516e+01 8.974e+01 9.604e+01 1.042e+02 1.393e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-28 17:11:54,421 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 17:12:24,077 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 539300 2023-11-28 17:12:28,734 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 10250, loss[loss=0.06327, simple_loss=0.08593, pruned_loss=0.01216, audio_tagging_loss=0.008152, over 16005.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08953, pruned_loss=0.01202, audio_tagging_loss=0.008716, over 3057724.50 frames. ], batch size: 59, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:12:38,975 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.62 vs. limit=10.0 2023-11-28 17:12:39,789 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3595346.6666666665, ans=0.07 2023-11-28 17:12:43,526 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3595413.3333333335, ans=0.125 2023-11-28 17:12:43,581 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3595413.3333333335, ans=0.125 2023-11-28 17:13:01,892 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3595480.0, ans=0.0 2023-11-28 17:13:02,223 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.85 vs. limit=22.5 2023-11-28 17:13:27,176 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 539350 2023-11-28 17:13:27,346 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3595613.3333333335, ans=0.125 2023-11-28 17:13:31,884 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 10300, loss[loss=0.06288, simple_loss=0.08538, pruned_loss=0.01123, audio_tagging_loss=0.008961, over 14873.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08907, pruned_loss=0.012, audio_tagging_loss=0.008708, over 3060762.77 frames. 
], batch size: 56, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:13:32,185 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 17:13:35,591 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3595680.0, ans=0.125 2023-11-28 17:13:43,673 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.27 vs. limit=22.5 2023-11-28 17:13:49,210 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3595746.6666666665, ans=0.125 2023-11-28 17:13:53,169 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.00 vs. limit=22.5 2023-11-28 17:13:59,043 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.726e+01 9.050e+01 9.766e+01 1.043e+02 1.224e+02, threshold=1.953e+02, percent-clipped=0.0 2023-11-28 17:14:23,211 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3595946.6666666665, ans=0.0 2023-11-28 17:14:24,505 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.51 vs. limit=22.5 2023-11-28 17:14:28,357 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3595946.6666666665, ans=0.2 2023-11-28 17:14:29,267 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 539400 2023-11-28 17:14:34,235 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 10350, loss[loss=0.05026, simple_loss=0.06574, pruned_loss=0.008071, audio_tagging_loss=0.009324, over 14871.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.08977, pruned_loss=0.01222, audio_tagging_loss=0.008697, over 3055937.36 frames. ], batch size: 56, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:14:44,569 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3596013.3333333335, ans=0.5 2023-11-28 17:15:00,113 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3596146.6666666665, ans=0.1 2023-11-28 17:15:05,891 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3596146.6666666665, ans=0.125 2023-11-28 17:15:15,898 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3596213.3333333335, ans=0.125 2023-11-28 17:15:22,785 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3596280.0, ans=0.125 2023-11-28 17:15:29,138 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3596280.0, ans=0.125 2023-11-28 17:15:30,102 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 539450 2023-11-28 17:15:34,658 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 10400, loss[loss=0.05837, simple_loss=0.07555, pruned_loss=0.01098, audio_tagging_loss=0.00961, over 14218.00 frames. 
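The scaling.py WithLoss records (loss-sum=0.000e+00 above, attached to self_attn_weights) report an auxiliary penalty that is injected into the backward pass without altering the tensor that flows forward; a sum of zero means the penalty never engaged in the window. One standard way to graft a loss onto a tensor like that is a custom autograd function; this is a plausible mechanism, not necessarily the repo's implementation:

    import torch

    class WithLoss(torch.autograd.Function):
        """Return x unchanged, but make `loss` receive gradient 1 in backward."""

        @staticmethod
        def forward(ctx, x: torch.Tensor, loss: torch.Tensor) -> torch.Tensor:
            ctx.save_for_backward(loss)
            return x

        @staticmethod
        def backward(ctx, grad_output: torch.Tensor):
            (loss,) = ctx.saved_tensors
            # Gradients for (x, loss): x passes through; the auxiliary term
            # behaves as if it had been added to the total training loss.
            return grad_output, torch.ones_like(loss)

Logging the scalar before attaching it is then free, which fits how these lines appear only occasionally and read 0.000e+00 throughout this stretch.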
], tot_loss[loss=0.0655, simple_loss=0.08914, pruned_loss=0.0121, audio_tagging_loss=0.008828, over 3051792.39 frames. ], batch size: 55, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:15:52,424 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.11 vs. limit=22.5 2023-11-28 17:16:01,194 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.937e+01 9.027e+01 9.708e+01 1.021e+02 1.407e+02, threshold=1.942e+02, percent-clipped=0.0 2023-11-28 17:16:32,325 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 539500 2023-11-28 17:16:36,757 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 10450, loss[loss=0.05962, simple_loss=0.07622, pruned_loss=0.01045, audio_tagging_loss=0.01107, over 15450.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08867, pruned_loss=0.01202, audio_tagging_loss=0.008787, over 3050417.61 frames. ], batch size: 59, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:16:37,145 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3596680.0, ans=0.2 2023-11-28 17:17:08,248 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3596813.3333333335, ans=0.2 2023-11-28 17:17:17,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3596880.0, ans=10.0 2023-11-28 17:17:30,356 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3596946.6666666665, ans=0.1 2023-11-28 17:17:33,709 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 539550 2023-11-28 17:17:38,939 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 10500, loss[loss=0.06588, simple_loss=0.08433, pruned_loss=0.0125, audio_tagging_loss=0.01121, over 14052.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08897, pruned_loss=0.01213, audio_tagging_loss=0.008735, over 3044597.41 frames. ], batch size: 55, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:18:00,765 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3597080.0, ans=0.1 2023-11-28 17:18:03,704 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3597146.6666666665, ans=0.125 2023-11-28 17:18:06,019 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3597146.6666666665, ans=0.0 2023-11-28 17:18:06,806 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.514e+01 8.931e+01 9.605e+01 1.033e+02 2.073e+02, threshold=1.921e+02, percent-clipped=1.0 2023-11-28 17:18:11,559 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3597146.6666666665, ans=0.2 2023-11-28 17:18:35,977 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 539600 2023-11-28 17:18:40,873 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 10550, loss[loss=0.06148, simple_loss=0.08266, pruned_loss=0.01155, audio_tagging_loss=0.008593, over 15351.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.089, pruned_loss=0.01214, audio_tagging_loss=0.008627, over 3046406.66 frames. 
], batch size: 56, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:18:46,343 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=3597346.6666666665, ans=15.0 2023-11-28 17:18:53,706 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3597413.3333333335, ans=0.125 2023-11-28 17:18:58,030 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3597413.3333333335, ans=0.0 2023-11-28 17:19:02,547 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3597413.3333333335, ans=0.125 2023-11-28 17:19:05,388 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.39 vs. limit=6.0 2023-11-28 17:19:12,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3597480.0, ans=0.125 2023-11-28 17:19:37,873 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 539650 2023-11-28 17:19:40,369 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3597613.3333333335, ans=0.0 2023-11-28 17:19:42,589 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 10600, loss[loss=0.0673, simple_loss=0.09157, pruned_loss=0.01545, audio_tagging_loss=0.006066, over 15317.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.09058, pruned_loss=0.01261, audio_tagging_loss=0.008414, over 3044491.15 frames. ], batch size: 56, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:19:47,234 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3597680.0, ans=0.125 2023-11-28 17:20:07,786 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=3597813.3333333335, ans=0.05 2023-11-28 17:20:11,857 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.139e+01 8.954e+01 9.589e+01 1.025e+02 1.251e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-28 17:20:12,173 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3597813.3333333335, ans=0.1 2023-11-28 17:20:32,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3597946.6666666665, ans=10.0 2023-11-28 17:20:40,017 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 539700 2023-11-28 17:20:45,328 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 10650, loss[loss=0.06512, simple_loss=0.09454, pruned_loss=0.01043, audio_tagging_loss=0.007419, over 15275.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.08938, pruned_loss=0.01232, audio_tagging_loss=0.008394, over 3044333.71 frames. ], batch size: 57, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:20:50,337 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3598013.3333333335, ans=0.125 2023-11-28 17:21:17,632 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.54 vs. 
limit=12.0 2023-11-28 17:21:28,542 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3598213.3333333335, ans=0.125 2023-11-28 17:21:41,974 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 539750 2023-11-28 17:21:47,316 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 10700, loss[loss=0.054, simple_loss=0.06657, pruned_loss=0.01036, audio_tagging_loss=0.01036, over 14912.00 frames. ], tot_loss[loss=0.06468, simple_loss=0.08822, pruned_loss=0.01207, audio_tagging_loss=0.008499, over 3040847.25 frames. ], batch size: 57, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:21:54,658 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3598346.6666666665, ans=0.0 2023-11-28 17:21:55,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3598346.6666666665, ans=0.2 2023-11-28 17:22:09,589 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.09 vs. limit=15.0 2023-11-28 17:22:15,502 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.694e+01 8.739e+01 9.237e+01 1.012e+02 1.304e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-28 17:22:29,096 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3598546.6666666665, ans=0.1 2023-11-28 17:22:43,829 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 539800 2023-11-28 17:22:48,394 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.78 vs. limit=15.0 2023-11-28 17:22:49,066 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 10750, loss[loss=0.07123, simple_loss=0.1006, pruned_loss=0.01118, audio_tagging_loss=0.00975, over 16060.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.08909, pruned_loss=0.01219, audio_tagging_loss=0.008547, over 3044827.74 frames. ], batch size: 59, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:22:51,611 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3598680.0, ans=0.1 2023-11-28 17:22:59,387 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.85 vs. limit=15.0 2023-11-28 17:23:08,581 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3598746.6666666665, ans=0.125 2023-11-28 17:23:28,011 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 17:23:43,795 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 17:23:44,961 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3598946.6666666665, ans=0.95 2023-11-28 17:23:45,969 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 539850 2023-11-28 17:23:51,156 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 10800, loss[loss=0.05182, simple_loss=0.06707, pruned_loss=0.007483, audio_tagging_loss=0.0108, over 15526.00 frames. 
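model.py prints Freeze_encoder: False together with the global batch index every 50 batches; the flag is the state of a gate that can detach or down-weight encoder updates, typically for the first freeze_encoder_steps batches of a fine-tuning run, and it is permanently off here. A minimal sketch of such a gate; the parameter names mirror the run options but the logic is an assumption:

    def encoder_is_frozen(batch_idx: int,
                          freeze_encoder: bool = False,
                          freeze_encoder_steps: int = -1) -> bool:
        # -1 disables step-based freezing; this run logs Freeze_encoder: False throughout.
        if freeze_encoder:
            return True
        return 0 < freeze_encoder_steps and batch_idx < freeze_encoder_steps

The fixed every-50-batches cadence also makes these lines a convenient heartbeat for the global batch index (539750, 539800, 539850 above).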
], tot_loss[loss=0.06505, simple_loss=0.08882, pruned_loss=0.01204, audio_tagging_loss=0.008596, over 3052717.14 frames. ], batch size: 60, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:24:19,467 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.712e+01 8.985e+01 9.429e+01 1.046e+02 1.643e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-28 17:24:33,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3599213.3333333335, ans=0.125 2023-11-28 17:24:48,022 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 539900 2023-11-28 17:24:53,258 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 10850, loss[loss=0.06044, simple_loss=0.08014, pruned_loss=0.009637, audio_tagging_loss=0.01073, over 14664.00 frames. ], tot_loss[loss=0.06482, simple_loss=0.08874, pruned_loss=0.01191, audio_tagging_loss=0.008537, over 3057876.71 frames. ], batch size: 57, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:24:58,125 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3599346.6666666665, ans=0.2 2023-11-28 17:25:08,508 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.21 vs. limit=12.0 2023-11-28 17:25:09,449 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3599413.3333333335, ans=0.025 2023-11-28 17:25:11,854 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3599413.3333333335, ans=0.125 2023-11-28 17:25:11,876 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3599413.3333333335, ans=0.09899494936611666 2023-11-28 17:25:20,062 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3599480.0, ans=0.1 2023-11-28 17:25:41,187 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.94 vs. limit=10.0 2023-11-28 17:25:44,193 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3599613.3333333335, ans=0.125 2023-11-28 17:25:49,847 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 539950 2023-11-28 17:25:54,402 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 10900, loss[loss=0.05894, simple_loss=0.07021, pruned_loss=0.01449, audio_tagging_loss=0.009345, over 15536.00 frames. ], tot_loss[loss=0.06464, simple_loss=0.08812, pruned_loss=0.01193, audio_tagging_loss=0.008652, over 3049507.30 frames. ], batch size: 62, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:25:54,484 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 17:25:54,607 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3599680.0, ans=0.1 2023-11-28 17:25:55,372 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.23 vs. limit=5.0 2023-11-28 17:25:58,777 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 17:26:18,939 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3599813.3333333335, ans=0.125 2023-11-28 17:26:20,496 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.99 vs. limit=15.0 2023-11-28 17:26:23,249 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.592e+01 8.863e+01 9.525e+01 1.011e+02 1.256e+02, threshold=1.905e+02, percent-clipped=0.0 2023-11-28 17:26:32,306 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3599880.0, ans=0.125 2023-11-28 17:26:51,460 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 540000 2023-11-28 17:26:52,865 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-540000.pt 2023-11-28 17:26:58,083 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3600013.3333333335, ans=0.125 2023-11-28 17:26:58,857 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 10950, loss[loss=0.07093, simple_loss=0.09796, pruned_loss=0.01645, audio_tagging_loss=0.005497, over 15635.00 frames. ], tot_loss[loss=0.06467, simple_loss=0.08816, pruned_loss=0.0119, audio_tagging_loss=0.008686, over 3050797.11 frames. ], batch size: 59, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:27:07,274 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3600013.3333333335, ans=0.0 2023-11-28 17:27:27,928 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3600146.6666666665, ans=0.05 2023-11-28 17:27:28,904 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3600146.6666666665, ans=0.015 2023-11-28 17:27:32,623 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.04 vs. limit=22.5 2023-11-28 17:27:37,864 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 17:27:45,853 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3600213.3333333335, ans=0.125 2023-11-28 17:27:51,676 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.05 vs. 
limit=22.5 2023-11-28 17:27:56,607 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 540050 2023-11-28 17:27:56,733 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3600280.0, ans=0.125 2023-11-28 17:28:01,195 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 11000, loss[loss=0.04983, simple_loss=0.07026, pruned_loss=0.007521, audio_tagging_loss=0.007176, over 16103.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.089, pruned_loss=0.01199, audio_tagging_loss=0.008689, over 3048838.39 frames. ], batch size: 61, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:28:05,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3600346.6666666665, ans=0.125 2023-11-28 17:28:12,413 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 17:28:15,654 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 17:28:20,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3600413.3333333335, ans=0.025 2023-11-28 17:28:28,570 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.36 vs. limit=22.5 2023-11-28 17:28:30,167 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.650e+01 8.952e+01 9.649e+01 1.058e+02 1.351e+02, threshold=1.930e+02, percent-clipped=0.0 2023-11-28 17:28:40,571 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3600546.6666666665, ans=0.125 2023-11-28 17:28:41,298 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=3600546.6666666665, ans=0.025 2023-11-28 17:28:43,461 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3600546.6666666665, ans=0.1 2023-11-28 17:28:55,319 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.99 vs. limit=15.0 2023-11-28 17:28:58,239 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 540100 2023-11-28 17:29:02,656 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 11050, loss[loss=0.07439, simple_loss=0.1054, pruned_loss=0.01261, audio_tagging_loss=0.009062, over 15583.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.08863, pruned_loss=0.01195, audio_tagging_loss=0.00875, over 3041653.48 frames. ], batch size: 56, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:29:09,365 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3600680.0, ans=0.125 2023-11-28 17:29:57,909 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.19 vs. 
limit=15.0 2023-11-28 17:29:59,782 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 540150 2023-11-28 17:30:04,335 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 11100, loss[loss=0.06594, simple_loss=0.09055, pruned_loss=0.01061, audio_tagging_loss=0.01005, over 15186.00 frames. ], tot_loss[loss=0.06483, simple_loss=0.08813, pruned_loss=0.01189, audio_tagging_loss=0.008868, over 3041184.08 frames. ], batch size: 58, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:30:20,179 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3601080.0, ans=0.125 2023-11-28 17:30:34,451 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.951e+01 8.960e+01 9.547e+01 1.044e+02 1.303e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-28 17:30:48,392 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3601213.3333333335, ans=0.125 2023-11-28 17:30:57,788 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3601280.0, ans=0.125 2023-11-28 17:31:01,538 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 540200 2023-11-28 17:31:03,381 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3601280.0, ans=0.125 2023-11-28 17:31:06,502 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 11150, loss[loss=0.06416, simple_loss=0.08925, pruned_loss=0.01216, audio_tagging_loss=0.007371, over 14810.00 frames. ], tot_loss[loss=0.06477, simple_loss=0.08784, pruned_loss=0.01187, audio_tagging_loss=0.008975, over 3045539.70 frames. ], batch size: 55, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:31:18,296 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3601413.3333333335, ans=0.125 2023-11-28 17:31:27,244 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3601413.3333333335, ans=0.0 2023-11-28 17:31:36,482 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3601480.0, ans=0.0 2023-11-28 17:32:04,036 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 540250 2023-11-28 17:32:08,659 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 11200, loss[loss=0.04785, simple_loss=0.06775, pruned_loss=0.005888, audio_tagging_loss=0.008081, over 14205.00 frames. ], tot_loss[loss=0.06476, simple_loss=0.08798, pruned_loss=0.01182, audio_tagging_loss=0.008946, over 3049425.44 frames. 
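The checkpoint-540000.pt written a little earlier in this stretch is batch-driven rather than epoch-driven: assuming save_every_n = 4000 for this run, the trainer writes exp_dir/checkpoint-<batch_idx>.pt whenever the global index hits a multiple of 4000, and 540000 % 4000 == 0. A sketch of that cadence (the helper name is illustrative):

    def maybe_checkpoint_path(batch_idx_train: int, exp_dir: str,
                              save_every_n: int = 4000) -> str | None:
        # 540000 % 4000 == 0, hence checkpoint-540000.pt at 17:26:52 above.
        if batch_idx_train % save_every_n != 0:
            return None
        return f"{exp_dir}/checkpoint-{batch_idx_train}.pt"

Epoch-level checkpoints are saved separately at epoch boundaries, so these batch checkpoints mainly guard against mid-epoch crashes.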
], batch size: 54, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:32:31,806 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3601813.3333333335, ans=0.04949747468305833 2023-11-28 17:32:38,200 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.490e+01 8.987e+01 9.522e+01 1.045e+02 1.448e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-28 17:32:40,703 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3601813.3333333335, ans=0.125 2023-11-28 17:32:51,907 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3601880.0, ans=0.0 2023-11-28 17:33:05,139 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 540300 2023-11-28 17:33:09,760 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 11250, loss[loss=0.06693, simple_loss=0.09142, pruned_loss=0.0133, audio_tagging_loss=0.007917, over 15610.00 frames. ], tot_loss[loss=0.06387, simple_loss=0.08634, pruned_loss=0.01164, audio_tagging_loss=0.009055, over 3045354.70 frames. ], batch size: 56, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:33:12,326 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3602013.3333333335, ans=0.125 2023-11-28 17:33:13,727 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3602013.3333333335, ans=0.125 2023-11-28 17:33:25,141 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3602080.0, ans=0.125 2023-11-28 17:33:37,170 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3602146.6666666665, ans=0.125 2023-11-28 17:33:43,147 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3602146.6666666665, ans=0.0 2023-11-28 17:33:49,673 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.83 vs. limit=15.0 2023-11-28 17:33:50,585 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3602213.3333333335, ans=0.125 2023-11-28 17:34:07,215 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 540350 2023-11-28 17:34:11,704 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 11300, loss[loss=0.05691, simple_loss=0.07833, pruned_loss=0.01004, audio_tagging_loss=0.0077, over 16283.00 frames. ], tot_loss[loss=0.06401, simple_loss=0.0871, pruned_loss=0.01162, audio_tagging_loss=0.008842, over 3048831.28 frames. ], batch size: 63, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:34:37,085 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3602480.0, ans=0.125 2023-11-28 17:34:41,449 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.193e+01 8.983e+01 9.542e+01 1.053e+02 1.409e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-28 17:34:47,748 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.41 vs. 
limit=15.0 2023-11-28 17:35:03,279 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3602613.3333333335, ans=0.125 2023-11-28 17:35:05,272 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3602613.3333333335, ans=0.125 2023-11-28 17:35:08,623 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 540400 2023-11-28 17:35:14,226 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 11350, loss[loss=0.05075, simple_loss=0.07129, pruned_loss=0.006424, audio_tagging_loss=0.008683, over 14759.00 frames. ], tot_loss[loss=0.06414, simple_loss=0.08749, pruned_loss=0.01167, audio_tagging_loss=0.008726, over 3046677.64 frames. ], batch size: 56, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:35:18,089 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3602680.0, ans=0.125 2023-11-28 17:35:44,907 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3602813.3333333335, ans=0.0 2023-11-28 17:35:46,300 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3602813.3333333335, ans=0.0 2023-11-28 17:35:49,734 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3602880.0, ans=0.5 2023-11-28 17:35:55,506 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.85 vs. limit=15.0 2023-11-28 17:35:59,220 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.17 vs. limit=15.0 2023-11-28 17:36:01,277 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3602880.0, ans=0.125 2023-11-28 17:36:11,254 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 540450 2023-11-28 17:36:15,781 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 11400, loss[loss=0.05249, simple_loss=0.07047, pruned_loss=0.01047, audio_tagging_loss=0.006787, over 15727.00 frames. ], tot_loss[loss=0.06465, simple_loss=0.0882, pruned_loss=0.01189, audio_tagging_loss=0.008654, over 3057067.01 frames. ], batch size: 61, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:36:21,247 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.07 vs. 
limit=15.0 2023-11-28 17:36:26,530 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3603080.0, ans=0.125 2023-11-28 17:36:29,572 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3603080.0, ans=0.125 2023-11-28 17:36:47,205 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.957e+01 9.040e+01 9.661e+01 1.043e+02 1.391e+02, threshold=1.932e+02, percent-clipped=0.0 2023-11-28 17:36:47,568 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3603146.6666666665, ans=0.0 2023-11-28 17:36:55,786 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3603213.3333333335, ans=0.125 2023-11-28 17:36:58,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3603213.3333333335, ans=0.125 2023-11-28 17:36:58,911 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3603213.3333333335, ans=0.0 2023-11-28 17:36:59,892 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3603213.3333333335, ans=0.0 2023-11-28 17:37:12,169 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.60 vs. limit=15.0 2023-11-28 17:37:12,674 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 540500 2023-11-28 17:37:18,031 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 11450, loss[loss=0.05424, simple_loss=0.07188, pruned_loss=0.01001, audio_tagging_loss=0.008297, over 14579.00 frames. ], tot_loss[loss=0.06492, simple_loss=0.0886, pruned_loss=0.012, audio_tagging_loss=0.008621, over 3046596.04 frames. ], batch size: 56, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:37:34,101 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3603413.3333333335, ans=0.1 2023-11-28 17:37:41,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3603480.0, ans=0.0 2023-11-28 17:37:47,013 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3603480.0, ans=0.125 2023-11-28 17:37:52,948 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3603480.0, ans=0.0 2023-11-28 17:38:02,901 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.12 vs. limit=10.0 2023-11-28 17:38:08,125 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.80 vs. limit=10.0 2023-11-28 17:38:15,190 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 540550 2023-11-28 17:38:19,863 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 11500, loss[loss=0.0789, simple_loss=0.1095, pruned_loss=0.01405, audio_tagging_loss=0.0101, over 15323.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08945, pruned_loss=0.01208, audio_tagging_loss=0.008543, over 3045587.01 frames. 
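Note: each train_asr.py:1235 record above logs four loss components for the current batch (loss[...]) and a running aggregate (tot_loss[...]). The totals are consistent with loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss; the 0.5 weight is inferred from the logged numbers, not read out of the script. A quick check in Python against the "Epoch 45, batch 11450" aggregate:

    # Recompute tot_loss of "Epoch 45, batch 11450" from its components.
    # The 0.5 weight on simple_loss is an inference from the numbers here,
    # not a value taken from train_asr.py itself.
    record = {"loss": 0.06492, "simple_loss": 0.0886,
              "pruned_loss": 0.012, "audio_tagging_loss": 0.008621}
    recomputed = (0.5 * record["simple_loss"]
                  + record["pruned_loss"]
                  + record["audio_tagging_loss"])
    assert abs(recomputed - record["loss"]) < 5e-5

The same identity holds for the other records in this section, e.g. batch 11400: 0.5 * 0.0882 + 0.01189 + 0.008654 = 0.06464, matching the logged 0.06465 to rounding.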
], batch size: 59, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:38:27,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3603680.0, ans=0.125 2023-11-28 17:38:29,155 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3603680.0, ans=0.0 2023-11-28 17:38:42,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=3603746.6666666665, ans=0.95 2023-11-28 17:38:42,531 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3603746.6666666665, ans=0.1 2023-11-28 17:38:50,469 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.613e+01 8.641e+01 9.267e+01 1.033e+02 1.350e+02, threshold=1.853e+02, percent-clipped=0.0 2023-11-28 17:38:52,654 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3603813.3333333335, ans=0.2 2023-11-28 17:38:59,488 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3603880.0, ans=0.0 2023-11-28 17:39:17,534 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 540600 2023-11-28 17:39:22,462 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 11550, loss[loss=0.06425, simple_loss=0.08833, pruned_loss=0.01008, audio_tagging_loss=0.009999, over 16017.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08951, pruned_loss=0.01202, audio_tagging_loss=0.008519, over 3045959.33 frames. ], batch size: 59, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:39:22,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3604013.3333333335, ans=0.0 2023-11-28 17:39:55,701 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3604146.6666666665, ans=0.125 2023-11-28 17:39:57,483 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.16 vs. limit=15.0 2023-11-28 17:39:59,110 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=3604213.3333333335, ans=0.05 2023-11-28 17:40:05,137 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 17:40:06,693 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3604213.3333333335, ans=0.1 2023-11-28 17:40:09,011 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3604213.3333333335, ans=0.125 2023-11-28 17:40:18,807 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 540650 2023-11-28 17:40:23,067 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 11600, loss[loss=0.059, simple_loss=0.0816, pruned_loss=0.009351, audio_tagging_loss=0.008851, over 15243.00 frames. ], tot_loss[loss=0.06475, simple_loss=0.08855, pruned_loss=0.01187, audio_tagging_loss=0.008603, over 3049845.80 frames. ], batch size: 60, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:40:55,427 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.723e+01 8.847e+01 9.637e+01 1.017e+02 1.289e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-28 17:41:04,778 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.49 vs. limit=15.0 2023-11-28 17:41:07,454 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3604546.6666666665, ans=0.125 2023-11-28 17:41:11,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3604546.6666666665, ans=0.0 2023-11-28 17:41:17,875 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3604613.3333333335, ans=0.1 2023-11-28 17:41:21,301 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 540700 2023-11-28 17:41:21,414 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3604613.3333333335, ans=0.125 2023-11-28 17:41:26,703 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 11650, loss[loss=0.04999, simple_loss=0.06781, pruned_loss=0.005722, audio_tagging_loss=0.01037, over 14411.00 frames. ], tot_loss[loss=0.06459, simple_loss=0.08815, pruned_loss=0.01187, audio_tagging_loss=0.00865, over 3042705.11 frames. ], batch size: 57, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:41:28,185 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3604680.0, ans=0.07 2023-11-28 17:41:50,624 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.88 vs. limit=15.0 2023-11-28 17:41:56,973 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3604813.3333333335, ans=0.0 2023-11-28 17:41:59,709 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.60 vs. 
limit=15.0 2023-11-28 17:42:01,148 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3604813.3333333335, ans=0.125 2023-11-28 17:42:02,328 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3604880.0, ans=0.0 2023-11-28 17:42:22,901 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 540750 2023-11-28 17:42:28,061 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.14 vs. limit=15.0 2023-11-28 17:42:28,550 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 11700, loss[loss=0.06411, simple_loss=0.09778, pruned_loss=0.008587, audio_tagging_loss=0.006636, over 15958.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08906, pruned_loss=0.01203, audio_tagging_loss=0.008597, over 3050067.99 frames. ], batch size: 59, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:42:31,051 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3605013.3333333335, ans=0.125 2023-11-28 17:42:48,105 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3605080.0, ans=0.125 2023-11-28 17:42:51,707 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3605146.6666666665, ans=0.125 2023-11-28 17:42:58,969 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.776e+01 9.057e+01 9.679e+01 1.035e+02 1.386e+02, threshold=1.936e+02, percent-clipped=0.0 2023-11-28 17:43:03,997 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3605213.3333333335, ans=0.0 2023-11-28 17:43:17,938 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3605280.0, ans=0.0 2023-11-28 17:43:24,756 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 540800 2023-11-28 17:43:29,699 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 11750, loss[loss=0.08224, simple_loss=0.1167, pruned_loss=0.01869, audio_tagging_loss=0.005192, over 15459.00 frames. ], tot_loss[loss=0.06506, simple_loss=0.08883, pruned_loss=0.01205, audio_tagging_loss=0.008588, over 3050179.92 frames. ], batch size: 55, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:44:04,614 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.66 vs. limit=15.0 2023-11-28 17:44:09,090 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3605546.6666666665, ans=0.0 2023-11-28 17:44:12,769 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.22 vs. limit=22.5 2023-11-28 17:44:13,837 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3605546.6666666665, ans=0.125 2023-11-28 17:44:27,128 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 540850 2023-11-28 17:44:32,160 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 11800, loss[loss=0.07088, simple_loss=0.09354, pruned_loss=0.01787, audio_tagging_loss=0.006236, over 14084.00 frames. 
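Note: the WARNING a few records back (cut unbalanced/NeYOsnhOi4k_0.000_1.000.wav) shows why the 1-second AudioSet dummy cuts are dropped: 100 input frames survive convolutional subsampling as only 23, fewer than the 24 BPE tokens a transducer would have to emit. A minimal sketch of such a filter, assuming the usual ((T - 7) // 2 + 1) // 2 subsampling length formula (which does reproduce the logged 100 -> 23); the exact check in train_asr.py may differ:

    def frames_after_subsampling(t: int) -> int:
        # Assumed zipformer-style Conv2d subsampling length formula.
        return ((t - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # A transducer needs at least one output frame per token.
        return frames_after_subsampling(num_frames) >= num_tokens

    assert frames_after_subsampling(100) == 23
    assert not keep_cut(100, 24)  # the excluded dummy-text cut above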
], tot_loss[loss=0.06492, simple_loss=0.08854, pruned_loss=0.01201, audio_tagging_loss=0.008637, over 3050450.59 frames. ], batch size: 52, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:44:35,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3605680.0, ans=0.125 2023-11-28 17:44:48,217 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3605746.6666666665, ans=0.125 2023-11-28 17:44:57,062 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3605813.3333333335, ans=0.025 2023-11-28 17:45:02,556 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.799e+01 8.699e+01 9.349e+01 9.967e+01 1.294e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-28 17:45:04,530 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.55 vs. limit=15.0 2023-11-28 17:45:23,141 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 17:45:28,757 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 540900 2023-11-28 17:45:29,034 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3605946.6666666665, ans=0.1 2023-11-28 17:45:33,912 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 11850, loss[loss=0.0566, simple_loss=0.07764, pruned_loss=0.01071, audio_tagging_loss=0.007063, over 15148.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08912, pruned_loss=0.01223, audio_tagging_loss=0.008672, over 3049701.66 frames. ], batch size: 56, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:45:43,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3606013.3333333335, ans=0.1 2023-11-28 17:46:01,570 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3606146.6666666665, ans=0.09899494936611666 2023-11-28 17:46:13,707 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3606213.3333333335, ans=0.1 2023-11-28 17:46:19,593 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3606213.3333333335, ans=0.125 2023-11-28 17:46:30,881 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 540950 2023-11-28 17:46:35,546 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 11900, loss[loss=0.07404, simple_loss=0.102, pruned_loss=0.01611, audio_tagging_loss=0.00695, over 16012.00 frames. ], tot_loss[loss=0.06523, simple_loss=0.08909, pruned_loss=0.01204, audio_tagging_loss=0.008643, over 3047828.82 frames. 
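Note: in the optim.py:476 records, the threshold is consistently 2.0 times the middle quartile (just above: 2.0 * 9.349e+01 = 1.870e+02), matching Clipping_scale=2.0, and percent-clipped reports how often recent gradient norms exceeded it. A minimal sketch of median-based clipping under that reading; the real optimizer bookkeeping is more involved:

    import torch

    def clip_with_median(params, recent_grad_norms, clipping_scale=2.0):
        """Clip to clipping_scale x the median of recently observed norms."""
        norms = torch.tensor(recent_grad_norms, dtype=torch.float32)
        quartiles = torch.quantile(
            norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * quartiles[2]  # 2 x median, as logged
        total_norm = torch.nn.utils.clip_grad_norm_(
            params, max_norm=float(threshold))
        return quartiles, float(threshold), total_norm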
], batch size: 59, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:47:00,973 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3606480.0, ans=0.125 2023-11-28 17:47:02,257 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 17:47:06,549 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.830e+01 8.976e+01 9.494e+01 1.024e+02 1.214e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-28 17:47:16,249 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.43 vs. limit=15.0 2023-11-28 17:47:33,235 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 541000 2023-11-28 17:47:36,172 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3606613.3333333335, ans=0.0 2023-11-28 17:47:38,213 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 11950, loss[loss=0.05963, simple_loss=0.08334, pruned_loss=0.01097, audio_tagging_loss=0.006994, over 15323.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.08934, pruned_loss=0.01204, audio_tagging_loss=0.008804, over 3056290.55 frames. ], batch size: 57, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:47:52,093 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3606746.6666666665, ans=0.025 2023-11-28 17:47:53,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3606746.6666666665, ans=0.07 2023-11-28 17:48:27,237 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 17:48:33,727 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 541050 2023-11-28 17:48:33,790 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3606946.6666666665, ans=0.0 2023-11-28 17:48:38,212 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 12000, loss[loss=0.05102, simple_loss=0.06102, pruned_loss=0.007112, audio_tagging_loss=0.0134, over 14576.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08896, pruned_loss=0.01206, audio_tagging_loss=0.008985, over 3055163.90 frames. ], batch size: 56, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:48:38,215 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-28 17:49:16,768 INFO [train_asr.py:1267] (0/4) Epoch 45, validation: loss=0.05759, simple_loss=0.05051, pruned_loss=0.005251, audio_tagging_loss=0.02709, over 4681554.00 frames. 2023-11-28 17:49:16,768 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-28 17:49:28,460 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3607080.0, ans=0.125 2023-11-28 17:49:30,486 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3607080.0, ans=0.125 2023-11-28 17:49:31,834 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3607080.0, ans=0.125 2023-11-28 17:49:35,615 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.36 vs. 
limit=22.5 2023-11-28 17:49:47,503 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/epoch-45.pt 2023-11-28 17:50:05,080 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3607186.6666666665, ans=0.0 2023-11-28 17:50:05,320 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.22 vs. limit=15.0 2023-11-28 17:50:05,872 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 0, loss[loss=0.05719, simple_loss=0.05756, pruned_loss=0.007228, audio_tagging_loss=0.02118, over 15198.00 frames. ], tot_loss[loss=0.05719, simple_loss=0.05756, pruned_loss=0.007228, audio_tagging_loss=0.02118, over 15198.00 frames. ], batch size: 58, lr: 1.48e-03, grad_scale: 32.0 2023-11-28 17:50:05,874 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-28 17:50:19,485 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.9575, 5.4538, 5.8227, 5.1367], device='cuda:0') 2023-11-28 17:50:20,574 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([1.5402, 2.4416, 2.3613, 2.2539, 2.6872, 2.5153, 2.7524, 2.5464], device='cuda:0') 2023-11-28 17:50:42,014 INFO [train_asr.py:1267] (0/4) Epoch 46, validation: loss=0.05787, simple_loss=0.05054, pruned_loss=0.005286, audio_tagging_loss=0.02732, over 4681554.00 frames. 2023-11-28 17:50:42,015 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-28 17:50:43,140 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.449e+01 8.886e+01 9.608e+01 1.034e+02 1.479e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-28 17:50:44,564 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3607186.6666666665, ans=0.2 2023-11-28 17:50:54,709 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3607253.3333333335, ans=0.2 2023-11-28 17:51:06,844 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 541100 2023-11-28 17:51:07,093 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3607320.0, ans=0.2 2023-11-28 17:51:18,649 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3607386.6666666665, ans=0.1 2023-11-28 17:51:19,487 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 17:51:43,541 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 50, loss[loss=0.0877, simple_loss=0.1162, pruned_loss=0.01502, audio_tagging_loss=0.01458, over 15141.00 frames. ], tot_loss[loss=0.07367, simple_loss=0.09038, pruned_loss=0.0124, audio_tagging_loss=0.01608, over 693108.53 frames. ], batch size: 54, lr: 1.48e-03, grad_scale: 16.0 2023-11-28 17:51:51,209 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.36 vs. 
limit=12.0 2023-11-28 17:52:07,745 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 541150 2023-11-28 17:52:22,503 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3607720.0, ans=0.125 2023-11-28 17:52:24,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3607720.0, ans=0.0 2023-11-28 17:52:43,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3607853.3333333335, ans=0.125 2023-11-28 17:52:44,948 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 100, loss[loss=0.08527, simple_loss=0.1071, pruned_loss=0.01852, audio_tagging_loss=0.01322, over 14462.00 frames. ], tot_loss[loss=0.07273, simple_loss=0.08947, pruned_loss=0.01238, audio_tagging_loss=0.01562, over 1213395.51 frames. ], batch size: 53, lr: 1.48e-03, grad_scale: 16.0 2023-11-28 17:52:47,329 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.554e+01 1.000e+02 1.063e+02 1.121e+02 1.597e+02, threshold=2.127e+02, percent-clipped=0.0 2023-11-28 17:53:10,028 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 541200 2023-11-28 17:53:22,502 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3608053.3333333335, ans=0.2 2023-11-28 17:53:29,449 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.99 vs. limit=15.0 2023-11-28 17:53:35,932 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3608120.0, ans=0.125 2023-11-28 17:53:41,973 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3608120.0, ans=0.0 2023-11-28 17:53:47,641 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 150, loss[loss=0.07454, simple_loss=0.101, pruned_loss=0.01328, audio_tagging_loss=0.01075, over 14454.00 frames. ], tot_loss[loss=0.07046, simple_loss=0.08836, pruned_loss=0.0121, audio_tagging_loss=0.01418, over 1616607.71 frames. 
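Note: the stretch above shows the end-of-epoch sequence: a validation pass at epoch 45, batch 12000, the epoch-45.pt checkpoint save, then epoch 46 opening with another validation pass; the "Maximum memory allocated" line looks like torch.cuda.max_memory_allocated() reported in MB. A frame-weighted validation loop of that shape might look as follows (dev_loader and loss_fn are placeholders, not the script's actual objects):

    import torch

    @torch.no_grad()
    def validate(model, dev_loader, loss_fn):
        model.eval()
        loss_sum, frames = 0.0, 0.0
        for batch in dev_loader:
            loss, n_frames = loss_fn(model, batch)  # summed loss, frame count
            loss_sum += float(loss)
            frames += float(n_frames)
        model.train()
        mem_mb = torch.cuda.max_memory_allocated() // 2**20
        return loss_sum / max(frames, 1.0), mem_mb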
], batch size: 57, lr: 1.48e-03, grad_scale: 16.0 2023-11-28 17:53:53,640 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3608186.6666666665, ans=0.125 2023-11-28 17:53:56,523 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3608186.6666666665, ans=0.125 2023-11-28 17:54:11,800 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 541250 2023-11-28 17:54:32,898 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3608386.6666666665, ans=0.1 2023-11-28 17:54:34,791 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 17:54:36,143 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3608453.3333333335, ans=0.125 2023-11-28 17:54:46,073 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3608453.3333333335, ans=0.125 2023-11-28 17:54:47,224 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3608453.3333333335, ans=0.125 2023-11-28 17:54:49,312 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 200, loss[loss=0.08146, simple_loss=0.1259, pruned_loss=0.01198, audio_tagging_loss=0.006523, over 15055.00 frames. ], tot_loss[loss=0.06937, simple_loss=0.08906, pruned_loss=0.01215, audio_tagging_loss=0.01269, over 1932562.30 frames. ], batch size: 53, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 17:54:51,572 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.671e+01 9.120e+01 9.843e+01 1.065e+02 1.310e+02, threshold=1.969e+02, percent-clipped=0.0 2023-11-28 17:55:13,224 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 541300 2023-11-28 17:55:27,668 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3608720.0, ans=0.125 2023-11-28 17:55:36,549 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3608720.0, ans=0.125 2023-11-28 17:55:48,724 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3608786.6666666665, ans=0.1 2023-11-28 17:55:49,946 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3608853.3333333335, ans=0.125 2023-11-28 17:55:50,916 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 250, loss[loss=0.06168, simple_loss=0.08159, pruned_loss=0.01287, audio_tagging_loss=0.008013, over 15524.00 frames. ], tot_loss[loss=0.06813, simple_loss=0.08913, pruned_loss=0.01212, audio_tagging_loss=0.01145, over 2175482.61 frames. 
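Note: the zipformer.py:1877 lines near the epoch boundary dump attn_weights_entropy, one value per attention head (e.g. tensor([5.9575, 5.4538, 5.8227, 5.1367]) for a 4-head layer); lower entropy means a head concentrates its attention on fewer positions. A plausible way to compute the diagnostic, assumed rather than read from zipformer.py:

    import torch

    def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
        """Entropy of each head's attention distribution, averaged over
        queries. attn: (num_heads, query_len, key_len), rows sum to 1."""
        p = attn.clamp_min(1e-20)
        return -(p * p.log()).sum(dim=-1).mean(dim=-1)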
], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 17:56:04,295 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3608920.0, ans=0.125 2023-11-28 17:56:16,252 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 541350 2023-11-28 17:56:18,032 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3608986.6666666665, ans=0.125 2023-11-28 17:56:32,369 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3609053.3333333335, ans=0.125 2023-11-28 17:56:45,771 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3609120.0, ans=0.2 2023-11-28 17:56:50,417 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3609120.0, ans=0.04949747468305833 2023-11-28 17:56:53,097 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 300, loss[loss=0.06759, simple_loss=0.09901, pruned_loss=0.01212, audio_tagging_loss=0.005961, over 16345.00 frames. ], tot_loss[loss=0.06683, simple_loss=0.08863, pruned_loss=0.01188, audio_tagging_loss=0.01064, over 2374124.51 frames. ], batch size: 60, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 17:56:55,364 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.170e+01 9.069e+01 9.733e+01 1.020e+02 1.805e+02, threshold=1.947e+02, percent-clipped=0.0 2023-11-28 17:56:55,779 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3609186.6666666665, ans=0.125 2023-11-28 17:57:05,121 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3609253.3333333335, ans=0.125 2023-11-28 17:57:17,685 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 541400 2023-11-28 17:57:23,477 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3609320.0, ans=0.125 2023-11-28 17:57:30,576 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3609386.6666666665, ans=0.125 2023-11-28 17:57:36,297 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3609386.6666666665, ans=0.125 2023-11-28 17:57:55,487 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 350, loss[loss=0.05938, simple_loss=0.08617, pruned_loss=0.008363, audio_tagging_loss=0.007932, over 15170.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.0897, pruned_loss=0.01191, audio_tagging_loss=0.0101, over 2530118.62 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 17:57:58,563 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.81 vs. 
limit=15.0 2023-11-28 17:58:01,886 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3609520.0, ans=0.0 2023-11-28 17:58:05,304 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3609520.0, ans=0.95 2023-11-28 17:58:09,555 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3609586.6666666665, ans=0.125 2023-11-28 17:58:16,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3609586.6666666665, ans=0.1 2023-11-28 17:58:19,662 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 541450 2023-11-28 17:58:21,046 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3609653.3333333335, ans=0.0 2023-11-28 17:58:28,026 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.81 vs. limit=22.5 2023-11-28 17:58:57,267 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 400, loss[loss=0.07909, simple_loss=0.1197, pruned_loss=0.01044, audio_tagging_loss=0.008823, over 15796.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.08938, pruned_loss=0.01177, audio_tagging_loss=0.009815, over 2643160.58 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 17:58:59,624 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.699e+01 9.057e+01 9.604e+01 1.022e+02 1.428e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-28 17:59:01,172 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3609853.3333333335, ans=0.125 2023-11-28 17:59:05,943 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3609853.3333333335, ans=0.1 2023-11-28 17:59:18,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3609920.0, ans=0.0 2023-11-28 17:59:21,507 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 541500 2023-11-28 17:59:30,515 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3609986.6666666665, ans=0.2 2023-11-28 17:59:52,465 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3610120.0, ans=0.125 2023-11-28 17:59:58,041 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 450, loss[loss=0.04998, simple_loss=0.06086, pruned_loss=0.008807, audio_tagging_loss=0.01074, over 14520.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.08878, pruned_loss=0.0118, audio_tagging_loss=0.009609, over 2728294.88 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 18:00:23,744 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 541550 2023-11-28 18:00:51,721 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.70 vs. limit=15.0 2023-11-28 18:01:00,956 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 500, loss[loss=0.05659, simple_loss=0.07161, pruned_loss=0.008597, audio_tagging_loss=0.01219, over 15270.00 frames. 
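Note: after the epoch restart, the frame count inside tot_loss[...] grows sub-linearly (693108 at batch 50, 1213395 at batch 100, 1616607 at batch 150, 1932562 at batch 200), which fits a geometrically forgotten cumulative sum rather than a plain total. A sketch under that assumption, with the decay written as 1 - 1/reset_interval and reset_interval=200 assumed:

    class RunningLoss:
        """Frame-weighted loss average with geometric forgetting, assumed
        to mirror the tot_loss[...] bookkeeping in these records."""

        def __init__(self, reset_interval: int = 200):
            self.decay = 1.0 - 1.0 / reset_interval
            self.loss_sum = 0.0
            self.frames = 0.0

        def update(self, batch_loss: float, batch_frames: float) -> float:
            self.loss_sum = self.loss_sum * self.decay + batch_loss * batch_frames
            self.frames = self.frames * self.decay + batch_frames
            return self.loss_sum / self.frames

With roughly 15000-frame batches this predicts about 0.7e6 / 1.2e6 / 1.6e6 / 1.9e6 frames at batches 50/100/150/200, in line with the logged values.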
], tot_loss[loss=0.06615, simple_loss=0.0893, pruned_loss=0.01207, audio_tagging_loss=0.009426, over 2798638.29 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:01:04,991 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.256e+01 8.722e+01 9.408e+01 1.020e+02 1.286e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-28 18:01:08,748 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 18:01:11,088 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3610520.0, ans=0.125 2023-11-28 18:01:12,535 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.85 vs. limit=15.0 2023-11-28 18:01:25,489 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 541600 2023-11-28 18:01:36,204 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.75 vs. limit=15.0 2023-11-28 18:02:02,631 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 550, loss[loss=0.0788, simple_loss=0.1108, pruned_loss=0.01457, audio_tagging_loss=0.008834, over 16113.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.08924, pruned_loss=0.01222, audio_tagging_loss=0.009219, over 2856412.14 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:02:18,730 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3610920.0, ans=0.1 2023-11-28 18:02:18,757 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3610920.0, ans=0.125 2023-11-28 18:02:27,428 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 541650 2023-11-28 18:02:42,889 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3611053.3333333335, ans=0.125 2023-11-28 18:02:59,929 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3611120.0, ans=0.1 2023-11-28 18:03:04,236 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 600, loss[loss=0.05169, simple_loss=0.07374, pruned_loss=0.005341, audio_tagging_loss=0.009474, over 15186.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.089, pruned_loss=0.01215, audio_tagging_loss=0.00904, over 2899833.90 frames. 
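Note: the lr column decays smoothly across batches and epochs (1.49e-03 late in epoch 45, 1.48e-03 at the start of epoch 46, 1.47e-03 from around batch 200 of epoch 46). The trajectory matches an Eden-style schedule; base_lr=0.045, lr_batches=7500, and lr_epochs=3.5 below are assumed constants chosen to reproduce the logged values:

    def eden_lr(batch: int, epoch: float, base_lr: float = 0.045,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        batch_factor = ((batch**2 + lr_batches**2) / lr_batches**2) ** -0.25
        epoch_factor = ((epoch**2 + lr_epochs**2) / lr_epochs**2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    print(f"{eden_lr(541300, 45):.2e}")  # -> 1.47e-03, near the logged lr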
], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:03:07,701 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.794e+01 9.115e+01 9.737e+01 1.046e+02 1.247e+02, threshold=1.947e+02, percent-clipped=0.0 2023-11-28 18:03:21,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3611253.3333333335, ans=0.2 2023-11-28 18:03:22,962 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3611253.3333333335, ans=0.1 2023-11-28 18:03:29,346 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 541700 2023-11-28 18:03:32,013 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3611320.0, ans=0.1 2023-11-28 18:03:40,771 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3611386.6666666665, ans=0.0 2023-11-28 18:03:56,763 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3611453.3333333335, ans=0.125 2023-11-28 18:04:05,985 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 650, loss[loss=0.05974, simple_loss=0.08181, pruned_loss=0.009408, audio_tagging_loss=0.009429, over 15692.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.08865, pruned_loss=0.01215, audio_tagging_loss=0.009039, over 2934377.43 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:04:11,588 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3611520.0, ans=10.0 2023-11-28 18:04:31,695 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 541750 2023-11-28 18:04:38,710 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3611653.3333333335, ans=0.95 2023-11-28 18:04:49,560 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3611720.0, ans=0.125 2023-11-28 18:04:56,956 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3611786.6666666665, ans=0.0 2023-11-28 18:05:04,502 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3611786.6666666665, ans=0.125 2023-11-28 18:05:08,068 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 700, loss[loss=0.08723, simple_loss=0.1132, pruned_loss=0.02089, audio_tagging_loss=0.009714, over 14243.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.08877, pruned_loss=0.01205, audio_tagging_loss=0.008987, over 2961903.85 frames. 
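Note: the scaling.py:213 lines that dominate this log each report a ScheduledFloat, a named module hyperparameter (a dropout_p, skip_rate, balancer prob, and so on) whose current value "ans" is a function of batch_count. A minimal piecewise-linear sketch of such a schedule; the breakpoints below are invented for illustration:

    def scheduled_float(batch_count: float,
                        schedule=((0.0, 0.3), (4000.0, 0.125))) -> float:
        """Piecewise-linear interpolation over (batch_count, value) points."""
        pts = sorted(schedule)
        if batch_count <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
        return pts[-1][1]

    # Far past the last breakpoint the schedule is flat, matching the many
    # ans=0.125 readings at batch_count around 3.6e6 in this section.
    assert scheduled_float(3611453.3333333335) == 0.125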
], batch size: 53, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:05:12,356 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.880e+01 8.893e+01 9.585e+01 1.037e+02 1.398e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-28 18:05:18,637 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3611853.3333333335, ans=0.0 2023-11-28 18:05:18,849 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3611853.3333333335, ans=0.5 2023-11-28 18:05:33,496 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 541800 2023-11-28 18:05:45,284 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3612053.3333333335, ans=0.125 2023-11-28 18:06:11,806 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 750, loss[loss=0.07245, simple_loss=0.09976, pruned_loss=0.01476, audio_tagging_loss=0.007809, over 16183.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.08988, pruned_loss=0.01229, audio_tagging_loss=0.00891, over 2989408.54 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:06:14,451 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3612186.6666666665, ans=0.125 2023-11-28 18:06:25,610 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3612253.3333333335, ans=0.2 2023-11-28 18:06:36,845 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 541850 2023-11-28 18:06:37,116 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3612320.0, ans=0.125 2023-11-28 18:06:41,977 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3612320.0, ans=0.1 2023-11-28 18:06:49,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3612386.6666666665, ans=0.125 2023-11-28 18:06:57,427 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3612386.6666666665, ans=0.125 2023-11-28 18:07:08,751 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3612453.3333333335, ans=0.0 2023-11-28 18:07:14,130 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 800, loss[loss=0.09002, simple_loss=0.1323, pruned_loss=0.01779, audio_tagging_loss=0.006091, over 15722.00 frames. ], tot_loss[loss=0.067, simple_loss=0.09101, pruned_loss=0.01259, audio_tagging_loss=0.008906, over 3009246.43 frames. 
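Note: the scaling.py:1022 Whitening lines compare a per-module statistic with a limit (e.g. metric=2.70 vs. limit=15.0 above); while the metric stays under the limit, the whitening constraint has nothing to correct. One statistic with the right behavior, equal to 1.0 for an isotropic feature covariance and growing as the covariance departs from a multiple of the identity, is sketched below; whether this is the exact formula in scaling.py is an assumption:

    import torch

    def whitening_metric(feats: torch.Tensor) -> torch.Tensor:
        """feats: (num_frames, num_channels); returns a scalar >= 1.0."""
        feats = feats - feats.mean(dim=0, keepdim=True)
        cov = feats.t() @ feats / feats.shape[0]
        d = cov.shape[0]
        # Cauchy-Schwarz: trace(C)^2 <= d * ||C||_F^2, with equality
        # exactly when C is a multiple of the identity.
        return d * (cov * cov).sum() / cov.trace() ** 2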
], batch size: 57, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 18:07:17,635 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.480e+01 9.017e+01 9.748e+01 1.044e+02 1.462e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-28 18:07:20,926 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3612520.0, ans=0.0 2023-11-28 18:07:20,937 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3612520.0, ans=0.125 2023-11-28 18:07:29,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=3612586.6666666665, ans=15.0 2023-11-28 18:07:34,837 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.16 vs. limit=15.0 2023-11-28 18:07:39,756 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 541900 2023-11-28 18:07:48,141 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3612653.3333333335, ans=0.0 2023-11-28 18:07:54,124 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3612720.0, ans=0.125 2023-11-28 18:07:57,268 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3612720.0, ans=0.125 2023-11-28 18:07:59,478 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3612720.0, ans=0.125 2023-11-28 18:08:05,540 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3612786.6666666665, ans=0.0 2023-11-28 18:08:07,329 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3612786.6666666665, ans=0.125 2023-11-28 18:08:13,235 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3612786.6666666665, ans=0.0 2023-11-28 18:08:16,484 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 850, loss[loss=0.05434, simple_loss=0.06893, pruned_loss=0.009299, audio_tagging_loss=0.01058, over 14512.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.09046, pruned_loss=0.01222, audio_tagging_loss=0.008908, over 3021755.18 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 8.0 2023-11-28 18:08:32,179 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3612920.0, ans=0.1 2023-11-28 18:08:41,277 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 541950 2023-11-28 18:09:18,554 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 900, loss[loss=0.05704, simple_loss=0.07514, pruned_loss=0.01062, audio_tagging_loss=0.008851, over 15158.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.09063, pruned_loss=0.0122, audio_tagging_loss=0.008892, over 3032675.97 frames. 
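Note: the grad_scale column moves in powers of two (16.0, 32.0, then 8.0 across this stretch), the signature of a torch.cuda.amp.GradScaler that halves its scale when an fp16 step overflows and doubles it again after a run of clean steps. A minimal mixed-precision step of that shape; optimizer and loss_fn are placeholders, and the init_scale is an assumption:

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=32.0)

    def train_step(model, optimizer, batch, loss_fn):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = loss_fn(model, batch)
        scaler.scale(loss).backward()
        scaler.step(optimizer)   # skipped internally if gradients overflowed
        scaler.update()          # halves or, eventually, doubles the scale
        return float(loss.detach()), scaler.get_scale()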
], batch size: 58, lr: 1.47e-03, grad_scale: 8.0 2023-11-28 18:09:22,290 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3613186.6666666665, ans=0.09899494936611666 2023-11-28 18:09:23,382 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3613186.6666666665, ans=0.0 2023-11-28 18:09:24,241 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.829e+01 8.864e+01 9.446e+01 1.016e+02 1.435e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-28 18:09:31,086 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3613253.3333333335, ans=0.125 2023-11-28 18:09:43,218 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 542000 2023-11-28 18:09:43,478 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3613320.0, ans=0.0 2023-11-28 18:09:54,308 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3613386.6666666665, ans=0.025 2023-11-28 18:09:54,332 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3613386.6666666665, ans=0.0 2023-11-28 18:10:20,693 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 950, loss[loss=0.04984, simple_loss=0.06598, pruned_loss=0.007112, audio_tagging_loss=0.009733, over 15931.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.09101, pruned_loss=0.01226, audio_tagging_loss=0.008848, over 3043956.03 frames. ], batch size: 60, lr: 1.47e-03, grad_scale: 8.0 2023-11-28 18:10:46,097 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 542050 2023-11-28 18:10:58,210 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.92 vs. limit=10.0 2023-11-28 18:11:10,232 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.91 vs. limit=15.0 2023-11-28 18:11:11,276 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3613786.6666666665, ans=0.125 2023-11-28 18:11:11,292 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3613786.6666666665, ans=0.0 2023-11-28 18:11:19,941 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3613786.6666666665, ans=0.2 2023-11-28 18:11:21,008 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3613853.3333333335, ans=0.125 2023-11-28 18:11:21,217 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.18 vs. limit=15.0 2023-11-28 18:11:21,883 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 1000, loss[loss=0.0726, simple_loss=0.1013, pruned_loss=0.01497, audio_tagging_loss=0.006997, over 15805.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.0905, pruned_loss=0.01225, audio_tagging_loss=0.008709, over 3043365.34 frames. 
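Note: the balancer entries carry per-channel activation constraints: min_positive=0.025 and max_positive=0.95 bound the fraction of positive activations, min_abs/max_abs bound their magnitudes, and prob (often 0.125 here) is how often the correction is applied. The statistic being policed can be sketched as below; the gradient-correction mechanism of the real module is omitted:

    import torch

    def positive_fraction(x: torch.Tensor, channel_dim: int = -1) -> torch.Tensor:
        """Fraction of positive entries per channel, i.e. the quantity a
        balancer keeps inside [min_positive, max_positive]."""
        channel_dim = channel_dim % x.dim()
        dims = [d for d in range(x.dim()) if d != channel_dim]
        return (x > 0).float().mean(dim=dims)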
], batch size: 57, lr: 1.47e-03, grad_scale: 8.0 2023-11-28 18:11:23,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3613853.3333333335, ans=0.125 2023-11-28 18:11:25,565 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3613853.3333333335, ans=0.125 2023-11-28 18:11:27,659 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.684e+01 8.919e+01 9.596e+01 1.036e+02 1.232e+02, threshold=1.919e+02, percent-clipped=0.0 2023-11-28 18:11:35,191 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3613920.0, ans=0.0 2023-11-28 18:11:35,230 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3613920.0, ans=0.125 2023-11-28 18:11:42,125 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3613920.0, ans=0.125 2023-11-28 18:11:46,669 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 542100 2023-11-28 18:11:50,873 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 18:11:57,307 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3613986.6666666665, ans=0.125 2023-11-28 18:12:00,664 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3614053.3333333335, ans=0.0 2023-11-28 18:12:00,714 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 18:12:13,704 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.85 vs. limit=22.5 2023-11-28 18:12:21,953 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 18:12:24,608 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 1050, loss[loss=0.06711, simple_loss=0.08829, pruned_loss=0.01502, audio_tagging_loss=0.00794, over 14518.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08947, pruned_loss=0.01217, audio_tagging_loss=0.008708, over 3041545.72 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 8.0 2023-11-28 18:12:36,461 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3614253.3333333335, ans=0.2 2023-11-28 18:12:44,335 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3614253.3333333335, ans=0.125 2023-11-28 18:12:44,720 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.97 vs. 
limit=10.0 2023-11-28 18:12:49,407 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 542150 2023-11-28 18:13:03,816 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3614386.6666666665, ans=0.0 2023-11-28 18:13:10,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3614386.6666666665, ans=0.07 2023-11-28 18:13:21,425 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3614453.3333333335, ans=0.035 2023-11-28 18:13:26,583 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 1100, loss[loss=0.06558, simple_loss=0.09239, pruned_loss=0.009635, audio_tagging_loss=0.009753, over 14664.00 frames. ], tot_loss[loss=0.06507, simple_loss=0.0887, pruned_loss=0.01203, audio_tagging_loss=0.008693, over 3044690.06 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 8.0 2023-11-28 18:13:28,109 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3614520.0, ans=0.125 2023-11-28 18:13:31,316 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 18:13:32,352 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.883e+01 8.948e+01 9.564e+01 1.065e+02 1.707e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-28 18:13:49,986 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3614653.3333333335, ans=0.125 2023-11-28 18:13:50,876 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 542200 2023-11-28 18:13:53,196 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3614653.3333333335, ans=0.125 2023-11-28 18:14:08,971 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 18:14:13,270 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3614720.0, ans=0.04949747468305833 2023-11-28 18:14:14,495 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3614720.0, ans=0.2 2023-11-28 18:14:28,996 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 1150, loss[loss=0.05642, simple_loss=0.07719, pruned_loss=0.009062, audio_tagging_loss=0.008769, over 15531.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08894, pruned_loss=0.01204, audio_tagging_loss=0.00864, over 3050744.85 frames. 
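Note: the scaling.py:1118 WithLoss lines attach a named auxiliary loss to a module's attention weights and report its running sum; it reads 0.000e+00 throughout this stretch, i.e. the corresponding constraint is not firing. A toy wrapper with the same reporting shape, purely illustrative and surely simpler than the real module:

    import torch

    class WithLoss(torch.nn.Module):
        """Identity on x that accumulates an auxiliary loss for logging."""

        def __init__(self, name: str):
            super().__init__()
            self.name = name
            self.loss_sum = 0.0

        def forward(self, x: torch.Tensor, aux_loss: torch.Tensor) -> torch.Tensor:
            self.loss_sum += float(aux_loss.detach())
            return x  # identity; aux_loss is tracked here only for logging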
], batch size: 60, lr: 1.47e-03, grad_scale: 8.0 2023-11-28 18:14:34,044 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3614853.3333333335, ans=0.125 2023-11-28 18:14:36,321 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3614853.3333333335, ans=0.125 2023-11-28 18:14:40,368 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3614920.0, ans=0.125 2023-11-28 18:14:41,697 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3614920.0, ans=0.125 2023-11-28 18:14:42,838 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3614920.0, ans=0.1 2023-11-28 18:14:53,897 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 542250 2023-11-28 18:15:06,555 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3615053.3333333335, ans=0.1 2023-11-28 18:15:15,476 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 18:15:19,962 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3615120.0, ans=0.0 2023-11-28 18:15:29,344 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.88 vs. limit=15.0 2023-11-28 18:15:31,093 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 1200, loss[loss=0.07033, simple_loss=0.09618, pruned_loss=0.0132, audio_tagging_loss=0.009034, over 14944.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.08951, pruned_loss=0.01227, audio_tagging_loss=0.008535, over 3049109.58 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:15:36,981 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.628e+01 8.860e+01 9.476e+01 1.010e+02 2.147e+02, threshold=1.895e+02, percent-clipped=1.0 2023-11-28 18:15:37,161 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3615186.6666666665, ans=0.0 2023-11-28 18:15:42,547 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3615253.3333333335, ans=0.125 2023-11-28 18:15:55,756 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 542300 2023-11-28 18:16:16,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3615386.6666666665, ans=0.0 2023-11-28 18:16:23,445 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.96 vs. limit=15.0 2023-11-28 18:16:28,457 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3615453.3333333335, ans=0.1 2023-11-28 18:16:33,383 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 1250, loss[loss=0.06209, simple_loss=0.07712, pruned_loss=0.014, audio_tagging_loss=0.009521, over 15637.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.08931, pruned_loss=0.01224, audio_tagging_loss=0.008482, over 3048725.50 frames. 
], batch size: 61, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:16:44,226 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3615586.6666666665, ans=0.0 2023-11-28 18:16:57,654 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 542350 2023-11-28 18:17:10,106 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3615720.0, ans=0.1 2023-11-28 18:17:17,581 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3615720.0, ans=0.125 2023-11-28 18:17:25,269 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3615786.6666666665, ans=0.125 2023-11-28 18:17:30,187 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3615786.6666666665, ans=0.125 2023-11-28 18:17:35,311 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 1300, loss[loss=0.07607, simple_loss=0.1052, pruned_loss=0.01691, audio_tagging_loss=0.006532, over 16262.00 frames. ], tot_loss[loss=0.06499, simple_loss=0.08878, pruned_loss=0.01207, audio_tagging_loss=0.00853, over 3050827.63 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:17:39,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3615853.3333333335, ans=0.125 2023-11-28 18:17:41,170 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.626e+01 8.784e+01 9.305e+01 1.002e+02 1.226e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-28 18:17:59,164 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 542400 2023-11-28 18:18:06,249 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 18:18:08,109 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3615986.6666666665, ans=0.09899494936611666 2023-11-28 18:18:24,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3616120.0, ans=0.035 2023-11-28 18:18:37,346 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 1350, loss[loss=0.09899, simple_loss=0.1309, pruned_loss=0.02528, audio_tagging_loss=0.008267, over 16214.00 frames. ], tot_loss[loss=0.06476, simple_loss=0.08828, pruned_loss=0.01205, audio_tagging_loss=0.008577, over 3051780.68 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:18:47,125 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3616186.6666666665, ans=0.2 2023-11-28 18:19:00,506 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.37 vs. limit=10.0 2023-11-28 18:19:02,247 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 542450 2023-11-28 18:19:12,429 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3616320.0, ans=0.125 2023-11-28 18:19:23,272 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. 
Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 18:19:24,736 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3616386.6666666665, ans=0.0 2023-11-28 18:19:26,231 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.93 vs. limit=15.0 2023-11-28 18:19:38,430 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 1400, loss[loss=0.08083, simple_loss=0.1163, pruned_loss=0.01774, audio_tagging_loss=0.004932, over 15399.00 frames. ], tot_loss[loss=0.06494, simple_loss=0.08851, pruned_loss=0.01207, audio_tagging_loss=0.008606, over 3052147.24 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:19:42,988 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3616520.0, ans=0.125 2023-11-28 18:19:45,123 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.015e+01 9.002e+01 9.471e+01 1.001e+02 1.235e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-28 18:19:45,496 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3616520.0, ans=0.1 2023-11-28 18:19:56,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3616586.6666666665, ans=0.125 2023-11-28 18:20:03,683 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 542500 2023-11-28 18:20:31,512 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3616786.6666666665, ans=0.125 2023-11-28 18:20:36,273 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3616786.6666666665, ans=0.0 2023-11-28 18:20:40,620 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 1450, loss[loss=0.07883, simple_loss=0.1012, pruned_loss=0.01863, audio_tagging_loss=0.009577, over 14414.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08868, pruned_loss=0.01214, audio_tagging_loss=0.008711, over 3051166.08 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 8.0 2023-11-28 18:20:51,134 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3616853.3333333335, ans=0.1 2023-11-28 18:21:00,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3616920.0, ans=0.0 2023-11-28 18:21:05,418 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 542550 2023-11-28 18:21:20,175 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3617053.3333333335, ans=0.0 2023-11-28 18:21:20,237 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3617053.3333333335, ans=0.125 2023-11-28 18:21:42,944 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 1500, loss[loss=0.04904, simple_loss=0.05215, pruned_loss=0.01235, audio_tagging_loss=0.01062, over 15303.00 frames. 
], tot_loss[loss=0.06553, simple_loss=0.08905, pruned_loss=0.01224, audio_tagging_loss=0.008769, over 3048848.84 frames. ], batch size: 61, lr: 1.47e-03, grad_scale: 8.0 2023-11-28 18:21:50,644 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.473e+01 9.243e+01 1.008e+02 1.066e+02 1.395e+02, threshold=2.017e+02, percent-clipped=0.0 2023-11-28 18:22:07,941 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 542600 2023-11-28 18:22:14,013 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.20 vs. limit=12.0 2023-11-28 18:22:45,114 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 1550, loss[loss=0.06112, simple_loss=0.08243, pruned_loss=0.00972, audio_tagging_loss=0.01018, over 15665.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.08881, pruned_loss=0.01217, audio_tagging_loss=0.008883, over 3049615.97 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 8.0 2023-11-28 18:23:07,074 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3617586.6666666665, ans=0.2 2023-11-28 18:23:10,267 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 542650 2023-11-28 18:23:14,479 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.58 vs. limit=15.0 2023-11-28 18:23:19,675 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.34 vs. limit=15.0 2023-11-28 18:23:20,388 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3617653.3333333335, ans=0.125 2023-11-28 18:23:25,027 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3617720.0, ans=0.125 2023-11-28 18:23:47,221 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 1600, loss[loss=0.06533, simple_loss=0.0852, pruned_loss=0.01426, audio_tagging_loss=0.008472, over 15362.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.08984, pruned_loss=0.01231, audio_tagging_loss=0.008954, over 3054253.13 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:23:54,779 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.889e+01 9.133e+01 9.762e+01 1.043e+02 1.262e+02, threshold=1.952e+02, percent-clipped=0.0 2023-11-28 18:24:11,911 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 542700 2023-11-28 18:24:12,216 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3617986.6666666665, ans=0.125 2023-11-28 18:24:15,955 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.20 vs. limit=12.0 2023-11-28 18:24:27,771 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3618053.3333333335, ans=0.0 2023-11-28 18:24:32,644 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.87 vs. 
limit=15.0 2023-11-28 18:24:34,292 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3618053.3333333335, ans=0.125 2023-11-28 18:24:45,303 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3618120.0, ans=10.0 2023-11-28 18:24:47,774 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3618186.6666666665, ans=0.125 2023-11-28 18:24:48,510 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 1650, loss[loss=0.06153, simple_loss=0.09121, pruned_loss=0.008143, audio_tagging_loss=0.007782, over 15888.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.09003, pruned_loss=0.01228, audio_tagging_loss=0.008907, over 3048537.40 frames. ], batch size: 61, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:24:48,676 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=3618186.6666666665, ans=0.025 2023-11-28 18:25:00,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3618253.3333333335, ans=0.1 2023-11-28 18:25:13,492 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 542750 2023-11-28 18:25:13,673 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3618320.0, ans=0.1 2023-11-28 18:25:19,457 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3618320.0, ans=0.0 2023-11-28 18:25:24,743 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3618386.6666666665, ans=0.125 2023-11-28 18:25:30,735 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3618386.6666666665, ans=0.125 2023-11-28 18:25:32,525 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.66 vs. limit=6.0 2023-11-28 18:25:36,746 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3618453.3333333335, ans=0.125 2023-11-28 18:25:40,661 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3618453.3333333335, ans=0.1 2023-11-28 18:25:47,946 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3618453.3333333335, ans=0.0 2023-11-28 18:25:49,864 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 1700, loss[loss=0.06082, simple_loss=0.07983, pruned_loss=0.01291, audio_tagging_loss=0.007991, over 16939.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.08981, pruned_loss=0.01219, audio_tagging_loss=0.008923, over 3048968.99 frames. 
], batch size: 63, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:25:57,434 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.816e+01 8.880e+01 9.352e+01 1.002e+02 1.354e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-28 18:26:15,676 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 542800 2023-11-28 18:26:32,391 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.90 vs. limit=15.0 2023-11-28 18:26:52,314 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 1750, loss[loss=0.07023, simple_loss=0.09298, pruned_loss=0.01551, audio_tagging_loss=0.008222, over 15191.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.0897, pruned_loss=0.01222, audio_tagging_loss=0.008873, over 3048921.74 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:26:54,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3618853.3333333335, ans=0.125 2023-11-28 18:27:01,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3618853.3333333335, ans=0.0 2023-11-28 18:27:07,766 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3618920.0, ans=0.2 2023-11-28 18:27:17,823 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 542850 2023-11-28 18:27:18,033 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3618986.6666666665, ans=0.125 2023-11-28 18:27:40,927 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3619120.0, ans=0.125 2023-11-28 18:27:54,813 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 1800, loss[loss=0.0591, simple_loss=0.08291, pruned_loss=0.009495, audio_tagging_loss=0.008155, over 14643.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.09046, pruned_loss=0.01227, audio_tagging_loss=0.008703, over 3048190.14 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:28:02,584 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.623e+01 8.817e+01 9.553e+01 1.013e+02 1.527e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-28 18:28:15,561 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3619253.3333333335, ans=0.0 2023-11-28 18:28:19,619 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 542900 2023-11-28 18:28:19,707 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=3619320.0, ans=0.5 2023-11-28 18:28:56,455 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 1850, loss[loss=0.05844, simple_loss=0.07446, pruned_loss=0.01222, audio_tagging_loss=0.008992, over 14723.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.09055, pruned_loss=0.01223, audio_tagging_loss=0.008596, over 3052699.56 frames. 
], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:29:05,574 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3619520.0, ans=0.0 2023-11-28 18:29:10,359 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3619586.6666666665, ans=0.125 2023-11-28 18:29:14,074 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.13 vs. limit=15.0 2023-11-28 18:29:21,021 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 542950 2023-11-28 18:29:29,564 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3619653.3333333335, ans=0.1 2023-11-28 18:29:29,974 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.94 vs. limit=6.0 2023-11-28 18:29:31,265 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.35 vs. limit=15.0 2023-11-28 18:29:38,268 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3619720.0, ans=0.2 2023-11-28 18:29:43,125 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3619720.0, ans=0.125 2023-11-28 18:29:58,084 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 1900, loss[loss=0.06294, simple_loss=0.08299, pruned_loss=0.0131, audio_tagging_loss=0.008344, over 14964.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08929, pruned_loss=0.01209, audio_tagging_loss=0.00859, over 3051314.06 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:30:06,338 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.360e+01 8.846e+01 9.695e+01 1.030e+02 1.290e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-28 18:30:08,355 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.82 vs. limit=22.5 2023-11-28 18:30:21,785 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3619920.0, ans=0.05 2023-11-28 18:30:24,951 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 543000 2023-11-28 18:30:45,025 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3620053.3333333335, ans=0.125 2023-11-28 18:31:00,566 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.54 vs. limit=15.0 2023-11-28 18:31:01,998 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 1950, loss[loss=0.06943, simple_loss=0.0919, pruned_loss=0.01495, audio_tagging_loss=0.008536, over 15593.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.08946, pruned_loss=0.01214, audio_tagging_loss=0.00857, over 3048434.36 frames. 
], batch size: 60, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:31:11,863 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3620186.6666666665, ans=0.125 2023-11-28 18:31:15,585 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3620253.3333333335, ans=0.0 2023-11-28 18:31:27,043 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 543050 2023-11-28 18:31:36,239 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3620320.0, ans=10.0 2023-11-28 18:31:49,803 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3620386.6666666665, ans=0.0 2023-11-28 18:31:51,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3620453.3333333335, ans=0.125 2023-11-28 18:31:57,693 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3620453.3333333335, ans=0.125 2023-11-28 18:32:02,299 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.18 vs. limit=12.0 2023-11-28 18:32:05,239 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 2000, loss[loss=0.06772, simple_loss=0.08768, pruned_loss=0.01221, audio_tagging_loss=0.01167, over 15970.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08965, pruned_loss=0.01215, audio_tagging_loss=0.008693, over 3050739.76 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 18:32:11,866 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.67 vs. limit=12.0 2023-11-28 18:32:12,245 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.632e+01 8.843e+01 9.517e+01 1.017e+02 1.675e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-28 18:32:30,301 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 543100 2023-11-28 18:32:32,888 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3620653.3333333335, ans=0.125 2023-11-28 18:32:33,598 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.75 vs. limit=15.0 2023-11-28 18:33:04,930 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.95 vs. limit=15.0 2023-11-28 18:33:07,943 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 2050, loss[loss=0.04993, simple_loss=0.05846, pruned_loss=0.009294, audio_tagging_loss=0.01141, over 15842.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.09082, pruned_loss=0.01249, audio_tagging_loss=0.008608, over 3048225.71 frames. 
], batch size: 61, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:33:09,524 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3620853.3333333335, ans=0.125 2023-11-28 18:33:30,496 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3620920.0, ans=0.0 2023-11-28 18:33:32,726 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 543150 2023-11-28 18:34:03,824 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.75 vs. limit=12.0 2023-11-28 18:34:09,475 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 2100, loss[loss=0.0721, simple_loss=0.08586, pruned_loss=0.01252, audio_tagging_loss=0.01665, over 13919.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.08964, pruned_loss=0.0123, audio_tagging_loss=0.008577, over 3041321.30 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:34:11,273 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.76 vs. limit=12.0 2023-11-28 18:34:17,008 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3621186.6666666665, ans=0.125 2023-11-28 18:34:17,670 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.501e+01 8.878e+01 9.444e+01 1.002e+02 1.258e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-28 18:34:34,165 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 543200 2023-11-28 18:34:37,011 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3621320.0, ans=0.04949747468305833 2023-11-28 18:34:39,258 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.29 vs. limit=6.0 2023-11-28 18:35:06,761 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3621453.3333333335, ans=0.125 2023-11-28 18:35:12,369 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 2150, loss[loss=0.07345, simple_loss=0.1001, pruned_loss=0.01619, audio_tagging_loss=0.00721, over 15794.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.08943, pruned_loss=0.01227, audio_tagging_loss=0.008415, over 3035338.87 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:35:17,374 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3621520.0, ans=0.125 2023-11-28 18:35:27,726 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3621586.6666666665, ans=0.125 2023-11-28 18:35:36,825 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 543250 2023-11-28 18:35:50,189 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 18:36:10,772 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3621786.6666666665, ans=0.125 2023-11-28 18:36:14,609 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 2200, loss[loss=0.06046, simple_loss=0.08848, pruned_loss=0.01073, audio_tagging_loss=0.005491, over 16686.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08963, pruned_loss=0.01233, audio_tagging_loss=0.00843, over 3040107.76 frames. ], batch size: 61, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:36:20,790 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3621853.3333333335, ans=0.1 2023-11-28 18:36:21,293 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.26 vs. limit=22.5 2023-11-28 18:36:22,867 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.392e+01 9.070e+01 9.676e+01 1.027e+02 1.399e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-28 18:36:38,711 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 543300 2023-11-28 18:36:38,908 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3621986.6666666665, ans=0.0 2023-11-28 18:36:55,745 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3622053.3333333335, ans=0.125 2023-11-28 18:36:55,918 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3622053.3333333335, ans=0.1 2023-11-28 18:37:16,447 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 2250, loss[loss=0.05037, simple_loss=0.06716, pruned_loss=0.008841, audio_tagging_loss=0.007946, over 14405.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.0891, pruned_loss=0.01216, audio_tagging_loss=0.008466, over 3041670.81 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:37:22,504 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3622186.6666666665, ans=0.0 2023-11-28 18:37:36,062 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.15 vs. limit=15.0 2023-11-28 18:37:38,130 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3622253.3333333335, ans=0.125 2023-11-28 18:37:41,266 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 543350 2023-11-28 18:37:51,508 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3622320.0, ans=0.0 2023-11-28 18:37:54,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3622386.6666666665, ans=0.0 2023-11-28 18:37:55,206 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.22 vs. limit=15.0 2023-11-28 18:38:01,206 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.09 vs. 
limit=15.0 2023-11-28 18:38:05,326 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.92 vs. limit=10.0 2023-11-28 18:38:15,265 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3622453.3333333335, ans=0.1 2023-11-28 18:38:17,971 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 2300, loss[loss=0.07093, simple_loss=0.09936, pruned_loss=0.01263, audio_tagging_loss=0.008619, over 16862.00 frames. ], tot_loss[loss=0.06466, simple_loss=0.08789, pruned_loss=0.01207, audio_tagging_loss=0.008652, over 3039266.33 frames. ], batch size: 62, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:38:26,663 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.639e+01 8.756e+01 9.268e+01 1.034e+02 1.497e+02, threshold=1.854e+02, percent-clipped=0.0 2023-11-28 18:38:27,043 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3622520.0, ans=10.0 2023-11-28 18:38:35,599 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3622586.6666666665, ans=0.125 2023-11-28 18:38:39,401 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3622586.6666666665, ans=0.125 2023-11-28 18:38:41,664 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3622653.3333333335, ans=0.1 2023-11-28 18:38:42,552 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 543400 2023-11-28 18:39:07,390 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.52 vs. limit=5.0 2023-11-28 18:39:09,454 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.03 vs. limit=15.0 2023-11-28 18:39:14,333 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 18:39:20,173 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 2350, loss[loss=0.07849, simple_loss=0.1093, pruned_loss=0.01683, audio_tagging_loss=0.007001, over 15223.00 frames. ], tot_loss[loss=0.0651, simple_loss=0.08851, pruned_loss=0.01209, audio_tagging_loss=0.008748, over 3045453.28 frames. 
], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:39:31,607 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3622920.0, ans=0.125 2023-11-28 18:39:36,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3622920.0, ans=0.0 2023-11-28 18:39:42,840 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3622920.0, ans=0.0 2023-11-28 18:39:45,340 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 543450 2023-11-28 18:39:45,917 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.89 vs. limit=15.0 2023-11-28 18:40:13,103 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3623120.0, ans=0.0 2023-11-28 18:40:21,996 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 2400, loss[loss=0.0829, simple_loss=0.1163, pruned_loss=0.01656, audio_tagging_loss=0.008208, over 14989.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.08905, pruned_loss=0.01218, audio_tagging_loss=0.008806, over 3045772.63 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 18:40:30,766 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.829e+01 8.938e+01 9.455e+01 1.032e+02 1.610e+02, threshold=1.891e+02, percent-clipped=0.0 2023-11-28 18:40:32,333 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3623186.6666666665, ans=0.0 2023-11-28 18:40:34,604 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3623253.3333333335, ans=0.1 2023-11-28 18:40:35,795 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3623253.3333333335, ans=0.125 2023-11-28 18:40:41,245 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3623253.3333333335, ans=0.125 2023-11-28 18:40:43,798 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.67 vs. limit=15.0 2023-11-28 18:40:46,801 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 543500 2023-11-28 18:40:49,920 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3623320.0, ans=0.0 2023-11-28 18:41:22,771 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3623520.0, ans=0.125 2023-11-28 18:41:23,593 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 2450, loss[loss=0.05433, simple_loss=0.07368, pruned_loss=0.009975, audio_tagging_loss=0.007517, over 16364.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.0888, pruned_loss=0.0121, audio_tagging_loss=0.008759, over 3039005.30 frames. 
], batch size: 62, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 18:41:32,691 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3623520.0, ans=0.125 2023-11-28 18:41:48,512 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=3623653.3333333335, ans=0.025 2023-11-28 18:41:49,417 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 543550 2023-11-28 18:41:52,987 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3623653.3333333335, ans=0.0 2023-11-28 18:42:07,835 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3623720.0, ans=0.0 2023-11-28 18:42:18,948 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3623786.6666666665, ans=0.125 2023-11-28 18:42:25,833 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 2500, loss[loss=0.06655, simple_loss=0.09036, pruned_loss=0.01269, audio_tagging_loss=0.008679, over 14796.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.08863, pruned_loss=0.01206, audio_tagging_loss=0.00876, over 3041087.46 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 18:42:35,309 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.618e+01 8.803e+01 9.255e+01 1.000e+02 1.311e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-28 18:42:51,465 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 543600 2023-11-28 18:43:05,569 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3624053.3333333335, ans=10.0 2023-11-28 18:43:28,641 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 2550, loss[loss=0.06344, simple_loss=0.08112, pruned_loss=0.01331, audio_tagging_loss=0.009563, over 15891.00 frames. ], tot_loss[loss=0.06481, simple_loss=0.08822, pruned_loss=0.01194, audio_tagging_loss=0.008758, over 3041115.08 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 18:43:49,167 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=3624253.3333333335, ans=15.0 2023-11-28 18:43:53,730 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 543650 2023-11-28 18:43:58,771 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3624320.0, ans=0.125 2023-11-28 18:44:10,515 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3624386.6666666665, ans=0.0 2023-11-28 18:44:27,347 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3624453.3333333335, ans=0.2 2023-11-28 18:44:30,692 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 2600, loss[loss=0.05108, simple_loss=0.0763, pruned_loss=0.004976, audio_tagging_loss=0.007949, over 15397.00 frames. ], tot_loss[loss=0.06463, simple_loss=0.08858, pruned_loss=0.01184, audio_tagging_loss=0.008501, over 3041348.16 frames. 
], batch size: 56, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 18:44:30,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3624520.0, ans=0.125 2023-11-28 18:44:33,361 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3624520.0, ans=0.2 2023-11-28 18:44:33,422 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3624520.0, ans=0.125 2023-11-28 18:44:38,500 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3624520.0, ans=0.0 2023-11-28 18:44:39,475 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.728e+01 8.738e+01 9.385e+01 1.004e+02 1.373e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-28 18:44:39,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3624520.0, ans=0.1 2023-11-28 18:44:42,207 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3624586.6666666665, ans=0.2 2023-11-28 18:44:45,835 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3624586.6666666665, ans=0.125 2023-11-28 18:44:50,366 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3624586.6666666665, ans=0.2 2023-11-28 18:44:56,250 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 543700 2023-11-28 18:45:05,848 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3624653.3333333335, ans=0.1 2023-11-28 18:45:05,942 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3624653.3333333335, ans=0.0 2023-11-28 18:45:29,605 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3624786.6666666665, ans=0.0 2023-11-28 18:45:32,753 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 2650, loss[loss=0.06491, simple_loss=0.07919, pruned_loss=0.01331, audio_tagging_loss=0.01201, over 13785.00 frames. ], tot_loss[loss=0.06432, simple_loss=0.08829, pruned_loss=0.01164, audio_tagging_loss=0.008539, over 3037835.75 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 18:45:38,038 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.49 vs. limit=15.0 2023-11-28 18:45:58,442 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 543750 2023-11-28 18:46:06,985 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 18:46:35,487 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 2700, loss[loss=0.06189, simple_loss=0.08585, pruned_loss=0.0111, audio_tagging_loss=0.007865, over 16452.00 frames. ], tot_loss[loss=0.06453, simple_loss=0.0888, pruned_loss=0.01168, audio_tagging_loss=0.008459, over 3042129.33 frames. 
], batch size: 61, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 18:46:44,268 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.729e+01 9.009e+01 9.559e+01 1.022e+02 1.303e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-28 18:46:47,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3625253.3333333335, ans=0.2 2023-11-28 18:47:00,355 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 543800 2023-11-28 18:47:08,419 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.80 vs. limit=15.0 2023-11-28 18:47:35,801 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3625453.3333333335, ans=0.125 2023-11-28 18:47:37,938 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 2750, loss[loss=0.07211, simple_loss=0.1009, pruned_loss=0.01525, audio_tagging_loss=0.006424, over 14641.00 frames. ], tot_loss[loss=0.06485, simple_loss=0.08925, pruned_loss=0.0118, audio_tagging_loss=0.008427, over 3043469.70 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:47:38,278 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3625520.0, ans=0.125 2023-11-28 18:47:44,777 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.16 vs. limit=15.0 2023-11-28 18:47:52,901 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3625586.6666666665, ans=0.125 2023-11-28 18:48:02,762 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 543850 2023-11-28 18:48:16,860 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3625720.0, ans=0.0 2023-11-28 18:48:27,516 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3625786.6666666665, ans=0.2 2023-11-28 18:48:27,720 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=3625786.6666666665, ans=15.0 2023-11-28 18:48:32,370 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 18:48:32,593 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3625786.6666666665, ans=0.0 2023-11-28 18:48:38,650 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3625853.3333333335, ans=0.1 2023-11-28 18:48:39,512 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 2800, loss[loss=0.06191, simple_loss=0.09043, pruned_loss=0.0105, audio_tagging_loss=0.00619, over 16599.00 frames. ], tot_loss[loss=0.06507, simple_loss=0.08937, pruned_loss=0.01193, audio_tagging_loss=0.008455, over 3039236.81 frames. 
], batch size: 63, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 18:48:49,550 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.340e+01 8.943e+01 9.576e+01 1.040e+02 1.629e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-28 18:48:53,375 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3625920.0, ans=0.0 2023-11-28 18:49:05,265 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 543900 2023-11-28 18:49:09,111 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3625986.6666666665, ans=0.125 2023-11-28 18:49:14,830 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3625986.6666666665, ans=0.125 2023-11-28 18:49:35,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3626120.0, ans=0.125 2023-11-28 18:49:41,563 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 2850, loss[loss=0.07577, simple_loss=0.1021, pruned_loss=0.01508, audio_tagging_loss=0.009663, over 17026.00 frames. ], tot_loss[loss=0.06482, simple_loss=0.08894, pruned_loss=0.01189, audio_tagging_loss=0.008469, over 3038496.36 frames. ], batch size: 62, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 18:49:41,849 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3626186.6666666665, ans=0.09899494936611666 2023-11-28 18:50:06,842 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 543950 2023-11-28 18:50:10,644 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 18:50:27,058 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.08 vs. limit=15.0 2023-11-28 18:50:35,492 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3626453.3333333335, ans=0.0 2023-11-28 18:50:43,930 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 2900, loss[loss=0.07665, simple_loss=0.1059, pruned_loss=0.01541, audio_tagging_loss=0.008293, over 15614.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.08984, pruned_loss=0.01217, audio_tagging_loss=0.008391, over 3045222.20 frames. 
], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:50:55,070 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.774e+01 8.790e+01 9.510e+01 1.033e+02 1.199e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-28 18:51:01,294 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3626586.6666666665, ans=0.0 2023-11-28 18:51:07,243 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3626653.3333333335, ans=0.0 2023-11-28 18:51:08,262 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 544000 2023-11-28 18:51:10,222 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-544000.pt 2023-11-28 18:51:13,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3626653.3333333335, ans=0.2 2023-11-28 18:51:25,289 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3626720.0, ans=0.0 2023-11-28 18:51:26,859 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.44 vs. limit=15.0 2023-11-28 18:51:41,090 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3626786.6666666665, ans=0.015 2023-11-28 18:51:44,124 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3626786.6666666665, ans=0.0 2023-11-28 18:51:48,533 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 2950, loss[loss=0.07946, simple_loss=0.1125, pruned_loss=0.01456, audio_tagging_loss=0.008625, over 15262.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.08947, pruned_loss=0.0122, audio_tagging_loss=0.008452, over 3050410.48 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:51:51,219 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3626853.3333333335, ans=0.125 2023-11-28 18:51:55,762 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3626853.3333333335, ans=0.125 2023-11-28 18:52:13,313 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 544050 2023-11-28 18:52:13,816 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.97 vs. limit=15.0 2023-11-28 18:52:39,593 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3627120.0, ans=0.0 2023-11-28 18:52:41,788 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3627120.0, ans=0.2 2023-11-28 18:52:48,303 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3627120.0, ans=0.025 2023-11-28 18:52:50,270 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 3000, loss[loss=0.04532, simple_loss=0.05462, pruned_loss=0.006176, audio_tagging_loss=0.01183, over 14152.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08986, pruned_loss=0.01232, audio_tagging_loss=0.00852, over 3044774.25 frames. 
], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:52:50,272 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-28 18:53:33,231 INFO [train_asr.py:1267] (0/4) Epoch 46, validation: loss=0.05731, simple_loss=0.05055, pruned_loss=0.005328, audio_tagging_loss=0.02671, over 4681554.00 frames. 2023-11-28 18:53:33,232 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-28 18:53:39,971 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3627186.6666666665, ans=0.0 2023-11-28 18:53:44,169 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.352e+01 9.011e+01 9.606e+01 1.015e+02 1.587e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-28 18:53:52,549 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3627253.3333333335, ans=0.035 2023-11-28 18:53:56,039 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3627320.0, ans=0.0 2023-11-28 18:53:57,592 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 544100 2023-11-28 18:54:08,144 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3627386.6666666665, ans=0.0 2023-11-28 18:54:09,184 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3627386.6666666665, ans=0.07 2023-11-28 18:54:30,441 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3627453.3333333335, ans=0.0 2023-11-28 18:54:34,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3627520.0, ans=0.1 2023-11-28 18:54:34,931 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 3050, loss[loss=0.103, simple_loss=0.1381, pruned_loss=0.02582, audio_tagging_loss=0.00811, over 15772.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.09014, pruned_loss=0.0124, audio_tagging_loss=0.008559, over 3044673.59 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:54:42,236 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 18:54:42,580 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.47 vs. limit=12.0 2023-11-28 18:54:59,398 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 544150 2023-11-28 18:55:13,387 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 18:55:27,102 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.25 vs. 
limit=15.0 2023-11-28 18:55:35,472 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3627786.6666666665, ans=0.0 2023-11-28 18:55:37,558 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 3100, loss[loss=0.07868, simple_loss=0.1215, pruned_loss=0.01346, audio_tagging_loss=0.004471, over 16389.00 frames. ], tot_loss[loss=0.06668, simple_loss=0.09137, pruned_loss=0.01242, audio_tagging_loss=0.008579, over 3044987.04 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:55:48,883 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.644e+01 9.039e+01 9.695e+01 1.074e+02 1.445e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-28 18:56:03,247 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 544200 2023-11-28 18:56:06,272 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3627986.6666666665, ans=0.125 2023-11-28 18:56:08,791 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3627986.6666666665, ans=0.1 2023-11-28 18:56:31,313 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3628120.0, ans=0.1 2023-11-28 18:56:39,962 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 3150, loss[loss=0.0602, simple_loss=0.08101, pruned_loss=0.008566, audio_tagging_loss=0.01113, over 15399.00 frames. ], tot_loss[loss=0.06668, simple_loss=0.09096, pruned_loss=0.01245, audio_tagging_loss=0.008747, over 3037295.13 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:56:50,950 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3628186.6666666665, ans=0.125 2023-11-28 18:56:56,994 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3628253.3333333335, ans=0.125 2023-11-28 18:56:57,487 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.63 vs. limit=15.0 2023-11-28 18:57:05,216 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 544250 2023-11-28 18:57:09,473 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3628320.0, ans=0.1 2023-11-28 18:57:09,625 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3628320.0, ans=0.125 2023-11-28 18:57:11,850 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3628320.0, ans=0.0 2023-11-28 18:57:39,289 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3628453.3333333335, ans=0.125 2023-11-28 18:57:41,837 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3628520.0, ans=0.125 2023-11-28 18:57:42,628 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 3200, loss[loss=0.07063, simple_loss=0.0986, pruned_loss=0.01299, audio_tagging_loss=0.008336, over 15540.00 frames. ], tot_loss[loss=0.06687, simple_loss=0.09114, pruned_loss=0.0125, audio_tagging_loss=0.008797, over 3036263.11 frames. 
2023-11-28 18:57:44,028 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3628520.0, ans=0.125 2023-11-28 18:57:52,878 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.859e+01 9.188e+01 9.825e+01 1.034e+02 1.228e+02, threshold=1.965e+02, percent-clipped=0.0 2023-11-28 18:58:01,259 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3628586.6666666665, ans=0.1 2023-11-28 18:58:07,023 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 544300 2023-11-28 18:58:07,314 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3628653.3333333335, ans=0.1 2023-11-28 18:58:44,584 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 3250, loss[loss=0.04699, simple_loss=0.05925, pruned_loss=0.009319, audio_tagging_loss=0.008044, over 14453.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.09016, pruned_loss=0.01242, audio_tagging_loss=0.008871, over 3036500.44 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 18:58:53,120 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3628853.3333333335, ans=0.0 2023-11-28 18:58:58,199 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3628920.0, ans=0.0 2023-11-28 18:58:59,392 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3628920.0, ans=0.125 2023-11-28 18:59:01,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3628920.0, ans=0.0 2023-11-28 18:59:03,006 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff3.min_abs, batch_count=3628920.0, ans=0.2 2023-11-28 18:59:09,809 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 544350 2023-11-28 18:59:15,105 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.61 vs. limit=10.0 2023-11-28 18:59:23,848 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3629053.3333333335, ans=0.125 2023-11-28 18:59:33,983 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3629120.0, ans=0.0 2023-11-28 18:59:46,039 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 3300, loss[loss=0.05962, simple_loss=0.06709, pruned_loss=0.01199, audio_tagging_loss=0.01409, over 14837.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.08927, pruned_loss=0.01226, audio_tagging_loss=0.009031, over 3036637.53 frames. ], batch size: 62, lr: 1.47e-03, grad_scale: 32.0
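
In the optim.py:476 records above, the threshold is consistently Clipping_scale times the median of the reported grad-norm quartiles (e.g. 2.0 * 9.825e+01 = 1.965e+02 for the batch 3250 stretch), and percent-clipped reports how often recent batches exceeded it. A hedged sketch of that bookkeeping; the buffer size and update cadence are assumptions, not the recipe's actual values:

import torch

class GradNormTracker:
    """Tracks recent grad norms; threshold = clipping_scale x running median."""

    def __init__(self, clipping_scale: float = 2.0, window: int = 128):
        self.clipping_scale = clipping_scale
        self.window = window
        self.norms: list[float] = []
        self.clipped = 0
        self.total = 0

    def update(self, grad_norm: float) -> float:
        self.norms = (self.norms + [grad_norm])[-self.window:]
        # The five numbers printed in the log: min, 25%, median, 75%, max.
        q = torch.quantile(
            torch.tensor(self.norms),
            torch.tensor([0.0, 0.25, 0.50, 0.75, 1.0]),
        )
        threshold = self.clipping_scale * q[2].item()  # scale x median
        self.total += 1
        self.clipped += int(grad_norm > threshold)
        return threshold
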
2023-11-28 18:59:52,221 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3629186.6666666665, ans=0.0 2023-11-28 18:59:57,862 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.822e+01 9.009e+01 9.919e+01 1.085e+02 1.499e+02, threshold=1.984e+02, percent-clipped=0.0 2023-11-28 19:00:01,665 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3629253.3333333335, ans=0.125 2023-11-28 19:00:06,258 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3629253.3333333335, ans=0.1 2023-11-28 19:00:10,749 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 544400 2023-11-28 19:00:19,265 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.37 vs. limit=15.0 2023-11-28 19:00:20,153 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3629320.0, ans=0.1 2023-11-28 19:00:35,781 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3629453.3333333335, ans=0.125 2023-11-28 19:00:48,545 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 3350, loss[loss=0.08608, simple_loss=0.1268, pruned_loss=0.01614, audio_tagging_loss=0.006547, over 15269.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.08886, pruned_loss=0.01211, audio_tagging_loss=0.00897, over 3036952.53 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:01:05,759 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3629586.6666666665, ans=0.125 2023-11-28 19:01:12,646 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 544450 2023-11-28 19:01:49,493 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 3400, loss[loss=0.06812, simple_loss=0.1021, pruned_loss=0.01069, audio_tagging_loss=0.006366, over 16318.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08865, pruned_loss=0.01211, audio_tagging_loss=0.008827, over 3037351.25 frames. ], batch size: 60, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:01:59,090 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.15 vs.
limit=10.0 2023-11-28 19:02:01,876 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.836e+01 9.096e+01 9.800e+01 1.047e+02 1.329e+02, threshold=1.960e+02, percent-clipped=0.0 2023-11-28 19:02:02,217 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3629920.0, ans=0.2 2023-11-28 19:02:10,993 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3629920.0, ans=0.125 2023-11-28 19:02:14,120 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 544500 2023-11-28 19:02:18,300 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3629986.6666666665, ans=0.0 2023-11-28 19:02:32,116 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3630053.3333333335, ans=0.2 2023-11-28 19:02:51,170 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 3450, loss[loss=0.06907, simple_loss=0.0976, pruned_loss=0.0128, audio_tagging_loss=0.007467, over 17038.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.08908, pruned_loss=0.01218, audio_tagging_loss=0.008656, over 3035399.76 frames. ], batch size: 65, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:03:06,957 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3630253.3333333335, ans=0.0 2023-11-28 19:03:12,445 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3630253.3333333335, ans=0.125 2023-11-28 19:03:17,023 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 544550 2023-11-28 19:03:17,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3630320.0, ans=0.125 2023-11-28 19:03:33,040 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3630386.6666666665, ans=0.2 2023-11-28 19:03:53,812 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 3500, loss[loss=0.06332, simple_loss=0.0914, pruned_loss=0.01063, audio_tagging_loss=0.006993, over 15862.00 frames. ], tot_loss[loss=0.06509, simple_loss=0.08892, pruned_loss=0.01205, audio_tagging_loss=0.008583, over 3041328.86 frames. ], batch size: 60, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:03:56,217 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3630520.0, ans=0.125 2023-11-28 19:04:01,019 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.67 vs. limit=6.0 2023-11-28 19:04:03,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3630520.0, ans=0.125 2023-11-28 19:04:06,290 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.451e+01 8.809e+01 9.584e+01 1.024e+02 1.310e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-28 19:04:18,810 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 544600 2023-11-28 19:04:25,086 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3630653.3333333335, ans=0.125 2023-11-28 19:04:27,715 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
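
These recurring warnings fire for AudioSet placeholder cuts whose length cannot support a transducer alignment: after the frontend's roughly 4x subsampling, a 100-frame (1 s) cut keeps only 23 frames, fewer than its 24 BPE tokens, so the cut is dropped. A minimal sketch of the check, assuming the ((x - 7) // 2 + 1) // 2 length formula common to these convolutional frontends (it reproduces the logged 100 -> 23 mapping):

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # Frame count surviving the frontend's two stride-2 stages;
    # the exact formula is an assumption about this recipe.
    frames_after = ((num_frames - 7) // 2 + 1) // 2
    # A transducer alignment needs at least one frame per token.
    return frames_after >= num_tokens

print(keep_cut(100, 24))  # False: 23 frames < 24 tokens, so the cut is excluded
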
2023-11-28 19:04:31,536 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3630720.0, ans=0.125 2023-11-28 19:04:38,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3630720.0, ans=0.0 2023-11-28 19:04:55,523 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.50 vs. limit=22.5 2023-11-28 19:04:56,008 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 3550, loss[loss=0.0705, simple_loss=0.0931, pruned_loss=0.0149, audio_tagging_loss=0.00905, over 15248.00 frames. ], tot_loss[loss=0.06473, simple_loss=0.08819, pruned_loss=0.0121, audio_tagging_loss=0.008536, over 3039677.51 frames. ], batch size: 60, lr: 1.47e-03, grad_scale: 8.0 2023-11-28 19:04:56,367 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3630853.3333333335, ans=0.2 2023-11-28 19:05:02,789 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3630853.3333333335, ans=0.0 2023-11-28 19:05:21,009 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 544650 2023-11-28 19:05:33,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3631053.3333333335, ans=0.125 2023-11-28 19:05:48,176 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3631120.0, ans=0.125 2023-11-28 19:05:51,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3631120.0, ans=0.0 2023-11-28 19:05:52,867 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3631120.0, ans=0.04949747468305833 2023-11-28 19:05:58,347 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 3600, loss[loss=0.0737, simple_loss=0.1042, pruned_loss=0.01573, audio_tagging_loss=0.005841, over 15956.00 frames. ], tot_loss[loss=0.06468, simple_loss=0.08833, pruned_loss=0.012, audio_tagging_loss=0.008513, over 3039258.02 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0
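
The grad_scale field that moves between 8.0, 16.0 and 32.0 in these records is fp16 dynamic loss scaling: the scale is halved when a step produces inf/nan gradients and doubled again after a run of clean steps. A sketch using PyTorch's stock GradScaler; the API is real, but the numbers here are illustrative rather than the recipe's settings:

import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=16.0,      # a starting point like the grad_scale logged here
    growth_factor=2.0,    # 16.0 -> 32.0 after enough clean steps
    backoff_factor=0.5,   # 16.0 -> 8.0 on an inf/nan step
    growth_interval=2000,
)

# Per-batch pattern (model, optimizer, compute_loss assumed to exist elsewhere):
#   with torch.cuda.amp.autocast():
#       loss = compute_loss(model, batch)
#   scaler.scale(loss).backward()
#   scaler.step(optimizer)
#   scaler.update()                 # halves or doubles the scale as needed
#   current = scaler.get_scale()    # the grad_scale value seen in the log
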
2023-11-28 19:05:58,678 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3631186.6666666665, ans=0.2 2023-11-28 19:06:12,394 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.114e+01 8.905e+01 9.661e+01 1.038e+02 1.227e+02, threshold=1.932e+02, percent-clipped=0.0 2023-11-28 19:06:23,037 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 544700 2023-11-28 19:06:24,387 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3631320.0, ans=0.1 2023-11-28 19:06:25,244 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3631320.0, ans=0.07 2023-11-28 19:06:33,214 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3631320.0, ans=0.0 2023-11-28 19:06:35,772 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.00 vs. limit=10.0 2023-11-28 19:06:41,149 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.27 vs. limit=15.0 2023-11-28 19:06:53,150 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3631453.3333333335, ans=0.2 2023-11-28 19:07:00,634 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 3650, loss[loss=0.03607, simple_loss=0.04426, pruned_loss=0.003451, audio_tagging_loss=0.01049, over 14783.00 frames. ], tot_loss[loss=0.06407, simple_loss=0.08746, pruned_loss=0.01182, audio_tagging_loss=0.00852, over 3036039.95 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:07:19,040 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3631586.6666666665, ans=0.125 2023-11-28 19:07:25,323 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 544750 2023-11-28 19:07:45,694 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3631720.0, ans=0.125 2023-11-28 19:08:01,843 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 3700, loss[loss=0.0688, simple_loss=0.08728, pruned_loss=0.0125, audio_tagging_loss=0.01266, over 15485.00 frames. ], tot_loss[loss=0.06411, simple_loss=0.0872, pruned_loss=0.01182, audio_tagging_loss=0.008688, over 3041483.47 frames.
], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:08:15,922 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.815e+01 9.135e+01 9.668e+01 1.042e+02 1.211e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-28 19:08:17,402 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3631920.0, ans=0.125 2023-11-28 19:08:22,001 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3631920.0, ans=0.125 2023-11-28 19:08:27,334 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 544800 2023-11-28 19:08:43,885 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3632053.3333333335, ans=0.125 2023-11-28 19:09:05,321 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 3750, loss[loss=0.07359, simple_loss=0.1006, pruned_loss=0.01241, audio_tagging_loss=0.01088, over 15425.00 frames. ], tot_loss[loss=0.06457, simple_loss=0.08778, pruned_loss=0.01189, audio_tagging_loss=0.008798, over 3043590.62 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:09:10,215 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3632186.6666666665, ans=0.0 2023-11-28 19:09:15,637 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3632186.6666666665, ans=0.5 2023-11-28 19:09:25,688 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3632253.3333333335, ans=0.0 2023-11-28 19:09:30,406 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 544850 2023-11-28 19:09:50,403 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 19:09:50,784 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3632386.6666666665, ans=0.07 2023-11-28 19:09:53,387 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.14 vs. limit=15.0 2023-11-28 19:10:08,038 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 3800, loss[loss=0.06507, simple_loss=0.08408, pruned_loss=0.01577, audio_tagging_loss=0.007264, over 15488.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.08866, pruned_loss=0.01197, audio_tagging_loss=0.008647, over 3048920.45 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:10:17,713 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3632520.0, ans=0.0 2023-11-28 19:10:20,379 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.76 vs. limit=15.0
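
Each scaling.py:213 record prints the current value ("ans") of a named ScheduledFloat: a hyperparameter such as a dropout_p, skip_rate or balancer prob that follows a piecewise-linear schedule over the global batch count. A minimal re-implementation of that idea; the breakpoints below are illustrative, not the recipe's actual schedules:

def scheduled_float(batch_count: float, points: list[tuple[float, float]]) -> float:
    """Piecewise-linear interpolation over (batch_count, value) breakpoints."""
    x0, y0 = points[0]
    if batch_count <= x0:
        return y0
    for x1, y1 in points[1:]:
        if batch_count <= x1:
            return y0 + (batch_count - x0) * (y1 - y0) / (x1 - x0)
        x0, y0 = x1, y1
    return y0  # past the last breakpoint, the value is held constant

# e.g. a dropout that decays from 0.3 to 0.1 over the first 20k batches and is
# then held; at the batch counts logged here it has long since settled:
print(scheduled_float(3_632_520.0, [(0.0, 0.3), (20_000.0, 0.1)]))  # 0.1
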
2023-11-28 19:10:20,994 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.825e+01 8.975e+01 9.556e+01 1.041e+02 1.200e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-28 19:10:32,374 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 544900 2023-11-28 19:10:49,945 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.09 vs. limit=15.0 2023-11-28 19:10:56,152 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3632786.6666666665, ans=0.0 2023-11-28 19:11:04,090 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3632786.6666666665, ans=0.125 2023-11-28 19:11:08,710 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 3850, loss[loss=0.06576, simple_loss=0.09324, pruned_loss=0.01195, audio_tagging_loss=0.0072, over 14729.00 frames. ], tot_loss[loss=0.06503, simple_loss=0.08869, pruned_loss=0.01204, audio_tagging_loss=0.008645, over 3051607.14 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 8.0 2023-11-28 19:11:11,452 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3632853.3333333335, ans=0.5 2023-11-28 19:11:13,888 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3632853.3333333335, ans=0.04949747468305833 2023-11-28 19:11:21,139 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3632920.0, ans=0.125 2023-11-28 19:11:24,633 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3632920.0, ans=0.125 2023-11-28 19:11:34,488 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 544950 2023-11-28 19:11:42,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3632986.6666666665, ans=0.125 2023-11-28 19:11:47,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3633053.3333333335, ans=0.0 2023-11-28 19:11:50,485 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3633053.3333333335, ans=0.0 2023-11-28 19:12:08,118 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3633120.0, ans=0.125 2023-11-28 19:12:09,157 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=3633120.0, ans=10.0 2023-11-28 19:12:11,433 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 3900, loss[loss=0.06075, simple_loss=0.08185, pruned_loss=0.007165, audio_tagging_loss=0.01266, over 15093.00 frames. ], tot_loss[loss=0.06507, simple_loss=0.08876, pruned_loss=0.01195, audio_tagging_loss=0.008737, over 3046638.65 frames.
], batch size: 57, lr: 1.47e-03, grad_scale: 8.0 2023-11-28 19:12:18,694 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3633186.6666666665, ans=0.0 2023-11-28 19:12:26,149 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.516e+01 8.937e+01 9.555e+01 1.040e+02 1.282e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-28 19:12:35,683 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.72 vs. limit=15.0 2023-11-28 19:12:36,418 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 545000 2023-11-28 19:13:13,855 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 3950, loss[loss=0.05924, simple_loss=0.07741, pruned_loss=0.01063, audio_tagging_loss=0.0099, over 14902.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08909, pruned_loss=0.01215, audio_tagging_loss=0.008794, over 3046293.55 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 8.0 2023-11-28 19:13:38,155 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 545050 2023-11-28 19:13:39,810 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.96 vs. limit=15.0 2023-11-28 19:13:45,361 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3633653.3333333335, ans=0.125 2023-11-28 19:14:14,722 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3633853.3333333335, ans=0.1 2023-11-28 19:14:15,558 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 4000, loss[loss=0.07304, simple_loss=0.1052, pruned_loss=0.01387, audio_tagging_loss=0.006555, over 15269.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.08914, pruned_loss=0.01215, audio_tagging_loss=0.008877, over 3037754.05 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:14:15,929 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3633853.3333333335, ans=0.1 2023-11-28 19:14:18,257 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3633853.3333333335, ans=0.125 2023-11-28 19:14:25,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3633853.3333333335, ans=0.1 2023-11-28 19:14:25,328 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3633853.3333333335, ans=0.0 2023-11-28 19:14:30,291 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.394e+01 9.097e+01 9.894e+01 1.091e+02 1.423e+02, threshold=1.979e+02, percent-clipped=0.0 2023-11-28 19:14:36,517 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.14 vs. 
limit=15.0 2023-11-28 19:14:39,476 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3633986.6666666665, ans=0.0 2023-11-28 19:14:40,526 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 545100 2023-11-28 19:15:08,834 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3634120.0, ans=0.2 2023-11-28 19:15:17,449 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 4050, loss[loss=0.07351, simple_loss=0.1191, pruned_loss=0.008702, audio_tagging_loss=0.005257, over 14618.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.08884, pruned_loss=0.01216, audio_tagging_loss=0.008927, over 3034138.27 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:15:22,284 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 19:15:40,823 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3634253.3333333335, ans=0.125 2023-11-28 19:15:43,027 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 545150 2023-11-28 19:15:43,584 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.80 vs. limit=15.0 2023-11-28 19:16:03,514 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3634386.6666666665, ans=0.125 2023-11-28 19:16:13,426 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3634453.3333333335, ans=0.2 2023-11-28 19:16:18,733 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3634520.0, ans=0.0 2023-11-28 19:16:19,699 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 4100, loss[loss=0.06217, simple_loss=0.0771, pruned_loss=0.01347, audio_tagging_loss=0.01015, over 15586.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.08931, pruned_loss=0.01222, audio_tagging_loss=0.0091, over 3034909.79 frames. 
], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:16:26,234 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3634520.0, ans=0.125 2023-11-28 19:16:27,601 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3634520.0, ans=0.2 2023-11-28 19:16:34,102 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.259e+01 9.021e+01 9.586e+01 1.040e+02 1.361e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-28 19:16:43,571 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 545200 2023-11-28 19:16:47,357 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 19:16:54,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3634720.0, ans=0.2 2023-11-28 19:16:58,579 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3634720.0, ans=0.125 2023-11-28 19:16:58,690 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3634720.0, ans=0.07 2023-11-28 19:17:14,775 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.38 vs. limit=22.5 2023-11-28 19:17:15,788 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3634786.6666666665, ans=0.125 2023-11-28 19:17:21,146 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 4150, loss[loss=0.07589, simple_loss=0.1078, pruned_loss=0.01713, audio_tagging_loss=0.004848, over 15619.00 frames. ], tot_loss[loss=0.06589, simple_loss=0.08959, pruned_loss=0.01221, audio_tagging_loss=0.008878, over 3032671.55 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:17:37,467 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 19:17:45,656 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 545250 2023-11-28 19:17:52,251 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.88 vs. limit=15.0 2023-11-28 19:18:08,574 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 19:18:22,698 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 4200, loss[loss=0.07496, simple_loss=0.1007, pruned_loss=0.01779, audio_tagging_loss=0.006817, over 15566.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.09078, pruned_loss=0.01236, audio_tagging_loss=0.008783, over 3037043.16 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:18:23,176 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.62 vs. 
limit=15.0 2023-11-28 19:18:25,998 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3635186.6666666665, ans=0.125 2023-11-28 19:18:28,211 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3635186.6666666665, ans=0.0 2023-11-28 19:18:29,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3635186.6666666665, ans=0.125 2023-11-28 19:18:37,262 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.060e+01 9.058e+01 9.549e+01 9.941e+01 2.004e+02, threshold=1.910e+02, percent-clipped=1.0 2023-11-28 19:18:48,319 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 545300 2023-11-28 19:18:53,254 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3635320.0, ans=0.125 2023-11-28 19:19:00,491 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=3635386.6666666665, ans=22.5 2023-11-28 19:19:08,708 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3635386.6666666665, ans=0.0 2023-11-28 19:19:20,149 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3635453.3333333335, ans=0.0 2023-11-28 19:19:23,091 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.77 vs. limit=15.0 2023-11-28 19:19:25,335 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 4250, loss[loss=0.04623, simple_loss=0.06317, pruned_loss=0.006463, audio_tagging_loss=0.008176, over 15219.00 frames. ], tot_loss[loss=0.06717, simple_loss=0.09202, pruned_loss=0.01255, audio_tagging_loss=0.008606, over 3041245.90 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:19:34,157 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3635520.0, ans=0.1 2023-11-28 19:19:40,972 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.69 vs. limit=22.5 2023-11-28 19:19:41,882 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3635586.6666666665, ans=0.125 2023-11-28 19:19:51,020 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 545350 2023-11-28 19:20:09,001 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.43 vs. limit=15.0 2023-11-28 19:20:16,947 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3635786.6666666665, ans=0.0 2023-11-28 19:20:20,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3635786.6666666665, ans=0.125 2023-11-28 19:20:25,457 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3635786.6666666665, ans=0.09899494936611666 2023-11-28 19:20:26,984 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.79 vs. 
limit=22.5 2023-11-28 19:20:28,749 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 4300, loss[loss=0.0433, simple_loss=0.0617, pruned_loss=0.005821, audio_tagging_loss=0.006628, over 15174.00 frames. ], tot_loss[loss=0.06682, simple_loss=0.09138, pruned_loss=0.01253, audio_tagging_loss=0.008599, over 3049070.47 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:20:29,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3635853.3333333335, ans=0.125 2023-11-28 19:20:42,796 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.171e+01 9.029e+01 9.603e+01 1.044e+02 1.295e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-28 19:20:52,816 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 545400 2023-11-28 19:21:10,762 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3636053.3333333335, ans=0.125 2023-11-28 19:21:22,320 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3636120.0, ans=0.1 2023-11-28 19:21:29,168 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 4350, loss[loss=0.05702, simple_loss=0.07949, pruned_loss=0.009153, audio_tagging_loss=0.008126, over 14682.00 frames. ], tot_loss[loss=0.06681, simple_loss=0.09169, pruned_loss=0.0125, audio_tagging_loss=0.008463, over 3046069.45 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:21:54,061 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 545450 2023-11-28 19:22:04,703 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3636320.0, ans=0.125 2023-11-28 19:22:07,112 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 19:22:08,239 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3636386.6666666665, ans=0.125 2023-11-28 19:22:17,734 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3636453.3333333335, ans=0.0 2023-11-28 19:22:24,195 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3636453.3333333335, ans=0.1 2023-11-28 19:22:31,058 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 4400, loss[loss=0.08556, simple_loss=0.1145, pruned_loss=0.01997, audio_tagging_loss=0.008352, over 14976.00 frames. ], tot_loss[loss=0.06714, simple_loss=0.09209, pruned_loss=0.01265, audio_tagging_loss=0.008444, over 3049260.18 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 19:22:46,748 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.356e+01 9.163e+01 9.666e+01 1.055e+02 1.360e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-28 19:22:56,237 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 545500 2023-11-28 19:23:05,824 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3636653.3333333335, ans=0.04949747468305833 2023-11-28 19:23:06,116 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.59 vs. 
limit=22.5 2023-11-28 19:23:16,008 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3636720.0, ans=0.2 2023-11-28 19:23:33,510 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 4450, loss[loss=0.07412, simple_loss=0.1014, pruned_loss=0.01424, audio_tagging_loss=0.009189, over 15103.00 frames. ], tot_loss[loss=0.06676, simple_loss=0.09135, pruned_loss=0.01264, audio_tagging_loss=0.008438, over 3049467.22 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 19:23:58,997 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 545550 2023-11-28 19:24:06,098 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3636986.6666666665, ans=0.035 2023-11-28 19:24:07,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3636986.6666666665, ans=0.2 2023-11-28 19:24:23,526 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3637120.0, ans=0.2 2023-11-28 19:24:35,782 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 4500, loss[loss=0.05925, simple_loss=0.08859, pruned_loss=0.009515, audio_tagging_loss=0.005445, over 14515.00 frames. ], tot_loss[loss=0.06663, simple_loss=0.09127, pruned_loss=0.01256, audio_tagging_loss=0.008433, over 3050241.31 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 19:24:48,547 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3637253.3333333335, ans=0.0 2023-11-28 19:24:50,608 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.348e+01 8.752e+01 9.380e+01 1.023e+02 1.206e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-28 19:25:00,944 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 545600 2023-11-28 19:25:22,943 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=3637386.6666666665, ans=0.95 2023-11-28 19:25:38,423 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 4550, loss[loss=0.05557, simple_loss=0.07592, pruned_loss=0.008389, audio_tagging_loss=0.009221, over 15939.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.09042, pruned_loss=0.01216, audio_tagging_loss=0.008461, over 3045086.93 frames. ], batch size: 63, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 19:25:40,303 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.76 vs. limit=15.0 2023-11-28 19:25:48,718 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3637520.0, ans=0.035 2023-11-28 19:26:03,958 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 545650 2023-11-28 19:26:11,064 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3637653.3333333335, ans=0.125 2023-11-28 19:26:12,307 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3637653.3333333335, ans=0.0 2023-11-28 19:26:12,389 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3637653.3333333335, ans=0.0 2023-11-28 19:26:28,150 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. 
Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 19:26:40,930 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 4600, loss[loss=0.06262, simple_loss=0.08512, pruned_loss=0.0087, audio_tagging_loss=0.01136, over 14533.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08972, pruned_loss=0.01207, audio_tagging_loss=0.008598, over 3046288.22 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 19:26:44,622 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3637853.3333333335, ans=0.0 2023-11-28 19:26:45,722 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3637853.3333333335, ans=0.125 2023-11-28 19:26:55,383 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.046e+01 8.776e+01 9.447e+01 1.031e+02 1.407e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-28 19:26:58,065 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3637920.0, ans=0.0 2023-11-28 19:27:05,309 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 545700 2023-11-28 19:27:26,471 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3638053.3333333335, ans=0.0 2023-11-28 19:27:42,005 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 4650, loss[loss=0.05046, simple_loss=0.06912, pruned_loss=0.006721, audio_tagging_loss=0.009181, over 14096.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08986, pruned_loss=0.01197, audio_tagging_loss=0.008618, over 3045377.64 frames. ], batch size: 52, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:27:45,851 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3638186.6666666665, ans=0.125 2023-11-28 19:27:52,093 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3638186.6666666665, ans=0.0 2023-11-28 19:28:02,619 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3638253.3333333335, ans=0.05 2023-11-28 19:28:02,754 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3638253.3333333335, ans=0.125 2023-11-28 19:28:06,612 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 545750 2023-11-28 19:28:10,733 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.03 vs. limit=22.5 2023-11-28 19:28:11,447 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3638320.0, ans=0.125 2023-11-28 19:28:31,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3638453.3333333335, ans=0.125 2023-11-28 19:28:44,210 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 4700, loss[loss=0.07894, simple_loss=0.1071, pruned_loss=0.01814, audio_tagging_loss=0.007241, over 14091.00 frames. 
], tot_loss[loss=0.06559, simple_loss=0.08979, pruned_loss=0.012, audio_tagging_loss=0.0087, over 3043874.25 frames. ], batch size: 53, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:28:49,371 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3638520.0, ans=0.125 2023-11-28 19:28:56,833 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3638586.6666666665, ans=0.0 2023-11-28 19:29:00,544 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.526e+01 9.174e+01 9.774e+01 1.029e+02 1.399e+02, threshold=1.955e+02, percent-clipped=0.0 2023-11-28 19:29:09,174 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 545800 2023-11-28 19:29:32,945 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.43 vs. limit=5.0 2023-11-28 19:29:33,414 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3638786.6666666665, ans=0.1 2023-11-28 19:29:37,016 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3638786.6666666665, ans=0.125 2023-11-28 19:29:38,245 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3638786.6666666665, ans=0.1 2023-11-28 19:29:40,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff3.min_abs, batch_count=3638786.6666666665, ans=0.2 2023-11-28 19:29:42,403 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3638786.6666666665, ans=0.125 2023-11-28 19:29:47,384 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 4750, loss[loss=0.06809, simple_loss=0.09098, pruned_loss=0.01075, audio_tagging_loss=0.01185, over 14362.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.08995, pruned_loss=0.01203, audio_tagging_loss=0.008767, over 3044979.61 frames. ], batch size: 53, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:29:52,183 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3638853.3333333335, ans=0.125 2023-11-28 19:29:52,368 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3638853.3333333335, ans=0.0 2023-11-28 19:30:03,075 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3638920.0, ans=0.2 2023-11-28 19:30:03,443 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.84 vs. 
limit=22.5 2023-11-28 19:30:11,993 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 545850 2023-11-28 19:30:13,347 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3638986.6666666665, ans=0.0 2023-11-28 19:30:13,353 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3638986.6666666665, ans=0.125 2023-11-28 19:30:17,476 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3638986.6666666665, ans=0.0 2023-11-28 19:30:48,666 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 4800, loss[loss=0.07447, simple_loss=0.1072, pruned_loss=0.01336, audio_tagging_loss=0.007533, over 15051.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.0894, pruned_loss=0.01194, audio_tagging_loss=0.008893, over 3051388.79 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 19:30:53,662 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3639186.6666666665, ans=0.2 2023-11-28 19:31:05,144 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.705e+01 8.815e+01 9.365e+01 1.036e+02 1.386e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-28 19:31:14,141 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 545900 2023-11-28 19:31:24,582 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.08 vs. limit=10.0 2023-11-28 19:31:48,914 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3639453.3333333335, ans=0.035 2023-11-28 19:31:51,025 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 4850, loss[loss=0.06182, simple_loss=0.08484, pruned_loss=0.01289, audio_tagging_loss=0.006509, over 14966.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.08959, pruned_loss=0.01203, audio_tagging_loss=0.008914, over 3042328.98 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 19:32:05,414 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.01 vs. limit=22.5 2023-11-28 19:32:07,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3639586.6666666665, ans=0.0 2023-11-28 19:32:12,468 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3639586.6666666665, ans=0.125 2023-11-28 19:32:14,704 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3639653.3333333335, ans=0.07 2023-11-28 19:32:15,719 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 545950 2023-11-28 19:32:17,505 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.08 vs. limit=15.0 2023-11-28 19:32:24,220 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3639653.3333333335, ans=0.0 2023-11-28 19:32:52,958 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 4900, loss[loss=0.07035, simple_loss=0.1011, pruned_loss=0.01137, audio_tagging_loss=0.008449, over 15664.00 frames. 
], tot_loss[loss=0.06593, simple_loss=0.08991, pruned_loss=0.01209, audio_tagging_loss=0.008887, over 3050356.82 frames. ], batch size: 60, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:32:56,439 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.12 vs. limit=6.0 2023-11-28 19:33:00,790 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3639853.3333333335, ans=0.1 2023-11-28 19:33:05,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3639920.0, ans=0.0 2023-11-28 19:33:10,133 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.219e+01 8.839e+01 9.451e+01 1.014e+02 1.484e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-28 19:33:10,555 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3639920.0, ans=0.07 2023-11-28 19:33:17,163 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 546000 2023-11-28 19:33:41,740 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3640120.0, ans=0.025 2023-11-28 19:33:54,893 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 4950, loss[loss=0.04854, simple_loss=0.06989, pruned_loss=0.006787, audio_tagging_loss=0.006805, over 15225.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.08973, pruned_loss=0.01206, audio_tagging_loss=0.00867, over 3049993.53 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:34:02,298 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3640186.6666666665, ans=0.125 2023-11-28 19:34:06,887 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3640253.3333333335, ans=0.2 2023-11-28 19:34:08,907 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3640253.3333333335, ans=0.1 2023-11-28 19:34:15,110 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3640253.3333333335, ans=0.125 2023-11-28 19:34:19,504 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 546050 2023-11-28 19:34:39,639 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3640386.6666666665, ans=0.125 2023-11-28 19:34:50,118 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3640453.3333333335, ans=0.0 2023-11-28 19:34:50,239 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3640453.3333333335, ans=0.1 2023-11-28 19:34:54,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3640520.0, ans=0.2 2023-11-28 19:34:55,723 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 5000, loss[loss=0.06806, simple_loss=0.09045, pruned_loss=0.01156, audio_tagging_loss=0.01127, over 15107.00 frames. ], tot_loss[loss=0.066, simple_loss=0.0904, pruned_loss=0.01228, audio_tagging_loss=0.008523, over 3057993.36 frames. 
], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:35:04,877 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3640520.0, ans=0.0 2023-11-28 19:35:10,421 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3640586.6666666665, ans=0.0 2023-11-28 19:35:13,460 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.449e+01 8.947e+01 9.568e+01 1.019e+02 1.168e+02, threshold=1.914e+02, percent-clipped=0.0 2023-11-28 19:35:21,199 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 546100 2023-11-28 19:35:29,512 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3640653.3333333335, ans=0.125 2023-11-28 19:35:31,902 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3640720.0, ans=0.0 2023-11-28 19:35:58,105 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 5050, loss[loss=0.05473, simple_loss=0.07888, pruned_loss=0.006473, audio_tagging_loss=0.008822, over 15208.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.09079, pruned_loss=0.01223, audio_tagging_loss=0.008466, over 3056437.29 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:36:14,375 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3640920.0, ans=0.2 2023-11-28 19:36:16,609 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3640920.0, ans=0.2 2023-11-28 19:36:22,334 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 546150 2023-11-28 19:36:23,592 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3640986.6666666665, ans=0.125 2023-11-28 19:36:25,384 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.14 vs. limit=15.0 2023-11-28 19:36:28,801 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.58 vs. limit=22.5 2023-11-28 19:36:46,994 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3641120.0, ans=0.125 2023-11-28 19:36:59,337 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3641186.6666666665, ans=0.125 2023-11-28 19:37:00,150 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 5100, loss[loss=0.05857, simple_loss=0.07407, pruned_loss=0.01232, audio_tagging_loss=0.009207, over 13962.00 frames. ], tot_loss[loss=0.06507, simple_loss=0.08919, pruned_loss=0.01191, audio_tagging_loss=0.008571, over 3048297.54 frames. 
], batch size: 54, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:37:17,161 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.029e+01 8.885e+01 9.689e+01 1.021e+02 1.449e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-28 19:37:24,872 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 546200 2023-11-28 19:37:40,456 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3641386.6666666665, ans=0.125 2023-11-28 19:37:41,804 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3641386.6666666665, ans=0.125 2023-11-28 19:37:49,906 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3641453.3333333335, ans=0.125 2023-11-28 19:38:01,231 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 5150, loss[loss=0.0656, simple_loss=0.09569, pruned_loss=0.01044, audio_tagging_loss=0.007314, over 15734.00 frames. ], tot_loss[loss=0.06511, simple_loss=0.08923, pruned_loss=0.01197, audio_tagging_loss=0.008523, over 3041960.89 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:38:09,896 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3641520.0, ans=0.0 2023-11-28 19:38:13,487 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.30 vs. limit=15.0 2023-11-28 19:38:18,474 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3641586.6666666665, ans=0.0 2023-11-28 19:38:18,700 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3641586.6666666665, ans=0.1 2023-11-28 19:38:27,243 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 546250 2023-11-28 19:38:35,930 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.12 vs. limit=22.5 2023-11-28 19:38:48,345 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=3641720.0, ans=0.5 2023-11-28 19:38:54,621 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.13 vs. limit=15.0 2023-11-28 19:39:04,446 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 5200, loss[loss=0.03809, simple_loss=0.04809, pruned_loss=0.006348, audio_tagging_loss=0.007695, over 16956.00 frames. ], tot_loss[loss=0.06457, simple_loss=0.08862, pruned_loss=0.01178, audio_tagging_loss=0.008477, over 3053390.41 frames. 
], batch size: 66, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:39:07,202 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3641853.3333333335, ans=0.125 2023-11-28 19:39:22,730 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.764e+01 9.099e+01 9.660e+01 1.024e+02 1.324e+02, threshold=1.932e+02, percent-clipped=0.0 2023-11-28 19:39:28,757 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 546300 2023-11-28 19:39:57,537 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3642120.0, ans=0.125 2023-11-28 19:40:06,061 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 5250, loss[loss=0.05419, simple_loss=0.06527, pruned_loss=0.008731, audio_tagging_loss=0.01282, over 14446.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08949, pruned_loss=0.01206, audio_tagging_loss=0.008534, over 3055424.14 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:40:17,923 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3642253.3333333335, ans=0.1 2023-11-28 19:40:23,903 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3642253.3333333335, ans=0.1 2023-11-28 19:40:30,141 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 546350 2023-11-28 19:40:36,700 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3642320.0, ans=0.125 2023-11-28 19:40:38,596 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3642320.0, ans=0.1 2023-11-28 19:40:38,653 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3642320.0, ans=0.0 2023-11-28 19:40:51,771 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3642386.6666666665, ans=0.125 2023-11-28 19:40:56,508 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3642453.3333333335, ans=0.125 2023-11-28 19:40:58,764 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3642453.3333333335, ans=0.125 2023-11-28 19:41:06,167 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3642520.0, ans=0.04949747468305833 2023-11-28 19:41:06,888 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 5300, loss[loss=0.08413, simple_loss=0.1169, pruned_loss=0.01766, audio_tagging_loss=0.008039, over 14937.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08985, pruned_loss=0.0122, audio_tagging_loss=0.008521, over 3053099.06 frames. 
], batch size: 54, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:41:09,436 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3642520.0, ans=0.1 2023-11-28 19:41:25,539 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.603e+01 9.027e+01 9.738e+01 1.069e+02 1.273e+02, threshold=1.948e+02, percent-clipped=0.0 2023-11-28 19:41:32,034 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 546400 2023-11-28 19:41:56,639 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.31 vs. limit=22.5 2023-11-28 19:42:03,518 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3642786.6666666665, ans=0.0 2023-11-28 19:42:07,202 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3642853.3333333335, ans=0.125 2023-11-28 19:42:08,145 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 5350, loss[loss=0.06328, simple_loss=0.0785, pruned_loss=0.01445, audio_tagging_loss=0.009579, over 14123.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.08948, pruned_loss=0.01209, audio_tagging_loss=0.008569, over 3043797.76 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:42:17,927 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3642853.3333333335, ans=0.1 2023-11-28 19:42:24,319 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3642920.0, ans=0.1 2023-11-28 19:42:27,856 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3642920.0, ans=0.0 2023-11-28 19:42:33,665 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 546450 2023-11-28 19:42:53,078 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3643053.3333333335, ans=0.125 2023-11-28 19:43:08,433 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 19:43:10,560 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 5400, loss[loss=0.07206, simple_loss=0.0965, pruned_loss=0.01738, audio_tagging_loss=0.006422, over 15451.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08977, pruned_loss=0.01209, audio_tagging_loss=0.008514, over 3043376.05 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:43:12,402 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=3643186.6666666665, ans=15.0 2023-11-28 19:43:15,954 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.37 vs. limit=15.0 2023-11-28 19:43:28,022 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.538e+01 9.157e+01 9.833e+01 1.043e+02 1.444e+02, threshold=1.967e+02, percent-clipped=0.0 2023-11-28 19:43:28,951 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.60 vs. 
limit=10.0 2023-11-28 19:43:35,117 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 546500 2023-11-28 19:43:57,135 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3643386.6666666665, ans=0.1 2023-11-28 19:43:57,197 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3643386.6666666665, ans=0.125 2023-11-28 19:44:03,458 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.07 vs. limit=10.0 2023-11-28 19:44:12,624 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 5450, loss[loss=0.07437, simple_loss=0.097, pruned_loss=0.01582, audio_tagging_loss=0.01005, over 15423.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.08948, pruned_loss=0.01207, audio_tagging_loss=0.008594, over 3043738.79 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:44:13,046 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3643520.0, ans=0.0 2023-11-28 19:44:37,558 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 546550 2023-11-28 19:45:05,512 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.34 vs. limit=22.5 2023-11-28 19:45:07,794 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.65 vs. limit=15.0 2023-11-28 19:45:14,840 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 5500, loss[loss=0.07763, simple_loss=0.1081, pruned_loss=0.01495, audio_tagging_loss=0.008635, over 15604.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08983, pruned_loss=0.01212, audio_tagging_loss=0.008604, over 3044543.91 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:45:20,635 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3643853.3333333335, ans=0.2 2023-11-28 19:45:24,038 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3643853.3333333335, ans=0.1 2023-11-28 19:45:31,111 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3643920.0, ans=0.1 2023-11-28 19:45:34,343 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.526e+01 8.789e+01 9.570e+01 1.015e+02 1.276e+02, threshold=1.914e+02, percent-clipped=0.0 2023-11-28 19:45:40,435 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 546600 2023-11-28 19:45:52,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3644053.3333333335, ans=0.07 2023-11-28 19:46:00,487 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3644053.3333333335, ans=0.125 2023-11-28 19:46:17,811 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 5550, loss[loss=0.06838, simple_loss=0.08368, pruned_loss=0.01564, audio_tagging_loss=0.0109, over 15222.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08918, pruned_loss=0.01208, audio_tagging_loss=0.00878, over 3043726.58 frames. 
], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:46:37,905 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3644253.3333333335, ans=0.1 2023-11-28 19:46:41,306 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 546650 2023-11-28 19:46:41,562 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3644320.0, ans=0.0 2023-11-28 19:46:44,529 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3644320.0, ans=0.125 2023-11-28 19:46:47,466 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.34 vs. limit=15.0 2023-11-28 19:46:48,224 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3644320.0, ans=0.0 2023-11-28 19:46:52,651 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3644386.6666666665, ans=0.0 2023-11-28 19:47:06,086 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3644453.3333333335, ans=0.2 2023-11-28 19:47:07,435 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3644453.3333333335, ans=0.125 2023-11-28 19:47:09,940 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.12 vs. limit=10.0 2023-11-28 19:47:16,616 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3644453.3333333335, ans=0.0 2023-11-28 19:47:18,554 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 5600, loss[loss=0.05862, simple_loss=0.07589, pruned_loss=0.008553, audio_tagging_loss=0.01212, over 14876.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08876, pruned_loss=0.01198, audio_tagging_loss=0.008903, over 3046020.15 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 19:47:38,128 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.490e+01 8.943e+01 9.603e+01 1.015e+02 1.818e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-28 19:47:43,659 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 546700 2023-11-28 19:48:05,763 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 19:48:09,657 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3644786.6666666665, ans=0.2 2023-11-28 19:48:10,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3644786.6666666665, ans=0.125 2023-11-28 19:48:17,391 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3644786.6666666665, ans=0.125 2023-11-28 19:48:20,626 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 5650, loss[loss=0.08481, simple_loss=0.1209, pruned_loss=0.01458, audio_tagging_loss=0.009774, over 15460.00 frames. ], tot_loss[loss=0.065, simple_loss=0.08852, pruned_loss=0.01185, audio_tagging_loss=0.008893, over 3051596.36 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:48:20,928 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3644853.3333333335, ans=0.5 2023-11-28 19:48:23,178 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3644853.3333333335, ans=0.125 2023-11-28 19:48:23,194 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3644853.3333333335, ans=0.125 2023-11-28 19:48:25,707 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3644853.3333333335, ans=0.0 2023-11-28 19:48:30,732 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3644853.3333333335, ans=0.0 2023-11-28 19:48:35,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=3644920.0, ans=0.95 2023-11-28 19:48:45,989 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 546750 2023-11-28 19:48:47,341 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3644986.6666666665, ans=0.0 2023-11-28 19:48:51,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3644986.6666666665, ans=0.125 2023-11-28 19:49:00,110 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3645053.3333333335, ans=0.125 2023-11-28 19:49:01,274 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3645053.3333333335, ans=0.0 2023-11-28 19:49:19,708 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3645120.0, ans=0.125 2023-11-28 19:49:21,864 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 5700, loss[loss=0.04126, simple_loss=0.05446, pruned_loss=0.005326, audio_tagging_loss=0.008709, over 13597.00 frames. ], tot_loss[loss=0.06482, simple_loss=0.08821, pruned_loss=0.01185, audio_tagging_loss=0.008867, over 3047081.27 frames. 
], batch size: 54, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:49:27,546 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3645186.6666666665, ans=0.125 2023-11-28 19:49:30,609 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3645186.6666666665, ans=0.1 2023-11-28 19:49:32,715 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3645186.6666666665, ans=0.125 2023-11-28 19:49:41,949 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.069e+01 8.661e+01 9.434e+01 1.006e+02 1.407e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-28 19:49:43,435 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3645253.3333333335, ans=0.0 2023-11-28 19:49:46,795 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 546800 2023-11-28 19:49:46,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3645320.0, ans=0.125 2023-11-28 19:49:57,454 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3645320.0, ans=0.0 2023-11-28 19:49:58,683 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3645386.6666666665, ans=0.1 2023-11-28 19:50:00,960 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3645386.6666666665, ans=0.125 2023-11-28 19:50:00,997 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3645386.6666666665, ans=0.125 2023-11-28 19:50:24,574 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 5750, loss[loss=0.07797, simple_loss=0.1168, pruned_loss=0.0135, audio_tagging_loss=0.00608, over 15367.00 frames. ], tot_loss[loss=0.06494, simple_loss=0.08867, pruned_loss=0.01186, audio_tagging_loss=0.008753, over 3046645.12 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:50:29,689 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3645520.0, ans=0.05 2023-11-28 19:50:29,737 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3645520.0, ans=0.125 2023-11-28 19:50:49,930 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 546850 2023-11-28 19:51:10,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3645720.0, ans=0.125 2023-11-28 19:51:26,959 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 5800, loss[loss=0.0695, simple_loss=0.09885, pruned_loss=0.0152, audio_tagging_loss=0.004882, over 15179.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.0893, pruned_loss=0.01187, audio_tagging_loss=0.008609, over 3050455.53 frames. 
], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:51:27,282 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3645853.3333333335, ans=0.09899494936611666 2023-11-28 19:51:44,639 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.34 vs. limit=10.0 2023-11-28 19:51:46,760 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.275e+01 8.935e+01 9.736e+01 1.038e+02 1.216e+02, threshold=1.947e+02, percent-clipped=0.0 2023-11-28 19:51:49,364 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3645920.0, ans=0.5 2023-11-28 19:51:51,503 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 546900 2023-11-28 19:52:03,247 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3646053.3333333335, ans=0.125 2023-11-28 19:52:06,883 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3646053.3333333335, ans=0.09899494936611666 2023-11-28 19:52:25,580 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3646120.0, ans=0.125 2023-11-28 19:52:27,685 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3646186.6666666665, ans=0.0 2023-11-28 19:52:28,746 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 5850, loss[loss=0.07224, simple_loss=0.1084, pruned_loss=0.01227, audio_tagging_loss=0.005771, over 15506.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08998, pruned_loss=0.01206, audio_tagging_loss=0.008529, over 3052827.04 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:52:36,621 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3646186.6666666665, ans=0.125 2023-11-28 19:52:37,230 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.09 vs. limit=10.0 2023-11-28 19:52:53,894 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 546950 2023-11-28 19:52:53,895 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3646320.0, ans=0.125 2023-11-28 19:53:01,582 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3646320.0, ans=0.1 2023-11-28 19:53:06,240 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3646386.6666666665, ans=0.125 2023-11-28 19:53:09,736 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3646386.6666666665, ans=0.125 2023-11-28 19:53:21,958 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.92 vs. 
limit=10.0 2023-11-28 19:53:26,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3646453.3333333335, ans=0.2 2023-11-28 19:53:30,704 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 5900, loss[loss=0.06423, simple_loss=0.08225, pruned_loss=0.01138, audio_tagging_loss=0.01172, over 14453.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08948, pruned_loss=0.01192, audio_tagging_loss=0.008594, over 3060196.29 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:53:31,134 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3646520.0, ans=0.125 2023-11-28 19:53:37,563 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3646520.0, ans=0.125 2023-11-28 19:53:50,824 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.425e+01 9.034e+01 9.568e+01 1.043e+02 1.665e+02, threshold=1.914e+02, percent-clipped=0.0 2023-11-28 19:53:55,732 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 547000 2023-11-28 19:54:14,879 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3646720.0, ans=0.0 2023-11-28 19:54:15,295 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.50 vs. limit=6.0 2023-11-28 19:54:19,856 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.05 vs. limit=6.0 2023-11-28 19:54:25,794 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3646786.6666666665, ans=0.125 2023-11-28 19:54:33,354 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 5950, loss[loss=0.09823, simple_loss=0.1358, pruned_loss=0.02202, audio_tagging_loss=0.008317, over 15333.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08964, pruned_loss=0.01205, audio_tagging_loss=0.008657, over 3055247.61 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:54:46,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3646920.0, ans=0.125 2023-11-28 19:54:55,902 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3646920.0, ans=0.07 2023-11-28 19:54:58,238 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 547050 2023-11-28 19:55:24,246 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3647120.0, ans=0.1 2023-11-28 19:55:35,013 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 6000, loss[loss=0.04453, simple_loss=0.06319, pruned_loss=0.005224, audio_tagging_loss=0.007706, over 15635.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.08988, pruned_loss=0.01198, audio_tagging_loss=0.008455, over 3051098.64 frames. ], batch size: 60, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 19:55:35,016 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-28 19:56:14,886 INFO [train_asr.py:1267] (0/4) Epoch 46, validation: loss=0.05742, simple_loss=0.05049, pruned_loss=0.005198, audio_tagging_loss=0.02698, over 4681554.00 frames. 
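The tot_loss entries in this log decompose into simple_loss, pruned_loss and audio_tagging_loss, and the logged totals are consistent with a fixed 0.5 weight on simple_loss and unit weight on the other two terms. The sketch below is an inference from the logged numbers themselves, not code lifted from train_asr.py; the helper name and keyword defaults are illustrative only.

```python
# Minimal sketch: recombine the logged loss fields under the inferred weights
# loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss.

def combined_loss(simple_loss: float,
                  pruned_loss: float,
                  audio_tagging_loss: float,
                  simple_loss_weight: float = 0.5,
                  audio_tagging_weight: float = 1.0) -> float:
    """Recombine per-batch loss terms the way the log entries suggest."""
    return (simple_loss_weight * simple_loss
            + pruned_loss
            + audio_tagging_weight * audio_tagging_loss)

# "Epoch 46, batch 6000" training tot_loss above:
# 0.5 * 0.08988 + 0.01198 + 0.008455 = 0.065375, logged as loss=0.06538.
assert abs(combined_loss(0.08988, 0.01198, 0.008455) - 0.06538) < 1e-4

# The batch 6000 validation entry fits the same weighting:
# 0.5 * 0.05049 + 0.005198 + 0.02698 = 0.057423, logged as loss=0.05742.
assert abs(combined_loss(0.05049, 0.005198, 0.02698) - 0.05742) < 1e-4
```

Note that on the validation set the audio-tagging term carries most of the total (0.02698 of 0.05742), while in the surrounding training batches it stays below 0.009.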
2023-11-28 19:56:14,887 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-28 19:56:33,767 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3647253.3333333335, ans=0.0 2023-11-28 19:56:34,504 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.009e+01 8.785e+01 9.507e+01 1.045e+02 2.026e+02, threshold=1.901e+02, percent-clipped=1.0 2023-11-28 19:56:37,816 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.00 vs. limit=15.0 2023-11-28 19:56:39,375 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 547100 2023-11-28 19:56:44,805 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3647320.0, ans=0.125 2023-11-28 19:57:01,577 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 19:57:01,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3647386.6666666665, ans=0.125 2023-11-28 19:57:16,618 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 6050, loss[loss=0.0732, simple_loss=0.1023, pruned_loss=0.01588, audio_tagging_loss=0.006151, over 14996.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.08991, pruned_loss=0.01205, audio_tagging_loss=0.008409, over 3046246.85 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:57:18,565 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.53 vs. limit=22.5 2023-11-28 19:57:41,465 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 547150 2023-11-28 19:57:44,451 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.24 vs. limit=6.0 2023-11-28 19:57:53,404 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3647720.0, ans=0.0 2023-11-28 19:57:57,651 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3647720.0, ans=0.0 2023-11-28 19:58:06,571 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3647786.6666666665, ans=0.2 2023-11-28 19:58:18,283 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 6100, loss[loss=0.05334, simple_loss=0.07605, pruned_loss=0.007957, audio_tagging_loss=0.007358, over 16449.00 frames. ], tot_loss[loss=0.06468, simple_loss=0.08878, pruned_loss=0.01189, audio_tagging_loss=0.008401, over 3041001.75 frames. ], batch size: 62, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:58:24,972 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 19:58:38,572 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.49 vs. 
limit=12.0 2023-11-28 19:58:39,064 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.459e+01 8.974e+01 9.540e+01 1.034e+02 1.321e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-28 19:58:41,813 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3647986.6666666665, ans=0.2 2023-11-28 19:58:42,874 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 547200 2023-11-28 19:58:49,323 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.54 vs. limit=6.0 2023-11-28 19:58:54,367 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3647986.6666666665, ans=0.125 2023-11-28 19:59:06,555 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3648053.3333333335, ans=0.125 2023-11-28 19:59:09,513 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.87 vs. limit=15.0 2023-11-28 19:59:20,941 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 6150, loss[loss=0.05949, simple_loss=0.08415, pruned_loss=0.009453, audio_tagging_loss=0.007962, over 15372.00 frames. ], tot_loss[loss=0.065, simple_loss=0.08897, pruned_loss=0.01207, audio_tagging_loss=0.008446, over 3036487.89 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:59:28,367 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3648186.6666666665, ans=0.0 2023-11-28 19:59:30,851 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3648186.6666666665, ans=0.0 2023-11-28 19:59:44,465 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.91 vs. limit=6.0 2023-11-28 19:59:45,076 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3648320.0, ans=0.0 2023-11-28 19:59:46,027 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 547250 2023-11-28 20:00:14,677 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.34 vs. limit=22.5 2023-11-28 20:00:15,279 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 20:00:19,737 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3648453.3333333335, ans=0.0 2023-11-28 20:00:19,853 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3648453.3333333335, ans=0.0 2023-11-28 20:00:21,962 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 6200, loss[loss=0.06386, simple_loss=0.08008, pruned_loss=0.01329, audio_tagging_loss=0.01053, over 15036.00 frames. ], tot_loss[loss=0.0649, simple_loss=0.0888, pruned_loss=0.01202, audio_tagging_loss=0.008482, over 3032406.61 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:00:33,688 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.02 vs. 
limit=15.0 2023-11-28 20:00:42,988 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.460e+01 8.739e+01 9.716e+01 1.027e+02 1.338e+02, threshold=1.943e+02, percent-clipped=0.0 2023-11-28 20:00:47,153 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 547300 2023-11-28 20:01:23,808 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 6250, loss[loss=0.05581, simple_loss=0.08061, pruned_loss=0.007426, audio_tagging_loss=0.008081, over 15334.00 frames. ], tot_loss[loss=0.06461, simple_loss=0.08828, pruned_loss=0.01185, audio_tagging_loss=0.008619, over 3033972.99 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:01:36,555 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.53 vs. limit=22.5 2023-11-28 20:01:47,660 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 547350 2023-11-28 20:01:51,361 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3648986.6666666665, ans=0.125 2023-11-28 20:02:17,919 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3649120.0, ans=0.125 2023-11-28 20:02:25,155 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 6300, loss[loss=0.06706, simple_loss=0.09071, pruned_loss=0.01238, audio_tagging_loss=0.009324, over 16246.00 frames. ], tot_loss[loss=0.06416, simple_loss=0.0873, pruned_loss=0.01179, audio_tagging_loss=0.008712, over 3043019.86 frames. ], batch size: 62, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:02:26,721 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3649186.6666666665, ans=0.125 2023-11-28 20:02:45,248 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.610e+01 8.912e+01 9.440e+01 1.014e+02 1.345e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-28 20:02:49,533 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 547400 2023-11-28 20:03:21,477 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3649453.3333333335, ans=0.015 2023-11-28 20:03:24,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3649453.3333333335, ans=0.2 2023-11-28 20:03:26,072 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 6350, loss[loss=0.07117, simple_loss=0.09494, pruned_loss=0.01318, audio_tagging_loss=0.01052, over 15683.00 frames. ], tot_loss[loss=0.06505, simple_loss=0.08858, pruned_loss=0.01212, audio_tagging_loss=0.008641, over 3039962.70 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:03:27,930 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.57 vs. 
limit=15.0 2023-11-28 20:03:51,501 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 547450 2023-11-28 20:04:07,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3649720.0, ans=0.04949747468305833 2023-11-28 20:04:27,296 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3649853.3333333335, ans=0.125 2023-11-28 20:04:28,208 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 6400, loss[loss=0.07563, simple_loss=0.1034, pruned_loss=0.01549, audio_tagging_loss=0.008435, over 14171.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08872, pruned_loss=0.01219, audio_tagging_loss=0.00872, over 3041295.89 frames. ], batch size: 53, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 20:04:28,466 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3649853.3333333335, ans=0.0 2023-11-28 20:04:49,344 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.535e+01 9.001e+01 9.641e+01 1.036e+02 1.339e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-28 20:04:52,987 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 547500 2023-11-28 20:05:16,595 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3650120.0, ans=0.0 2023-11-28 20:05:20,705 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.26 vs. limit=15.0 2023-11-28 20:05:25,116 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3650120.0, ans=0.0 2023-11-28 20:05:30,121 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 6450, loss[loss=0.05985, simple_loss=0.07775, pruned_loss=0.01097, audio_tagging_loss=0.01001, over 15244.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08857, pruned_loss=0.01218, audio_tagging_loss=0.008803, over 3039499.33 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 20:05:38,853 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3650186.6666666665, ans=0.2 2023-11-28 20:05:45,847 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3650253.3333333335, ans=0.0 2023-11-28 20:05:48,578 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.61 vs. limit=15.0 2023-11-28 20:05:54,492 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 547550 2023-11-28 20:06:02,868 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.25 vs. limit=15.0 2023-11-28 20:06:10,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3650386.6666666665, ans=0.125 2023-11-28 20:06:21,816 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3650453.3333333335, ans=0.04949747468305833 2023-11-28 20:06:30,895 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 6500, loss[loss=0.06628, simple_loss=0.09002, pruned_loss=0.01345, audio_tagging_loss=0.007814, over 14037.00 frames. 
], tot_loss[loss=0.06502, simple_loss=0.08844, pruned_loss=0.01202, audio_tagging_loss=0.008775, over 3035642.91 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:06:45,860 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3650586.6666666665, ans=0.125 2023-11-28 20:06:47,741 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3650586.6666666665, ans=0.125 2023-11-28 20:06:54,006 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.399e+01 8.749e+01 9.341e+01 9.951e+01 1.412e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-28 20:06:55,672 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 20:06:56,629 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 547600 2023-11-28 20:07:33,357 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 6550, loss[loss=0.05808, simple_loss=0.08073, pruned_loss=0.009809, audio_tagging_loss=0.007911, over 15150.00 frames. ], tot_loss[loss=0.06457, simple_loss=0.08775, pruned_loss=0.01199, audio_tagging_loss=0.008708, over 3039546.59 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:07:38,065 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.90 vs. limit=15.0 2023-11-28 20:07:58,236 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 547650 2023-11-28 20:08:19,367 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3651053.3333333335, ans=0.2 2023-11-28 20:08:20,308 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=3651053.3333333335, ans=10.0 2023-11-28 20:08:21,808 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3651120.0, ans=0.125 2023-11-28 20:08:35,080 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.63 vs. limit=22.5 2023-11-28 20:08:35,782 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 6600, loss[loss=0.06248, simple_loss=0.08598, pruned_loss=0.01124, audio_tagging_loss=0.008251, over 15316.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.08848, pruned_loss=0.01216, audio_tagging_loss=0.008621, over 3044427.75 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:08:39,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3651186.6666666665, ans=0.0 2023-11-28 20:08:58,744 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.741e+01 8.944e+01 9.598e+01 1.039e+02 1.454e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-28 20:08:59,130 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3651253.3333333335, ans=0.125 2023-11-28 20:09:01,291 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 547700 2023-11-28 20:09:04,191 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.43 vs. 
limit=22.5 2023-11-28 20:09:08,844 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.57 vs. limit=15.0 2023-11-28 20:09:15,318 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.80 vs. limit=15.0 2023-11-28 20:09:21,721 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3651386.6666666665, ans=0.125 2023-11-28 20:09:25,745 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3651453.3333333335, ans=0.0 2023-11-28 20:09:38,402 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 6650, loss[loss=0.04608, simple_loss=0.05889, pruned_loss=0.00719, audio_tagging_loss=0.009449, over 14796.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08905, pruned_loss=0.01222, audio_tagging_loss=0.008571, over 3041411.35 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:09:38,702 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3651520.0, ans=0.125 2023-11-28 20:09:39,034 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.52 vs. limit=15.0 2023-11-28 20:09:44,585 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3651520.0, ans=0.09899494936611666 2023-11-28 20:10:03,170 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 547750 2023-11-28 20:10:03,263 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3651653.3333333335, ans=0.125 2023-11-28 20:10:04,487 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=3651653.3333333335, ans=15.0 2023-11-28 20:10:09,604 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3651653.3333333335, ans=0.07 2023-11-28 20:10:14,697 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3651720.0, ans=0.0 2023-11-28 20:10:29,179 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.60 vs. limit=22.5 2023-11-28 20:10:39,462 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 6700, loss[loss=0.0636, simple_loss=0.08577, pruned_loss=0.01131, audio_tagging_loss=0.0094, over 15224.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08953, pruned_loss=0.01219, audio_tagging_loss=0.0085, over 3046466.03 frames. 
], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:10:43,245 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3651853.3333333335, ans=0.125 2023-11-28 20:11:02,321 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.131e+01 8.565e+01 9.256e+01 9.960e+01 1.372e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-28 20:11:04,735 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 547800 2023-11-28 20:11:04,995 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3651986.6666666665, ans=0.125 2023-11-28 20:11:18,176 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3652053.3333333335, ans=0.125 2023-11-28 20:11:29,311 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3652120.0, ans=0.125 2023-11-28 20:11:39,421 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3652120.0, ans=0.1 2023-11-28 20:11:40,635 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3652186.6666666665, ans=0.1 2023-11-28 20:11:42,302 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 6750, loss[loss=0.06652, simple_loss=0.09233, pruned_loss=0.01365, audio_tagging_loss=0.006698, over 16546.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.08999, pruned_loss=0.0123, audio_tagging_loss=0.008466, over 3041704.39 frames. ], batch size: 61, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:12:06,727 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 547850 2023-11-28 20:12:09,143 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3652320.0, ans=0.0 2023-11-28 20:12:13,962 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3652320.0, ans=0.0 2023-11-28 20:12:17,237 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3652386.6666666665, ans=0.1 2023-11-28 20:12:41,586 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3652453.3333333335, ans=0.1 2023-11-28 20:12:43,527 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 6800, loss[loss=0.08345, simple_loss=0.1227, pruned_loss=0.01339, audio_tagging_loss=0.008705, over 15707.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.0902, pruned_loss=0.01229, audio_tagging_loss=0.008345, over 3044418.48 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 20:13:02,052 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3652586.6666666665, ans=0.125 2023-11-28 20:13:05,390 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.899e+01 8.847e+01 9.241e+01 1.022e+02 1.257e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-28 20:13:07,855 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 547900 2023-11-28 20:13:45,157 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 6850, loss[loss=0.06143, simple_loss=0.07913, pruned_loss=0.01079, audio_tagging_loss=0.01107, over 15266.00 frames. 
], tot_loss[loss=0.06552, simple_loss=0.08966, pruned_loss=0.0122, audio_tagging_loss=0.008493, over 3043490.68 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 20:14:06,668 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.56 vs. limit=15.0 2023-11-28 20:14:10,187 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 547950 2023-11-28 20:14:22,403 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.05 vs. limit=15.0 2023-11-28 20:14:46,447 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 6900, loss[loss=0.05186, simple_loss=0.06942, pruned_loss=0.006468, audio_tagging_loss=0.01068, over 15009.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08947, pruned_loss=0.01206, audio_tagging_loss=0.008564, over 3041402.88 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 20:14:49,402 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.74 vs. limit=10.0 2023-11-28 20:14:52,383 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3653186.6666666665, ans=0.125 2023-11-28 20:15:00,733 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3653253.3333333335, ans=0.0 2023-11-28 20:15:04,143 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3653253.3333333335, ans=0.0 2023-11-28 20:15:05,352 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3653253.3333333335, ans=0.125 2023-11-28 20:15:09,733 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.653e+01 9.041e+01 9.811e+01 1.026e+02 3.153e+02, threshold=1.962e+02, percent-clipped=1.0 2023-11-28 20:15:11,085 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 548000 2023-11-28 20:15:13,141 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-548000.pt 2023-11-28 20:15:24,080 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3653320.0, ans=0.0 2023-11-28 20:15:26,460 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3653386.6666666665, ans=0.0 2023-11-28 20:15:34,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3653386.6666666665, ans=0.125 2023-11-28 20:15:39,562 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 20:15:39,800 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3653453.3333333335, ans=0.0 2023-11-28 20:15:41,408 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.21 vs. limit=15.0 2023-11-28 20:15:43,933 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3653453.3333333335, ans=0.0 2023-11-28 20:15:50,609 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 6950, loss[loss=0.05173, simple_loss=0.07053, pruned_loss=0.006338, audio_tagging_loss=0.01012, over 15886.00 frames. ], tot_loss[loss=0.06483, simple_loss=0.08861, pruned_loss=0.01192, audio_tagging_loss=0.008606, over 3041380.86 frames. ], batch size: 61, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:15:54,400 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3653520.0, ans=0.0 2023-11-28 20:15:59,256 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3653520.0, ans=0.125 2023-11-28 20:16:12,659 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3653586.6666666665, ans=0.1 2023-11-28 20:16:14,717 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 548050 2023-11-28 20:16:21,187 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3653653.3333333335, ans=0.125 2023-11-28 20:16:27,190 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=3653720.0, ans=10.0 2023-11-28 20:16:51,969 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 7000, loss[loss=0.05352, simple_loss=0.07155, pruned_loss=0.008729, audio_tagging_loss=0.009018, over 15413.00 frames. ], tot_loss[loss=0.06485, simple_loss=0.08861, pruned_loss=0.01187, audio_tagging_loss=0.008677, over 3041119.30 frames. ], batch size: 61, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:16:55,671 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3653853.3333333335, ans=0.125 2023-11-28 20:17:01,630 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3653853.3333333335, ans=0.125 2023-11-28 20:17:15,187 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.579e+01 8.929e+01 9.464e+01 1.026e+02 1.498e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-28 20:17:15,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3653986.6666666665, ans=0.1 2023-11-28 20:17:16,516 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 548100 2023-11-28 20:17:23,180 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3653986.6666666665, ans=0.125 2023-11-28 20:17:53,516 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 7050, loss[loss=0.05476, simple_loss=0.07259, pruned_loss=0.009575, audio_tagging_loss=0.008885, over 14239.00 frames. ], tot_loss[loss=0.0645, simple_loss=0.08797, pruned_loss=0.01181, audio_tagging_loss=0.008698, over 3040033.51 frames. 
], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:18:18,730 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 548150 2023-11-28 20:18:30,144 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3654386.6666666665, ans=0.2 2023-11-28 20:18:36,274 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.45 vs. limit=22.5 2023-11-28 20:18:42,578 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3654453.3333333335, ans=0.1 2023-11-28 20:18:56,454 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 7100, loss[loss=0.05583, simple_loss=0.08437, pruned_loss=0.007758, audio_tagging_loss=0.005887, over 15743.00 frames. ], tot_loss[loss=0.06473, simple_loss=0.0884, pruned_loss=0.01178, audio_tagging_loss=0.008748, over 3037252.94 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:19:02,600 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3654520.0, ans=0.125 2023-11-28 20:19:07,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3654586.6666666665, ans=0.1 2023-11-28 20:19:11,210 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3654586.6666666665, ans=0.125 2023-11-28 20:19:19,320 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.256e+01 9.027e+01 9.544e+01 1.031e+02 1.344e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-28 20:19:20,643 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 548200 2023-11-28 20:19:22,497 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3654653.3333333335, ans=0.125 2023-11-28 20:19:23,670 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3654653.3333333335, ans=0.0 2023-11-28 20:19:34,885 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3654720.0, ans=0.1 2023-11-28 20:19:58,801 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 7150, loss[loss=0.06519, simple_loss=0.09654, pruned_loss=0.008896, audio_tagging_loss=0.008023, over 15055.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08913, pruned_loss=0.01194, audio_tagging_loss=0.008762, over 3042755.02 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:20:01,286 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3654853.3333333335, ans=0.1 2023-11-28 20:20:05,899 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3654853.3333333335, ans=0.1 2023-11-28 20:20:22,538 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 548250 2023-11-28 20:20:25,868 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.28 vs. 
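The `optim.py` lines of the form `Clipping_scale=2.0, grad-norm quartiles ... threshold=..., percent-clipped=...` report five order statistics (min, 25%, median, 75%, max) of recent gradient norms, with the clipping threshold tied to the median: in the record above, 2.0 × 9.544e+01 ≈ 1.909e+02, exactly the logged threshold. A hedged reconstruction of this style of adaptive clipping (the real ScaledAdam logic is more involved):

```python
import torch
from collections import deque

class QuartileGradClipper:
    """Sketch: keep a window of recent gradient norms, report their
    quartiles, and clip to clipping_scale * median (hypothetical)."""
    def __init__(self, clipping_scale: float = 2.0, window: int = 200):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)

    def __call__(self, params) -> float:
        params = [p for p in params if p.grad is not None]
        norm = torch.norm(torch.stack([p.grad.norm() for p in params])).item()
        self.norms.append(norm)
        s = sorted(self.norms)
        # min / 25% / 50% / 75% / max, as printed in the log
        q = [s[int(r * (len(s) - 1))] for r in (0.0, 0.25, 0.5, 0.75, 1.0)]
        threshold = self.clipping_scale * q[2]
        if norm > threshold:          # counted in "percent-clipped"
            for p in params:
                p.grad.mul_(threshold / norm)
        return threshold
```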
limit=15.0 2023-11-28 20:20:29,132 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3654986.6666666665, ans=0.0 2023-11-28 20:20:29,162 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3654986.6666666665, ans=0.125 2023-11-28 20:20:59,402 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 7200, loss[loss=0.06034, simple_loss=0.08478, pruned_loss=0.007322, audio_tagging_loss=0.01063, over 15365.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08876, pruned_loss=0.01189, audio_tagging_loss=0.008848, over 3035474.77 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:21:01,334 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3655186.6666666665, ans=0.1 2023-11-28 20:21:04,765 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3655186.6666666665, ans=0.125 2023-11-28 20:21:12,450 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3655253.3333333335, ans=0.1 2023-11-28 20:21:14,857 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3655253.3333333335, ans=0.125 2023-11-28 20:21:17,283 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3655253.3333333335, ans=0.125 2023-11-28 20:21:24,326 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.990e+01 9.059e+01 9.674e+01 1.051e+02 1.523e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-28 20:21:24,460 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 548300 2023-11-28 20:21:33,487 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3655320.0, ans=0.0 2023-11-28 20:21:35,859 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3655386.6666666665, ans=0.04949747468305833 2023-11-28 20:21:39,149 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3655386.6666666665, ans=0.125 2023-11-28 20:21:41,454 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3655386.6666666665, ans=0.125 2023-11-28 20:21:44,454 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.27 vs. limit=15.0 2023-11-28 20:21:57,791 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3655453.3333333335, ans=0.125 2023-11-28 20:22:01,304 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 7250, loss[loss=0.07098, simple_loss=0.0958, pruned_loss=0.01337, audio_tagging_loss=0.009718, over 16133.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08956, pruned_loss=0.01191, audio_tagging_loss=0.008898, over 3039098.08 frames. 
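The very frequent `ScheduledFloat: name=..., batch_count=..., ans=...` lines trace hyperparameters (dropout rates, balancer probabilities, skip rates) that are not constants but functions of training progress; `ans` is the value in effect at the given `batch_count`. A minimal sketch, assuming piecewise-linear interpolation between (batch_count, value) breakpoints:

```python
import bisect

class ScheduledFloat:
    """Sketch of a batch-count-scheduled float: linear interpolation
    between sorted (batch_count, value) breakpoints, clamped outside."""
    def __init__(self, *points):
        self.points = sorted(points)   # e.g. (0.0, 0.3), (20000.0, 0.125)
        self.batch_count = 0.0

    def value(self) -> float:
        xs = [x for x, _ in self.points]
        i = bisect.bisect_right(xs, self.batch_count)
        if i == 0:
            return self.points[0][1]
        if i == len(self.points):
            return self.points[-1][1]
        (x0, y0), (x1, y1) = self.points[i - 1], self.points[i]
        t = (self.batch_count - x0) / (x1 - x0)
        return y0 + t * (y1 - y0)
```

With breakpoints like the hypothetical ones above, a run this deep into training (batch_count ≈ 3.65e+06) sits past the last breakpoint, consistent with the many constant `ans=0.125` and `ans=0.0` entries.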
], batch size: 61, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:22:17,903 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3655586.6666666665, ans=0.1 2023-11-28 20:22:26,046 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 548350 2023-11-28 20:22:33,160 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3655653.3333333335, ans=0.0 2023-11-28 20:22:34,989 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3655653.3333333335, ans=0.2 2023-11-28 20:22:36,073 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3655653.3333333335, ans=0.0 2023-11-28 20:22:45,262 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3655720.0, ans=0.1 2023-11-28 20:22:55,577 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3655786.6666666665, ans=0.125 2023-11-28 20:23:03,255 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 7300, loss[loss=0.06627, simple_loss=0.08804, pruned_loss=0.01307, audio_tagging_loss=0.009178, over 15616.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.0895, pruned_loss=0.01203, audio_tagging_loss=0.008828, over 3039969.00 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:23:03,606 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 20:23:11,417 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.34 vs. limit=6.0 2023-11-28 20:23:19,218 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3655920.0, ans=0.0 2023-11-28 20:23:27,658 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.253e+01 8.815e+01 9.477e+01 1.035e+02 1.367e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-28 20:23:27,807 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 548400 2023-11-28 20:23:33,781 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3655986.6666666665, ans=0.125 2023-11-28 20:23:34,821 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3655986.6666666665, ans=0.125 2023-11-28 20:23:57,161 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3656120.0, ans=0.0 2023-11-28 20:23:57,297 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.06 vs. limit=15.0 2023-11-28 20:24:04,863 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 7350, loss[loss=0.06972, simple_loss=0.09648, pruned_loss=0.01543, audio_tagging_loss=0.006052, over 15106.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.0893, pruned_loss=0.01206, audio_tagging_loss=0.008703, over 3037800.07 frames. 
], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:24:05,053 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3656186.6666666665, ans=0.125 2023-11-28 20:24:16,235 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3656253.3333333335, ans=0.125 2023-11-28 20:24:29,597 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 548450 2023-11-28 20:24:35,606 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.43 vs. limit=5.0 2023-11-28 20:24:47,264 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.02 vs. limit=12.0 2023-11-28 20:25:06,631 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 7400, loss[loss=0.06663, simple_loss=0.09422, pruned_loss=0.01214, audio_tagging_loss=0.007375, over 15611.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.08925, pruned_loss=0.01201, audio_tagging_loss=0.008584, over 3045921.10 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:25:07,055 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3656520.0, ans=0.125 2023-11-28 20:25:11,085 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.08 vs. limit=15.0 2023-11-28 20:25:15,383 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.93 vs. limit=15.0 2023-11-28 20:25:26,344 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3656586.6666666665, ans=0.1 2023-11-28 20:25:30,864 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.703e+01 8.862e+01 9.547e+01 1.020e+02 1.427e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-28 20:25:30,986 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 548500 2023-11-28 20:25:44,766 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3656720.0, ans=0.125 2023-11-28 20:25:55,321 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.32 vs. limit=15.0 2023-11-28 20:25:55,748 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=3656786.6666666665, ans=10.0 2023-11-28 20:26:02,690 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3656786.6666666665, ans=0.1 2023-11-28 20:26:07,297 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 7450, loss[loss=0.06448, simple_loss=0.08727, pruned_loss=0.01081, audio_tagging_loss=0.01004, over 16533.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.0888, pruned_loss=0.01186, audio_tagging_loss=0.008647, over 3053083.18 frames. ], batch size: 62, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:26:15,219 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.86 vs. 
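The `Whitening: name=..., metric=... vs. limit=...` lines measure how far an activation's covariance is from white (all eigenvalues equal); only when the metric exceeds its limit does the module apply a corrective gradient. A hedged reconstruction of one such metric, which is 1.0 for perfectly white features and approaches the channel count when a single direction dominates:

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
    """x: (num_frames, num_channels). Returns mean(eig^2) / mean(eig)^2
    of the per-group covariance; equals 1.0 iff the covariance is a
    multiple of the identity (sketch, not the exact scaling.py code)."""
    n, c = x.shape
    x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)
    x = x - x.mean(dim=1, keepdim=True)
    cov = torch.matmul(x.transpose(1, 2), x) / n        # (groups, d, d)
    mean_eig = cov.diagonal(dim1=1, dim2=2).mean()      # mean eigenvalue
    mean_eig_sq = torch.matmul(cov, cov).diagonal(dim1=1, dim2=2).mean()
    return mean_eig_sq / (mean_eig ** 2 + 1e-20)
```

By Jensen's inequality the ratio is at least 1, so a logged `metric=11.08 vs. limit=15.0` means the 256-channel feature is somewhat anisotropic but still within its allowed budget.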
limit=15.0 2023-11-28 20:26:31,713 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3656986.6666666665, ans=0.125 2023-11-28 20:26:32,852 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 548550 2023-11-28 20:26:44,426 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3657053.3333333335, ans=0.125 2023-11-28 20:26:45,581 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3657053.3333333335, ans=0.125 2023-11-28 20:26:49,037 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3657053.3333333335, ans=0.2 2023-11-28 20:26:52,596 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3657053.3333333335, ans=0.125 2023-11-28 20:27:02,058 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3657120.0, ans=0.125 2023-11-28 20:27:08,991 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3657186.6666666665, ans=0.0 2023-11-28 20:27:09,840 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 7500, loss[loss=0.0737, simple_loss=0.1116, pruned_loss=0.01213, audio_tagging_loss=0.005751, over 15057.00 frames. ], tot_loss[loss=0.06458, simple_loss=0.0885, pruned_loss=0.01174, audio_tagging_loss=0.008592, over 3047096.26 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:27:17,664 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3657186.6666666665, ans=0.0 2023-11-28 20:27:34,129 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.394e+01 8.864e+01 9.552e+01 1.022e+02 1.615e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-28 20:27:34,307 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 548600 2023-11-28 20:27:39,645 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3657320.0, ans=0.2 2023-11-28 20:27:44,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3657320.0, ans=0.0 2023-11-28 20:27:50,532 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3657386.6666666665, ans=0.05 2023-11-28 20:28:12,526 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 7550, loss[loss=0.07767, simple_loss=0.1009, pruned_loss=0.01818, audio_tagging_loss=0.009053, over 14945.00 frames. ], tot_loss[loss=0.06467, simple_loss=0.0888, pruned_loss=0.01171, audio_tagging_loss=0.008561, over 3048704.15 frames. 
], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:28:26,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3657586.6666666665, ans=0.125 2023-11-28 20:28:37,504 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 548650 2023-11-28 20:28:44,633 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3657653.3333333335, ans=0.07 2023-11-28 20:29:00,840 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3657786.6666666665, ans=0.1 2023-11-28 20:29:13,286 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 7600, loss[loss=0.07319, simple_loss=0.1029, pruned_loss=0.01582, audio_tagging_loss=0.005915, over 15988.00 frames. ], tot_loss[loss=0.0645, simple_loss=0.08862, pruned_loss=0.01168, audio_tagging_loss=0.008505, over 3051807.73 frames. ], batch size: 59, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 20:29:32,929 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.43 vs. limit=6.0 2023-11-28 20:29:35,153 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3657920.0, ans=0.1 2023-11-28 20:29:38,932 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.869e+01 9.036e+01 9.725e+01 1.055e+02 1.335e+02, threshold=1.945e+02, percent-clipped=0.0 2023-11-28 20:29:39,123 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 548700 2023-11-28 20:29:42,871 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3657986.6666666665, ans=0.09899494936611666 2023-11-28 20:29:52,314 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.93 vs. limit=15.0 2023-11-28 20:30:08,559 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3658120.0, ans=0.0 2023-11-28 20:30:15,861 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 7650, loss[loss=0.06421, simple_loss=0.09109, pruned_loss=0.01131, audio_tagging_loss=0.007353, over 15860.00 frames. ], tot_loss[loss=0.06505, simple_loss=0.08932, pruned_loss=0.01186, audio_tagging_loss=0.008527, over 3057014.27 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:30:35,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3658253.3333333335, ans=0.125 2023-11-28 20:30:40,711 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 548750 2023-11-28 20:31:17,764 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 7700, loss[loss=0.08742, simple_loss=0.1285, pruned_loss=0.0147, audio_tagging_loss=0.008486, over 15775.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.09005, pruned_loss=0.01185, audio_tagging_loss=0.00846, over 3061453.38 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:31:38,788 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.22 vs. 
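The `grad_scale:` field at the end of each batch record (here moving between 16.0 and 32.0) is the dynamic loss scale of fp16 mixed-precision training: it doubles when gradients stay finite for a while and halves on overflow. A minimal sketch assuming a standard `torch.cuda.amp.GradScaler`, whose state dict is what the earlier "Loading grad scaler state dict" step restores:

```python
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=16.0)

def train_step(model, optimizer, batch, loss_fn):
    # loss_fn is a hypothetical helper returning the scalar training loss.
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = loss_fn(model, batch)
    scaler.scale(loss).backward()   # scale up to avoid fp16 underflow
    scaler.step(optimizer)          # unscales; skips the step on inf/NaN
    scaler.update()                 # grows or shrinks the scale over time
    return loss.detach(), scaler.get_scale()   # -> the logged grad_scale
```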
limit=15.0 2023-11-28 20:31:42,445 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 548800 2023-11-28 20:31:43,479 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.090e+01 9.039e+01 9.612e+01 1.044e+02 1.351e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-28 20:32:11,922 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3658786.6666666665, ans=0.2 2023-11-28 20:32:19,646 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 7750, loss[loss=0.05983, simple_loss=0.07915, pruned_loss=0.01119, audio_tagging_loss=0.009065, over 15015.00 frames. ], tot_loss[loss=0.06506, simple_loss=0.08944, pruned_loss=0.01179, audio_tagging_loss=0.008547, over 3067762.24 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:32:44,848 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 548850 2023-11-28 20:32:52,827 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.34 vs. limit=22.5 2023-11-28 20:32:53,587 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3658986.6666666665, ans=0.125 2023-11-28 20:32:53,643 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3658986.6666666665, ans=0.1 2023-11-28 20:32:56,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3659053.3333333335, ans=0.125 2023-11-28 20:33:14,834 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3659120.0, ans=0.09899494936611666 2023-11-28 20:33:22,014 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 7800, loss[loss=0.07491, simple_loss=0.1084, pruned_loss=0.01325, audio_tagging_loss=0.007463, over 15062.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08902, pruned_loss=0.01178, audio_tagging_loss=0.008618, over 3058142.69 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:33:27,216 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3659186.6666666665, ans=0.07 2023-11-28 20:33:34,114 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3659253.3333333335, ans=0.125 2023-11-28 20:33:41,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3659253.3333333335, ans=0.125 2023-11-28 20:33:47,407 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 548900 2023-11-28 20:33:48,455 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.911e+01 8.914e+01 9.757e+01 1.064e+02 1.306e+02, threshold=1.951e+02, percent-clipped=0.0 2023-11-28 20:33:52,285 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3659320.0, ans=0.05 2023-11-28 20:34:03,556 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3659386.6666666665, ans=0.125 2023-11-28 20:34:24,304 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 7850, loss[loss=0.06181, simple_loss=0.08466, pruned_loss=0.01213, audio_tagging_loss=0.007354, over 14673.00 frames. 
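The `Freeze_encoder: False; Current batch idx: ...` heartbeat printed every 50 batches suggests the recipe supports keeping the encoder frozen for an initial number of steps (disabled in this run). A hypothetical helper consistent with those log lines:

```python
def maybe_freeze_encoder(model, batch_idx: int,
                         freeze_encoder_steps: int = -1) -> bool:
    """Freeze encoder parameters until batch_idx reaches
    freeze_encoder_steps; -1 disables freezing entirely (sketch)."""
    freeze = 0 <= batch_idx < freeze_encoder_steps
    for p in model.encoder.parameters():
        p.requires_grad = not freeze
    return freeze   # what the "Freeze_encoder:" line would report
```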
], tot_loss[loss=0.06556, simple_loss=0.08965, pruned_loss=0.01207, audio_tagging_loss=0.008663, over 3057245.54 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:34:24,522 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 20:34:28,044 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3659520.0, ans=0.2 2023-11-28 20:34:32,200 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3659520.0, ans=0.125 2023-11-28 20:34:39,205 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3659586.6666666665, ans=0.0 2023-11-28 20:34:49,088 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 548950 2023-11-28 20:35:06,208 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3659720.0, ans=0.0 2023-11-28 20:35:15,039 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.59 vs. limit=15.0 2023-11-28 20:35:19,719 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3659786.6666666665, ans=0.2 2023-11-28 20:35:24,400 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3659853.3333333335, ans=0.0 2023-11-28 20:35:25,333 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 7900, loss[loss=0.07081, simple_loss=0.09695, pruned_loss=0.01273, audio_tagging_loss=0.009606, over 15753.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.08897, pruned_loss=0.01205, audio_tagging_loss=0.008743, over 3056634.60 frames. ], batch size: 59, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:35:42,807 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3659920.0, ans=0.125 2023-11-28 20:35:49,687 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 549000 2023-11-28 20:35:50,709 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.838e+01 9.083e+01 9.481e+01 1.023e+02 1.467e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-28 20:36:20,684 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3660120.0, ans=0.125 2023-11-28 20:36:26,824 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 7950, loss[loss=0.04678, simple_loss=0.06176, pruned_loss=0.007316, audio_tagging_loss=0.008586, over 13927.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.0884, pruned_loss=0.01199, audio_tagging_loss=0.008893, over 3053808.47 frames. 
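Within this stretch the logged learning rate drifts from 1.47e-03 to 1.46e-03, i.e. it decays smoothly per batch rather than stepwise per epoch. A sketch of an Eden-style schedule (as used in icefall recipes), with the batch and epoch constants taken from this run's configuration:

```python
def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    """Eden-style schedule sketch: the lr decays as a -0.25 power in both
    the batch count and the (fractional) epoch count."""
    b = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    e = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * b * e
```

As a sanity check, `eden_lr(0.045, 548000, 46)` gives roughly 1.46e-03, matching the values logged around batch index 548k in epoch 46.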
], batch size: 53, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:36:27,008 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3660186.6666666665, ans=0.1 2023-11-28 20:36:29,557 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3660186.6666666665, ans=0.125 2023-11-28 20:36:39,740 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3660253.3333333335, ans=0.2 2023-11-28 20:36:44,727 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 20:36:47,527 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.92 vs. limit=22.5 2023-11-28 20:36:52,518 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 549050 2023-11-28 20:36:55,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3660320.0, ans=0.125 2023-11-28 20:36:59,909 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3660320.0, ans=0.2 2023-11-28 20:37:02,246 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3660320.0, ans=0.1 2023-11-28 20:37:03,483 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3660386.6666666665, ans=0.125 2023-11-28 20:37:12,686 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.95 vs. limit=15.0 2023-11-28 20:37:29,001 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 8000, loss[loss=0.05561, simple_loss=0.06489, pruned_loss=0.01146, audio_tagging_loss=0.01171, over 15375.00 frames. ], tot_loss[loss=0.06466, simple_loss=0.08774, pruned_loss=0.01185, audio_tagging_loss=0.008943, over 3047118.71 frames. ], batch size: 59, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 20:37:53,811 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 549100 2023-11-28 20:37:54,809 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.078e+01 8.902e+01 9.396e+01 1.018e+02 1.315e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-28 20:38:05,059 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3660720.0, ans=0.0 2023-11-28 20:38:15,317 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.45 vs. 
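The recurring WARNING lines (here for cut `unbalanced/uQjH4tNUZ_g_0.000_1.000.wav`) all follow the same pattern: a 1-second AudioSet clip with a dummy placeholder transcript yields 100 feature frames, only 23 encoder frames after subsampling, but 24 BPE tokens, and the transducer loss requires at least as many encoder frames as output tokens. A sketch of that filter, assuming one plausible subsampling arithmetic that reproduces the logged 100 → 23:

```python
def frames_after_subsampling(num_frames: int) -> int:
    # Hypothetical conv front-end arithmetic matching the logged numbers:
    # a (T - 7) // 2 stage followed by a further halving.
    return ((num_frames - 7) // 2) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # RNN-T alignment needs T >= U (frames >= tokens).
    return frames_after_subsampling(num_frames) >= num_tokens

assert frames_after_subsampling(100) == 23
assert not keep_cut(100, 24)   # the excluded 1-second AudioSet cuts
```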
limit=15.0 2023-11-28 20:38:16,208 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3660720.0, ans=0.025 2023-11-28 20:38:20,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3660786.6666666665, ans=0.125 2023-11-28 20:38:31,182 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 8050, loss[loss=0.07939, simple_loss=0.1064, pruned_loss=0.01893, audio_tagging_loss=0.007268, over 15977.00 frames. ], tot_loss[loss=0.06482, simple_loss=0.08792, pruned_loss=0.01188, audio_tagging_loss=0.008973, over 3047451.69 frames. ], batch size: 61, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 20:38:34,014 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3660853.3333333335, ans=0.125 2023-11-28 20:38:35,234 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3660853.3333333335, ans=0.125 2023-11-28 20:38:47,290 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3660920.0, ans=0.125 2023-11-28 20:38:47,297 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3660920.0, ans=0.125 2023-11-28 20:38:50,628 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3660920.0, ans=0.0 2023-11-28 20:38:55,152 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 549150 2023-11-28 20:39:14,963 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.44 vs. limit=10.0 2023-11-28 20:39:27,295 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3661120.0, ans=0.2 2023-11-28 20:39:32,463 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 8100, loss[loss=0.0691, simple_loss=0.0987, pruned_loss=0.01164, audio_tagging_loss=0.008103, over 14068.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08866, pruned_loss=0.01205, audio_tagging_loss=0.008886, over 3044623.51 frames. ], batch size: 54, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:39:36,352 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3661186.6666666665, ans=0.0 2023-11-28 20:39:36,592 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.54 vs. 
limit=15.0 2023-11-28 20:39:56,933 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 549200 2023-11-28 20:40:00,136 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.328e+01 9.129e+01 9.751e+01 1.053e+02 1.304e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-28 20:40:00,510 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3661320.0, ans=0.0 2023-11-28 20:40:06,327 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3661320.0, ans=0.0 2023-11-28 20:40:14,404 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3661386.6666666665, ans=0.125 2023-11-28 20:40:20,711 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3661453.3333333335, ans=0.0 2023-11-28 20:40:34,209 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 8150, loss[loss=0.07569, simple_loss=0.1106, pruned_loss=0.01439, audio_tagging_loss=0.005976, over 15788.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08925, pruned_loss=0.01205, audio_tagging_loss=0.00869, over 3048055.87 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:40:57,834 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3661653.3333333335, ans=0.1 2023-11-28 20:40:58,848 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 549250 2023-11-28 20:41:00,047 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3661653.3333333335, ans=0.125 2023-11-28 20:41:05,313 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3661653.3333333335, ans=0.125 2023-11-28 20:41:06,539 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3661653.3333333335, ans=0.125 2023-11-28 20:41:25,899 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3661786.6666666665, ans=0.0 2023-11-28 20:41:35,440 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 8200, loss[loss=0.05951, simple_loss=0.07944, pruned_loss=0.008608, audio_tagging_loss=0.01119, over 15233.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08948, pruned_loss=0.01205, audio_tagging_loss=0.008581, over 3049230.25 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:41:36,732 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 20:41:45,242 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.88 vs. 
limit=15.0 2023-11-28 20:41:54,164 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3661920.0, ans=0.0 2023-11-28 20:41:59,785 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 549300 2023-11-28 20:42:02,051 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.159e+01 8.719e+01 9.487e+01 1.033e+02 1.453e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-28 20:42:33,998 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 20:42:34,067 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3662120.0, ans=0.125 2023-11-28 20:42:36,675 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 8250, loss[loss=0.06088, simple_loss=0.0846, pruned_loss=0.01003, audio_tagging_loss=0.00855, over 16028.00 frames. ], tot_loss[loss=0.06509, simple_loss=0.08899, pruned_loss=0.01206, audio_tagging_loss=0.008541, over 3048575.59 frames. ], batch size: 59, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:42:46,346 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3662186.6666666665, ans=0.0 2023-11-28 20:42:47,310 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3662253.3333333335, ans=0.125 2023-11-28 20:43:00,574 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 549350 2023-11-28 20:43:03,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3662320.0, ans=0.125 2023-11-28 20:43:21,995 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3662386.6666666665, ans=0.1 2023-11-28 20:43:28,514 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3662453.3333333335, ans=0.0 2023-11-28 20:43:31,928 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 20:43:37,405 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 8300, loss[loss=0.07376, simple_loss=0.1019, pruned_loss=0.01399, audio_tagging_loss=0.008796, over 15181.00 frames. ], tot_loss[loss=0.06509, simple_loss=0.08884, pruned_loss=0.012, audio_tagging_loss=0.00867, over 3049296.52 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:43:48,713 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3662586.6666666665, ans=0.125 2023-11-28 20:43:48,899 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.02 vs. limit=15.0 2023-11-28 20:44:02,497 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 549400 2023-11-28 20:44:05,010 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.090e+01 9.200e+01 9.674e+01 1.037e+02 1.231e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-28 20:44:39,618 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 8350, loss[loss=0.06048, simple_loss=0.08732, pruned_loss=0.00852, audio_tagging_loss=0.008302, over 15710.00 frames. ], tot_loss[loss=0.06505, simple_loss=0.08893, pruned_loss=0.01199, audio_tagging_loss=0.008597, over 3049187.77 frames. 
], batch size: 60, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:45:04,376 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 549450 2023-11-28 20:45:40,918 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 8400, loss[loss=0.08686, simple_loss=0.1288, pruned_loss=0.01633, audio_tagging_loss=0.006103, over 16059.00 frames. ], tot_loss[loss=0.06475, simple_loss=0.08858, pruned_loss=0.01189, audio_tagging_loss=0.008564, over 3050818.78 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 20:45:52,655 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.21 vs. limit=10.0 2023-11-28 20:45:54,655 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3663253.3333333335, ans=0.125 2023-11-28 20:46:05,548 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 549500 2023-11-28 20:46:07,781 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.367e+01 8.725e+01 9.379e+01 9.933e+01 1.253e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-28 20:46:08,214 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3663320.0, ans=0.2 2023-11-28 20:46:41,985 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.75 vs. limit=15.0 2023-11-28 20:46:42,572 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 8450, loss[loss=0.07159, simple_loss=0.09868, pruned_loss=0.0113, audio_tagging_loss=0.01095, over 15177.00 frames. ], tot_loss[loss=0.06475, simple_loss=0.08864, pruned_loss=0.01181, audio_tagging_loss=0.008612, over 3044735.30 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 20:46:49,093 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3663520.0, ans=0.125 2023-11-28 20:47:04,971 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3663586.6666666665, ans=0.125 2023-11-28 20:47:07,149 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 549550 2023-11-28 20:47:20,527 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3663720.0, ans=0.125 2023-11-28 20:47:21,595 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 20:47:38,059 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.24 vs. limit=15.0 2023-11-28 20:47:41,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3663786.6666666665, ans=0.125 2023-11-28 20:47:44,300 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 8500, loss[loss=0.04561, simple_loss=0.05062, pruned_loss=0.008912, audio_tagging_loss=0.01139, over 15805.00 frames. ], tot_loss[loss=0.06464, simple_loss=0.08828, pruned_loss=0.0119, audio_tagging_loss=0.008605, over 3041682.38 frames. 
], batch size: 60, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 20:48:09,434 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 549600 2023-11-28 20:48:11,566 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.09 vs. limit=22.5 2023-11-28 20:48:11,913 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.113e+01 8.752e+01 9.470e+01 1.030e+02 1.283e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-28 20:48:15,403 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3663986.6666666665, ans=0.125 2023-11-28 20:48:31,180 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3664053.3333333335, ans=0.125 2023-11-28 20:48:38,846 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3664120.0, ans=0.125 2023-11-28 20:48:40,333 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.63 vs. limit=10.0 2023-11-28 20:48:46,234 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 8550, loss[loss=0.06113, simple_loss=0.08423, pruned_loss=0.01003, audio_tagging_loss=0.008976, over 14532.00 frames. ], tot_loss[loss=0.06463, simple_loss=0.08815, pruned_loss=0.01194, audio_tagging_loss=0.008621, over 3041019.69 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 20:48:46,437 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3664186.6666666665, ans=0.125 2023-11-28 20:48:46,724 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.12 vs. limit=10.0 2023-11-28 20:48:50,005 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3664186.6666666665, ans=0.1 2023-11-28 20:49:10,985 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 549650 2023-11-28 20:49:11,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3664320.0, ans=0.125 2023-11-28 20:49:19,321 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3664320.0, ans=0.125 2023-11-28 20:49:21,305 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.25 vs. limit=12.0 2023-11-28 20:49:26,926 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3664386.6666666665, ans=0.0 2023-11-28 20:49:41,155 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3664453.3333333335, ans=0.95 2023-11-28 20:49:45,936 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3664453.3333333335, ans=0.125 2023-11-28 20:49:47,957 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 8600, loss[loss=0.08735, simple_loss=0.1079, pruned_loss=0.0239, audio_tagging_loss=0.009507, over 13662.00 frames. ], tot_loss[loss=0.0649, simple_loss=0.08848, pruned_loss=0.01202, audio_tagging_loss=0.008642, over 3045549.65 frames. 
], batch size: 54, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 20:49:57,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3664520.0, ans=0.125 2023-11-28 20:50:00,633 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3664586.6666666665, ans=0.1 2023-11-28 20:50:01,031 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.34 vs. limit=10.0 2023-11-28 20:50:04,050 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3664586.6666666665, ans=0.0 2023-11-28 20:50:04,795 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.40 vs. limit=15.0 2023-11-28 20:50:12,683 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 549700 2023-11-28 20:50:16,010 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.762e+01 8.961e+01 9.460e+01 1.024e+02 1.421e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-28 20:50:18,664 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3664653.3333333335, ans=0.125 2023-11-28 20:50:32,714 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3664720.0, ans=0.125 2023-11-28 20:50:34,144 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3664720.0, ans=0.125 2023-11-28 20:50:36,700 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3664786.6666666665, ans=0.125 2023-11-28 20:50:43,107 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3664786.6666666665, ans=0.1 2023-11-28 20:50:49,976 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 8650, loss[loss=0.04179, simple_loss=0.05598, pruned_loss=0.005417, audio_tagging_loss=0.008379, over 15907.00 frames. ], tot_loss[loss=0.06476, simple_loss=0.08832, pruned_loss=0.01192, audio_tagging_loss=0.00868, over 3048277.00 frames. ], batch size: 63, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:50:56,136 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.10 vs. 
limit=15.0 2023-11-28 20:50:59,069 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3664853.3333333335, ans=0.0 2023-11-28 20:50:59,128 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3664853.3333333335, ans=0.04949747468305833 2023-11-28 20:51:13,404 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3664920.0, ans=0.0 2023-11-28 20:51:15,609 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 549750 2023-11-28 20:51:22,751 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3664986.6666666665, ans=0.125 2023-11-28 20:51:25,150 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3664986.6666666665, ans=0.0 2023-11-28 20:51:25,342 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.50 vs. limit=10.0 2023-11-28 20:51:28,472 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3665053.3333333335, ans=0.2 2023-11-28 20:51:34,881 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 20:51:44,837 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3665120.0, ans=0.1 2023-11-28 20:51:45,926 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3665120.0, ans=0.05 2023-11-28 20:51:51,528 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 8700, loss[loss=0.06774, simple_loss=0.09041, pruned_loss=0.01487, audio_tagging_loss=0.007668, over 15573.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08938, pruned_loss=0.0121, audio_tagging_loss=0.008709, over 3050946.36 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:51:56,533 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3665186.6666666665, ans=0.0 2023-11-28 20:52:06,090 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3665253.3333333335, ans=0.125 2023-11-28 20:52:16,836 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 549800 2023-11-28 20:52:19,627 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3665320.0, ans=0.125 2023-11-28 20:52:20,364 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.11 vs. limit=10.0 2023-11-28 20:52:20,547 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.462e+01 8.859e+01 9.712e+01 1.046e+02 1.344e+02, threshold=1.942e+02, percent-clipped=0.0 2023-11-28 20:52:40,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3665453.3333333335, ans=0.0 2023-11-28 20:52:43,014 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.37 vs. 
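The `WithLoss: name=..., loss-sum=0.000e+00` lines report an auxiliary loss attached to attention-weight tensors; a sum of zero means the regularizer is currently inactive. One generic way to attach such a loss without altering the forward activations is an autograd function that passes its input through unchanged while injecting the auxiliary gradient in backward; this is a speculative reconstruction, not the actual scaling.py code:

```python
import torch

class WithAuxLoss(torch.autograd.Function):
    """Hypothetical WithLoss-style helper: forward returns x unchanged;
    backward additionally back-propagates aux_loss as if it had been
    added to the final training loss."""
    @staticmethod
    def forward(ctx, x, aux_loss):
        ctx.save_for_backward(aux_loss)
        return x

    @staticmethod
    def backward(ctx, grad_output):
        (aux_loss,) = ctx.saved_tensors
        # Gradient of (total + aux_loss) w.r.t. aux_loss is 1.
        return grad_output, torch.ones_like(aux_loss)

# Usage sketch: attn = WithAuxLoss.apply(attn, penalty(attn))
```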
limit=5.0 2023-11-28 20:52:44,745 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3665453.3333333335, ans=0.125 2023-11-28 20:52:53,865 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 8750, loss[loss=0.05978, simple_loss=0.08787, pruned_loss=0.007938, audio_tagging_loss=0.007905, over 14644.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08927, pruned_loss=0.01211, audio_tagging_loss=0.008756, over 3055696.87 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:53:18,487 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 549850 2023-11-28 20:53:21,388 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.18 vs. limit=15.0 2023-11-28 20:53:41,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3665720.0, ans=0.125 2023-11-28 20:53:43,507 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3665786.6666666665, ans=0.0 2023-11-28 20:53:55,661 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 8800, loss[loss=0.06703, simple_loss=0.08903, pruned_loss=0.01434, audio_tagging_loss=0.008167, over 15590.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.09021, pruned_loss=0.01224, audio_tagging_loss=0.008808, over 3052660.59 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 20:54:19,695 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 549900 2023-11-28 20:54:23,630 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.083e+01 9.097e+01 9.649e+01 1.028e+02 1.198e+02, threshold=1.930e+02, percent-clipped=0.0 2023-11-28 20:54:32,595 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3666053.3333333335, ans=0.125 2023-11-28 20:54:39,181 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3666053.3333333335, ans=0.2 2023-11-28 20:54:42,784 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3666053.3333333335, ans=0.125 2023-11-28 20:54:43,802 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3666120.0, ans=0.025 2023-11-28 20:54:44,195 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.34 vs. limit=15.0 2023-11-28 20:54:48,423 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3666120.0, ans=0.04949747468305833 2023-11-28 20:54:56,772 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 8850, loss[loss=0.06612, simple_loss=0.08783, pruned_loss=0.01335, audio_tagging_loss=0.008859, over 15104.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.09033, pruned_loss=0.01217, audio_tagging_loss=0.008904, over 3056359.87 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:55:09,141 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 20:55:11,968 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.00 vs. limit=6.0 2023-11-28 20:55:17,538 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=15.02 vs. limit=15.0 2023-11-28 20:55:21,809 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 549950 2023-11-28 20:55:22,060 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3666320.0, ans=0.125 2023-11-28 20:55:57,721 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3666520.0, ans=0.125 2023-11-28 20:55:58,472 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 8900, loss[loss=0.06495, simple_loss=0.09299, pruned_loss=0.01175, audio_tagging_loss=0.0067, over 14777.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.09042, pruned_loss=0.01224, audio_tagging_loss=0.0088, over 3051165.67 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 8.0 2023-11-28 20:56:11,428 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3666586.6666666665, ans=10.0 2023-11-28 20:56:17,137 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 20:56:19,548 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3666586.6666666665, ans=0.125 2023-11-28 20:56:21,874 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3666653.3333333335, ans=0.125 2023-11-28 20:56:21,975 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3666653.3333333335, ans=0.1 2023-11-28 20:56:23,519 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 550000 2023-11-28 20:56:25,250 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3666653.3333333335, ans=10.0 2023-11-28 20:56:25,347 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3666653.3333333335, ans=0.0 2023-11-28 20:56:27,553 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3666653.3333333335, ans=0.0 2023-11-28 20:56:29,512 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.642e+01 8.959e+01 9.608e+01 1.039e+02 1.784e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-28 20:56:57,946 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3666786.6666666665, ans=0.2 2023-11-28 20:57:00,007 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 8950, loss[loss=0.06934, simple_loss=0.09545, pruned_loss=0.0158, audio_tagging_loss=0.005827, over 15513.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.08954, pruned_loss=0.01214, audio_tagging_loss=0.008661, over 3053745.24 frames. 
], batch size: 56, lr: 1.46e-03, grad_scale: 8.0 2023-11-28 20:57:00,206 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3666853.3333333335, ans=0.0 2023-11-28 20:57:00,358 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3666853.3333333335, ans=0.2 2023-11-28 20:57:02,472 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3666853.3333333335, ans=0.035 2023-11-28 20:57:17,649 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 20:57:17,872 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3666920.0, ans=10.0 2023-11-28 20:57:24,670 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 550050 2023-11-28 20:57:28,117 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.04 vs. limit=10.0 2023-11-28 20:57:39,699 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 20:58:02,925 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 9000, loss[loss=0.05528, simple_loss=0.07456, pruned_loss=0.009771, audio_tagging_loss=0.008223, over 14789.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.08946, pruned_loss=0.01219, audio_tagging_loss=0.008678, over 3051361.20 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 8.0 2023-11-28 20:58:02,928 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-28 20:58:24,473 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.3599, 4.3832, 4.1476, 4.2882], device='cuda:0') 2023-11-28 20:58:42,570 INFO [train_asr.py:1267] (0/4) Epoch 46, validation: loss=0.05897, simple_loss=0.05047, pruned_loss=0.005253, audio_tagging_loss=0.02848, over 4681554.00 frames. 2023-11-28 20:58:42,571 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-28 20:59:02,479 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3667253.3333333335, ans=0.125 2023-11-28 20:59:02,748 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3667253.3333333335, ans=0.125 2023-11-28 20:59:07,650 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 550100 2023-11-28 20:59:13,415 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.455e+01 8.853e+01 9.549e+01 1.044e+02 1.258e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-28 20:59:32,885 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3667453.3333333335, ans=0.0 2023-11-28 20:59:36,976 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3667453.3333333335, ans=0.125 2023-11-28 20:59:42,822 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3667453.3333333335, ans=0.125 2023-11-28 20:59:44,793 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 9050, loss[loss=0.04633, simple_loss=0.06155, pruned_loss=0.007102, audio_tagging_loss=0.008456, over 18054.00 frames. 
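The loss fields in these summaries appear to combine as a weighted sum; this is inferred purely from the logged numbers, not from the recipe code. For the validation record above, 0.5 * 0.05047 + 0.005253 + 0.02848 = 0.05897, exactly the logged loss, and the per-batch train records check out the same way.

# Assumed combination, inferred from the records above:
#   loss ~= 0.5 * simple_loss + pruned_loss + audio_tagging_loss
simple, pruned, tagging = 0.05047, 0.005253, 0.02848  # validation record
print(0.5 * simple + pruned + tagging)  # 0.058968 -> logged as 0.05897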
], tot_loss[loss=0.06571, simple_loss=0.08982, pruned_loss=0.01227, audio_tagging_loss=0.008527, over 3059769.72 frames. ], batch size: 70, lr: 1.46e-03, grad_scale: 8.0 2023-11-28 20:59:49,915 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3667520.0, ans=0.1 2023-11-28 21:00:05,660 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3667586.6666666665, ans=0.125 2023-11-28 21:00:08,890 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 550150 2023-11-28 21:00:20,802 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3667720.0, ans=0.0 2023-11-28 21:00:32,109 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3667720.0, ans=0.2 2023-11-28 21:00:46,671 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 9100, loss[loss=0.07255, simple_loss=0.09296, pruned_loss=0.01385, audio_tagging_loss=0.01222, over 15326.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08969, pruned_loss=0.01224, audio_tagging_loss=0.008534, over 3058302.73 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 8.0 2023-11-28 21:00:54,027 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3667853.3333333335, ans=0.125 2023-11-28 21:00:56,657 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.91 vs. limit=15.0 2023-11-28 21:01:07,830 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3667920.0, ans=0.0 2023-11-28 21:01:12,369 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 550200 2023-11-28 21:01:16,582 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.51 vs. limit=15.0 2023-11-28 21:01:18,388 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.247e+01 8.959e+01 9.673e+01 1.042e+02 1.442e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-28 21:01:30,854 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3668053.3333333335, ans=0.125 2023-11-28 21:01:33,430 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.52 vs. limit=22.5 2023-11-28 21:01:35,394 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3668120.0, ans=0.125 2023-11-28 21:01:41,277 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3668120.0, ans=0.2 2023-11-28 21:01:48,508 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 9150, loss[loss=0.06932, simple_loss=0.1001, pruned_loss=0.01245, audio_tagging_loss=0.006831, over 14479.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08954, pruned_loss=0.01199, audio_tagging_loss=0.008433, over 3051714.45 frames. 
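The optimizer's clipping records are consistent with a threshold of clipping_scale times the median of recently observed gradient norms: the five quartile values read min / 25% / median / 75% / max, and 2.0 * 9.673e+01 = 1.935e+02 matches the logged threshold. A sketch under that assumption (not the exact optim.py code):

import torch

def clipping_stats(recent_grad_norms: torch.Tensor, clipping_scale: float = 2.0):
    # Quartiles as logged: min / 25% / median / 75% / max.
    q = torch.quantile(recent_grad_norms,
                       torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]          # e.g. 2.0 * 96.73 ~= 193.5
    percent_clipped = 100.0 * (recent_grad_norms > threshold).float().mean()
    return q, threshold, percent_clipped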
], batch size: 53, lr: 1.46e-03, grad_scale: 8.0 2023-11-28 21:02:13,329 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 550250 2023-11-28 21:02:22,828 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 21:02:34,346 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3668386.6666666665, ans=0.125 2023-11-28 21:02:46,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3668453.3333333335, ans=0.0 2023-11-28 21:02:50,547 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 9200, loss[loss=0.05384, simple_loss=0.07906, pruned_loss=0.006539, audio_tagging_loss=0.007772, over 15032.00 frames. ], tot_loss[loss=0.06531, simple_loss=0.08979, pruned_loss=0.012, audio_tagging_loss=0.008423, over 3049097.90 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:03:04,749 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.79 vs. limit=15.0 2023-11-28 21:03:14,984 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 550300 2023-11-28 21:03:21,341 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.562e+01 8.689e+01 9.468e+01 1.009e+02 1.302e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-28 21:03:22,151 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=3668653.3333333335, ans=22.5 2023-11-28 21:03:22,835 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3668653.3333333335, ans=0.0 2023-11-28 21:03:35,248 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.68 vs. limit=6.0 2023-11-28 21:03:51,872 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3668853.3333333335, ans=0.125 2023-11-28 21:03:52,649 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 9250, loss[loss=0.05835, simple_loss=0.0808, pruned_loss=0.009336, audio_tagging_loss=0.008617, over 15369.00 frames. ], tot_loss[loss=0.06505, simple_loss=0.08917, pruned_loss=0.01199, audio_tagging_loss=0.008475, over 3051365.40 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:04:02,286 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3668853.3333333335, ans=0.1 2023-11-28 21:04:17,038 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 550350 2023-11-28 21:04:54,339 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 9300, loss[loss=0.07153, simple_loss=0.1043, pruned_loss=0.01004, audio_tagging_loss=0.009336, over 14053.00 frames. ], tot_loss[loss=0.06511, simple_loss=0.08902, pruned_loss=0.01193, audio_tagging_loss=0.008666, over 3047605.97 frames. ], batch size: 52, lr: 1.46e-03, grad_scale: 8.0 2023-11-28 21:05:00,670 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3669186.6666666665, ans=0.125 2023-11-28 21:05:17,028 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.33 vs. 
limit=6.0 2023-11-28 21:05:18,883 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 550400 2023-11-28 21:05:26,622 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.639e+01 8.959e+01 9.529e+01 1.028e+02 1.391e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-28 21:05:27,175 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.55 vs. limit=15.0 2023-11-28 21:05:35,108 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3669386.6666666665, ans=0.04949747468305833 2023-11-28 21:05:49,335 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.45 vs. limit=6.0 2023-11-28 21:05:51,691 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.92 vs. limit=15.0 2023-11-28 21:05:56,377 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 9350, loss[loss=0.07623, simple_loss=0.1112, pruned_loss=0.0121, audio_tagging_loss=0.008546, over 16679.00 frames. ], tot_loss[loss=0.06482, simple_loss=0.08845, pruned_loss=0.0119, audio_tagging_loss=0.008695, over 3045701.34 frames. ], batch size: 66, lr: 1.46e-03, grad_scale: 8.0 2023-11-28 21:06:01,959 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3669520.0, ans=10.0 2023-11-28 21:06:20,880 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 550450 2023-11-28 21:06:30,805 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3669653.3333333335, ans=0.1 2023-11-28 21:06:32,061 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3669720.0, ans=0.0 2023-11-28 21:06:34,358 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3669720.0, ans=0.125 2023-11-28 21:06:39,702 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3669720.0, ans=0.035 2023-11-28 21:06:57,420 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3669853.3333333335, ans=0.0 2023-11-28 21:06:58,215 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 9400, loss[loss=0.07043, simple_loss=0.09864, pruned_loss=0.01127, audio_tagging_loss=0.009831, over 14722.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08921, pruned_loss=0.01186, audio_tagging_loss=0.00867, over 3051334.76 frames. 
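Each ScheduledFloat record reports the current value (ans) of a hyperparameter, such as a dropout probability or a skip rate, as a function of the global batch count. A minimal piecewise-linear sketch of such a schedule follows; the class is a hypothetical stand-in, not the icefall implementation:

class ScheduledFloatSketch:
    # Value interpolated linearly between (batch_count, value) breakpoints
    # and held constant outside them.
    def __init__(self, *points):
        self.points = sorted(points)  # e.g. ((0.0, 0.3), (20000.0, 0.1))

    def value(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)

At batch_count ~ 3.67e6 every schedule above has long since settled at its final value, which is why the logged ans fields are stable from record to record.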
], batch size: 55, lr: 1.46e-03, grad_scale: 8.0 2023-11-28 21:07:07,926 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3669853.3333333335, ans=0.125 2023-11-28 21:07:08,004 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3669853.3333333335, ans=0.2 2023-11-28 21:07:22,490 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 550500 2023-11-28 21:07:30,104 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.535e+01 9.168e+01 9.669e+01 1.024e+02 1.175e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-28 21:07:45,710 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3670053.3333333335, ans=0.125 2023-11-28 21:07:49,114 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3670120.0, ans=0.0 2023-11-28 21:07:58,083 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 21:07:59,922 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 9450, loss[loss=0.06592, simple_loss=0.09281, pruned_loss=0.0109, audio_tagging_loss=0.008609, over 14171.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08916, pruned_loss=0.01183, audio_tagging_loss=0.008774, over 3049295.55 frames. ], batch size: 53, lr: 1.46e-03, grad_scale: 8.0 2023-11-28 21:08:01,419 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3670186.6666666665, ans=0.125 2023-11-28 21:08:06,616 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.18 vs. limit=15.0 2023-11-28 21:08:09,586 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3670186.6666666665, ans=0.125 2023-11-28 21:08:12,034 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3670253.3333333335, ans=0.0 2023-11-28 21:08:20,789 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3670253.3333333335, ans=0.0 2023-11-28 21:08:24,126 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 550550 2023-11-28 21:08:25,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3670320.0, ans=0.0 2023-11-28 21:08:30,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3670320.0, ans=0.5 2023-11-28 21:08:40,704 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 21:08:49,784 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.35 vs. 
limit=15.0 2023-11-28 21:09:01,349 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 9500, loss[loss=0.06854, simple_loss=0.0916, pruned_loss=0.01364, audio_tagging_loss=0.009101, over 15197.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.09003, pruned_loss=0.01201, audio_tagging_loss=0.008783, over 3051191.48 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 8.0 2023-11-28 21:09:01,775 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3670520.0, ans=0.125 2023-11-28 21:09:13,116 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3670586.6666666665, ans=0.1 2023-11-28 21:09:22,328 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 21:09:25,534 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 550600 2023-11-28 21:09:25,882 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.60 vs. limit=12.0 2023-11-28 21:09:33,451 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.401e+01 9.097e+01 9.748e+01 1.049e+02 1.377e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-28 21:09:35,177 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.49 vs. limit=22.5 2023-11-28 21:09:36,552 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.40 vs. limit=15.0 2023-11-28 21:09:57,754 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-28 21:10:03,516 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 9550, loss[loss=0.04453, simple_loss=0.05499, pruned_loss=0.007844, audio_tagging_loss=0.009194, over 13865.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.09055, pruned_loss=0.01208, audio_tagging_loss=0.008722, over 3050810.04 frames. ], batch size: 54, lr: 1.46e-03, grad_scale: 8.0 2023-11-28 21:10:04,439 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.34 vs. limit=15.0 2023-11-28 21:10:09,626 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3670853.3333333335, ans=0.2 2023-11-28 21:10:27,539 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 550650 2023-11-28 21:10:30,479 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.78 vs. limit=15.0 2023-11-28 21:10:32,860 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.98 vs. limit=10.0 2023-11-28 21:10:48,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3671053.3333333335, ans=0.125 2023-11-28 21:10:49,492 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.30 vs. 
limit=10.0 2023-11-28 21:10:55,030 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.29 vs. limit=15.0 2023-11-28 21:11:03,624 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3671186.6666666665, ans=0.125 2023-11-28 21:11:04,577 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 9600, loss[loss=0.06739, simple_loss=0.08694, pruned_loss=0.01262, audio_tagging_loss=0.0113, over 15958.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.09049, pruned_loss=0.012, audio_tagging_loss=0.008779, over 3051780.80 frames. ], batch size: 61, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:11:10,522 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.91 vs. limit=15.0 2023-11-28 21:11:29,133 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 550700 2023-11-28 21:11:36,703 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.216e+01 8.932e+01 9.584e+01 1.005e+02 1.481e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-28 21:11:46,308 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.03 vs. limit=15.0 2023-11-28 21:12:06,283 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 9650, loss[loss=0.08648, simple_loss=0.1187, pruned_loss=0.0194, audio_tagging_loss=0.007707, over 15719.00 frames. ], tot_loss[loss=0.066, simple_loss=0.09039, pruned_loss=0.01204, audio_tagging_loss=0.008768, over 3045939.44 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:12:22,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3671586.6666666665, ans=0.1 2023-11-28 21:12:31,733 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 550750 2023-11-28 21:12:38,593 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3671653.3333333335, ans=0.125 2023-11-28 21:12:38,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3671653.3333333335, ans=0.1 2023-11-28 21:13:07,732 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 9700, loss[loss=0.05616, simple_loss=0.08411, pruned_loss=0.00549, audio_tagging_loss=0.008617, over 15018.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.09102, pruned_loss=0.01211, audio_tagging_loss=0.008525, over 3046640.67 frames. 
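The Whitening records compare a measured statistic of a module's activations (metric) against a scheduled bound (limit); a reasonable reading is that a corrective penalty applies only when the metric exceeds its limit, which is why the comparison is what gets logged. The metric below is one plausible whiteness measure, an assumption rather than the scaling.py formula: it equals 1.0 for a perfectly isotropic covariance and grows as the eigenvalue spread widens.

import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> torch.Tensor:
    # x: (num_frames, num_channels); channels split into groups as logged.
    T, C = x.shape
    g = x.reshape(T, num_groups, C // num_groups).transpose(0, 1)  # (G, T, C/G)
    cov = g.transpose(1, 2) @ g / T                                # (G, C/G, C/G)
    sum_eig_sq = (cov ** 2).sum(dim=(1, 2))        # sum of squared eigenvalues
    trace = torch.diagonal(cov, dim1=1, dim2=2).sum(dim=1)
    return (sum_eig_sq / (trace ** 2 / (C // num_groups))).mean()

def whitening_penalty(x, num_groups, limit):
    # A record like "metric=10.34 vs. limit=15.0" would contribute nothing
    # here; only metric > limit produces a loss.
    return torch.relu(whitening_metric(x, num_groups) - limit)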
], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:13:12,198 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3671853.3333333335, ans=0.0 2023-11-28 21:13:25,602 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3671920.0, ans=0.125 2023-11-28 21:13:31,941 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3671986.6666666665, ans=0.0 2023-11-28 21:13:32,972 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 550800 2023-11-28 21:13:40,240 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.674e+01 9.004e+01 9.586e+01 1.046e+02 1.916e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-28 21:13:55,524 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3672053.3333333335, ans=0.125 2023-11-28 21:14:10,540 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 9750, loss[loss=0.07274, simple_loss=0.1031, pruned_loss=0.01321, audio_tagging_loss=0.008009, over 15416.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.09013, pruned_loss=0.01197, audio_tagging_loss=0.008496, over 3044338.58 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:14:15,969 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3672186.6666666665, ans=0.07 2023-11-28 21:14:19,443 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3672186.6666666665, ans=0.125 2023-11-28 21:14:19,517 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3672186.6666666665, ans=0.0 2023-11-28 21:14:32,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3672253.3333333335, ans=0.125 2023-11-28 21:14:35,378 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 550850 2023-11-28 21:14:35,532 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3672320.0, ans=0.125 2023-11-28 21:14:45,617 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3672320.0, ans=0.125 2023-11-28 21:14:49,016 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3672386.6666666665, ans=0.125 2023-11-28 21:15:11,890 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 9800, loss[loss=0.06325, simple_loss=0.09189, pruned_loss=0.01153, audio_tagging_loss=0.005774, over 14544.00 frames. ], tot_loss[loss=0.06448, simple_loss=0.08866, pruned_loss=0.01169, audio_tagging_loss=0.008458, over 3047335.69 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:15:25,860 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.60 vs. 
limit=22.5 2023-11-28 21:15:36,404 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 550900 2023-11-28 21:15:41,190 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3672653.3333333335, ans=0.125 2023-11-28 21:15:43,932 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.245e+01 8.839e+01 9.561e+01 1.020e+02 1.364e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-28 21:15:47,970 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.77 vs. limit=15.0 2023-11-28 21:16:07,178 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 21:16:08,672 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff2.min_abs, batch_count=3672786.6666666665, ans=0.1 2023-11-28 21:16:10,921 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 21:16:12,999 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 9850, loss[loss=0.0829, simple_loss=0.1093, pruned_loss=0.01834, audio_tagging_loss=0.009897, over 14893.00 frames. ], tot_loss[loss=0.06452, simple_loss=0.08876, pruned_loss=0.01166, audio_tagging_loss=0.008479, over 3046644.29 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:16:17,423 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3672853.3333333335, ans=0.125 2023-11-28 21:16:27,907 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3672920.0, ans=0.125 2023-11-28 21:16:38,336 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 550950 2023-11-28 21:16:47,872 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3672986.6666666665, ans=0.125 2023-11-28 21:16:52,389 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=3673053.3333333335, ans=10.0 2023-11-28 21:17:02,439 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 21:17:08,769 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3673120.0, ans=0.0 2023-11-28 21:17:11,014 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3673120.0, ans=0.1 2023-11-28 21:17:14,280 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 9900, loss[loss=0.05028, simple_loss=0.06576, pruned_loss=0.00931, audio_tagging_loss=0.008084, over 14899.00 frames. ], tot_loss[loss=0.06484, simple_loss=0.08914, pruned_loss=0.01177, audio_tagging_loss=0.008505, over 3043702.18 frames. 
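grad_scale in these summaries moves in powers of two (it drops from 16 to 8 at the next summary and climbs back through 16 to 32 later in the section), which is the signature of dynamic loss scaling for fp16 training: the scale is halved when a step overflows and doubled again after a run of clean steps. torch.cuda.amp.GradScaler is the standard tool with this behavior; whether the recipe wraps it exactly this way is an assumption.

import torch

scaler = torch.cuda.amp.GradScaler()

# Sketch of one fp16 training step:
# with torch.cuda.amp.autocast():
#     loss = compute_loss(model, batch)   # hypothetical helper
# scaler.scale(loss).backward()
# scaler.step(optimizer)     # internally skipped if gradients overflowed
# scaler.update()            # halves the scale on overflow, doubles it
#                            # after `growth_interval` clean steps
# grad_scale = scaler.get_scale()   # the value reported in the summaries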
], batch size: 58, lr: 1.46e-03, grad_scale: 8.0 2023-11-28 21:17:14,621 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3673186.6666666665, ans=0.0 2023-11-28 21:17:26,997 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.40 vs. limit=15.0 2023-11-28 21:17:29,952 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3673253.3333333335, ans=10.0 2023-11-28 21:17:40,003 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 551000 2023-11-28 21:17:42,902 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 21:17:48,300 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.533e+01 8.969e+01 9.517e+01 1.006e+02 1.259e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-28 21:17:48,607 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3673320.0, ans=0.125 2023-11-28 21:17:59,582 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3673386.6666666665, ans=0.125 2023-11-28 21:18:16,944 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 9950, loss[loss=0.08822, simple_loss=0.1237, pruned_loss=0.02043, audio_tagging_loss=0.005937, over 17107.00 frames. ], tot_loss[loss=0.0648, simple_loss=0.0888, pruned_loss=0.01193, audio_tagging_loss=0.008481, over 3047265.92 frames. ], batch size: 63, lr: 1.46e-03, grad_scale: 8.0 2023-11-28 21:18:19,604 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3673520.0, ans=0.125 2023-11-28 21:18:24,721 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3673520.0, ans=0.2 2023-11-28 21:18:42,049 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 551050 2023-11-28 21:18:46,976 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3673653.3333333335, ans=0.125 2023-11-28 21:18:49,649 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.91 vs. limit=22.5 2023-11-28 21:18:54,220 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.85 vs. limit=15.0 2023-11-28 21:19:10,843 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.65 vs. limit=22.5 2023-11-28 21:19:13,989 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3673786.6666666665, ans=0.0 2023-11-28 21:19:16,776 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.85 vs. limit=6.0 2023-11-28 21:19:18,495 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 10000, loss[loss=0.07837, simple_loss=0.1179, pruned_loss=0.01344, audio_tagging_loss=0.005983, over 15021.00 frames. 
], tot_loss[loss=0.06479, simple_loss=0.08868, pruned_loss=0.01187, audio_tagging_loss=0.008577, over 3044181.97 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:19:18,799 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3673853.3333333335, ans=0.125 2023-11-28 21:19:20,636 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.02 vs. limit=22.5 2023-11-28 21:19:30,208 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3673920.0, ans=0.125 2023-11-28 21:19:43,698 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 551100 2023-11-28 21:19:43,910 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3673986.6666666665, ans=0.0 2023-11-28 21:19:51,742 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.896e+01 9.118e+01 9.727e+01 1.019e+02 1.264e+02, threshold=1.945e+02, percent-clipped=0.0 2023-11-28 21:19:52,479 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.09 vs. limit=15.0 2023-11-28 21:20:09,393 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3674120.0, ans=0.125 2023-11-28 21:20:20,094 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 10050, loss[loss=0.0661, simple_loss=0.07857, pruned_loss=0.01157, audio_tagging_loss=0.01525, over 15073.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.08983, pruned_loss=0.01199, audio_tagging_loss=0.008492, over 3049643.31 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:20:21,631 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3674186.6666666665, ans=0.2 2023-11-28 21:20:21,671 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3674186.6666666665, ans=0.0 2023-11-28 21:20:46,177 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 551150 2023-11-28 21:20:52,239 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3674320.0, ans=0.125 2023-11-28 21:20:52,541 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.88 vs. limit=15.0 2023-11-28 21:21:05,423 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3674386.6666666665, ans=0.125 2023-11-28 21:21:18,694 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3674453.3333333335, ans=0.125 2023-11-28 21:21:22,607 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 10100, loss[loss=0.07502, simple_loss=0.09691, pruned_loss=0.01725, audio_tagging_loss=0.00932, over 15045.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.09028, pruned_loss=0.01198, audio_tagging_loss=0.008527, over 3053473.60 frames. 
], batch size: 57, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:21:27,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3674520.0, ans=0.1 2023-11-28 21:21:35,854 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3674586.6666666665, ans=0.125 2023-11-28 21:21:40,372 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3674586.6666666665, ans=0.0 2023-11-28 21:21:46,955 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 551200 2023-11-28 21:21:55,384 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.369e+01 8.986e+01 9.697e+01 1.060e+02 1.407e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-28 21:21:59,915 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.30 vs. limit=6.0 2023-11-28 21:22:13,504 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 21:22:14,827 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3674786.6666666665, ans=0.125 2023-11-28 21:22:24,632 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 10150, loss[loss=0.04637, simple_loss=0.06342, pruned_loss=0.005202, audio_tagging_loss=0.009459, over 15521.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.09012, pruned_loss=0.01202, audio_tagging_loss=0.008577, over 3054723.95 frames. ], batch size: 61, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:22:25,118 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3674853.3333333335, ans=0.07 2023-11-28 21:22:46,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3674920.0, ans=0.015 2023-11-28 21:22:47,069 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3674920.0, ans=0.125 2023-11-28 21:22:49,330 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 551250 2023-11-28 21:22:54,629 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 21:23:26,853 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 10200, loss[loss=0.08578, simple_loss=0.113, pruned_loss=0.02134, audio_tagging_loss=0.007941, over 15812.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.09014, pruned_loss=0.01201, audio_tagging_loss=0.008608, over 3059448.05 frames. 
], batch size: 57, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:23:31,938 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3675186.6666666665, ans=0.125 2023-11-28 21:23:43,802 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3675253.3333333335, ans=0.0 2023-11-28 21:23:50,127 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 21:23:51,398 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 551300 2023-11-28 21:24:00,126 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.427e+01 8.789e+01 9.521e+01 1.014e+02 1.270e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-28 21:24:10,276 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3675386.6666666665, ans=0.125 2023-11-28 21:24:28,417 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 10250, loss[loss=0.08048, simple_loss=0.1226, pruned_loss=0.01145, audio_tagging_loss=0.007701, over 15529.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.09085, pruned_loss=0.01228, audio_tagging_loss=0.008582, over 3061140.94 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:24:31,144 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3675520.0, ans=0.125 2023-11-28 21:24:36,525 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3675520.0, ans=0.125 2023-11-28 21:24:53,147 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 551350 2023-11-28 21:24:53,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3675653.3333333335, ans=0.125 2023-11-28 21:25:20,881 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.03 vs. limit=10.0 2023-11-28 21:25:30,793 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 10300, loss[loss=0.05822, simple_loss=0.08276, pruned_loss=0.00989, audio_tagging_loss=0.006955, over 14156.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.09066, pruned_loss=0.01221, audio_tagging_loss=0.008588, over 3059594.42 frames. 
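The batch size reported in each summary drifts between roughly 52 and 70 cuts, consistent with batches assembled against a total-duration budget rather than a fixed cut count, so batches of shorter clips hold more cuts. A hedged sketch of such duration-based batching (hypothetical helper, not the lhotse sampler itself):

def duration_batches(cuts, max_duration_s: float):
    # Greedily pack cuts until the summed duration would exceed the budget
    # (the budget itself is configured elsewhere in the recipe).
    batch, total = [], 0.0
    for cut in cuts:
        if batch and total + cut.duration > max_duration_s:
            yield batch
            batch, total = [], 0.0
        batch.append(cut)
        total += cut.duration
    if batch:
        yield batch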
], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:25:36,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3675853.3333333335, ans=0.125 2023-11-28 21:25:39,146 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 21:25:54,978 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 551400 2023-11-28 21:26:04,008 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.214e+01 9.006e+01 9.446e+01 1.010e+02 1.376e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-28 21:26:07,046 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.72 vs. limit=12.0 2023-11-28 21:26:09,946 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3676053.3333333335, ans=0.0 2023-11-28 21:26:32,630 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 10350, loss[loss=0.05766, simple_loss=0.07203, pruned_loss=0.01185, audio_tagging_loss=0.009788, over 15264.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.09076, pruned_loss=0.01226, audio_tagging_loss=0.008675, over 3060120.79 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:26:49,762 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3676253.3333333335, ans=0.2 2023-11-28 21:26:54,294 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3676253.3333333335, ans=0.125 2023-11-28 21:26:56,385 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 551450 2023-11-28 21:27:00,914 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3676320.0, ans=0.05 2023-11-28 21:27:16,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3676386.6666666665, ans=0.0 2023-11-28 21:27:26,589 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.10 vs. limit=15.0 2023-11-28 21:27:33,579 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 10400, loss[loss=0.06775, simple_loss=0.09134, pruned_loss=0.01054, audio_tagging_loss=0.01154, over 14537.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.09065, pruned_loss=0.01207, audio_tagging_loss=0.008721, over 3053590.08 frames. ], batch size: 53, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 21:27:34,289 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.45 vs. limit=15.0 2023-11-28 21:27:58,512 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 551500 2023-11-28 21:28:07,099 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.505e+01 8.953e+01 9.462e+01 1.003e+02 1.279e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-28 21:28:29,534 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.22 vs. limit=15.0 2023-11-28 21:28:35,107 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 10450, loss[loss=0.07998, simple_loss=0.1077, pruned_loss=0.01813, audio_tagging_loss=0.007989, over 15767.00 frames. 
], tot_loss[loss=0.06614, simple_loss=0.09046, pruned_loss=0.0121, audio_tagging_loss=0.008804, over 3043834.25 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 21:28:49,667 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3676920.0, ans=0.035 2023-11-28 21:29:00,196 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 551550 2023-11-28 21:29:03,768 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3676986.6666666665, ans=0.2 2023-11-28 21:29:10,227 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3676986.6666666665, ans=0.125 2023-11-28 21:29:31,760 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3677120.0, ans=0.1 2023-11-28 21:29:37,890 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 10500, loss[loss=0.0544, simple_loss=0.07108, pruned_loss=0.01031, audio_tagging_loss=0.008551, over 16729.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08937, pruned_loss=0.01188, audio_tagging_loss=0.008791, over 3040659.97 frames. ], batch size: 63, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 21:30:02,208 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 551600 2023-11-28 21:30:11,078 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.383e+01 8.780e+01 9.536e+01 1.016e+02 1.256e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-28 21:30:39,089 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 10550, loss[loss=0.09778, simple_loss=0.1354, pruned_loss=0.02489, audio_tagging_loss=0.005169, over 15643.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08991, pruned_loss=0.01215, audio_tagging_loss=0.008579, over 3047287.58 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 21:30:40,673 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3677520.0, ans=0.0 2023-11-28 21:30:45,873 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3677520.0, ans=0.125 2023-11-28 21:31:04,442 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 551650 2023-11-28 21:31:05,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3677653.3333333335, ans=0.0 2023-11-28 21:31:07,027 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3677653.3333333335, ans=0.125 2023-11-28 21:31:40,771 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 10600, loss[loss=0.06088, simple_loss=0.08361, pruned_loss=0.01075, audio_tagging_loss=0.008327, over 15995.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.08996, pruned_loss=0.01222, audio_tagging_loss=0.00852, over 3047274.63 frames. 
], batch size: 60, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 21:32:05,987 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 551700 2023-11-28 21:32:14,116 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.474e+01 9.049e+01 9.693e+01 1.043e+02 1.339e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-28 21:32:18,730 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3678053.3333333335, ans=0.125 2023-11-28 21:32:23,696 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3678053.3333333335, ans=0.2 2023-11-28 21:32:40,779 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3678120.0, ans=0.125 2023-11-28 21:32:43,471 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 10650, loss[loss=0.07295, simple_loss=0.09684, pruned_loss=0.01334, audio_tagging_loss=0.01119, over 15951.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.09013, pruned_loss=0.01213, audio_tagging_loss=0.008467, over 3048192.78 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 21:32:50,295 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3678186.6666666665, ans=0.0 2023-11-28 21:33:08,188 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 551750 2023-11-28 21:33:14,161 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3678320.0, ans=0.1 2023-11-28 21:33:15,411 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3678320.0, ans=0.125 2023-11-28 21:33:34,811 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3678453.3333333335, ans=0.125 2023-11-28 21:33:45,324 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 10700, loss[loss=0.06264, simple_loss=0.09519, pruned_loss=0.009542, audio_tagging_loss=0.005506, over 15039.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.09005, pruned_loss=0.012, audio_tagging_loss=0.008468, over 3048718.82 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 21:33:58,247 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3678586.6666666665, ans=0.1 2023-11-28 21:34:10,667 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 551800 2023-11-28 21:34:10,742 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 21:34:19,745 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.713e+01 9.110e+01 9.632e+01 1.031e+02 2.472e+02, threshold=1.926e+02, percent-clipped=1.0 2023-11-28 21:34:29,989 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3678720.0, ans=0.1 2023-11-28 21:34:32,345 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3678720.0, ans=0.125 2023-11-28 21:34:48,472 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 10750, loss[loss=0.07803, simple_loss=0.1031, pruned_loss=0.01874, audio_tagging_loss=0.007726, over 15371.00 frames. 
], tot_loss[loss=0.06536, simple_loss=0.08997, pruned_loss=0.01191, audio_tagging_loss=0.008465, over 3052498.57 frames. ], batch size: 59, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 21:35:04,460 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 21:35:13,721 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 551850 2023-11-28 21:35:28,514 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3679053.3333333335, ans=0.0 2023-11-28 21:35:46,889 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.11 vs. limit=15.0 2023-11-28 21:35:48,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3679186.6666666665, ans=0.0 2023-11-28 21:35:49,811 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 10800, loss[loss=0.0716, simple_loss=0.1032, pruned_loss=0.01364, audio_tagging_loss=0.006342, over 15608.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.08986, pruned_loss=0.01195, audio_tagging_loss=0.008443, over 3048790.59 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 21:36:10,138 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3679253.3333333335, ans=0.1 2023-11-28 21:36:15,201 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 551900 2023-11-28 21:36:22,533 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.14 vs. limit=15.0 2023-11-28 21:36:24,366 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.667e+01 8.859e+01 9.432e+01 1.041e+02 1.593e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-28 21:36:24,725 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3679320.0, ans=0.0 2023-11-28 21:36:38,244 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3679453.3333333335, ans=0.5 2023-11-28 21:36:47,539 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3679453.3333333335, ans=0.0 2023-11-28 21:36:51,878 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 10850, loss[loss=0.04737, simple_loss=0.06054, pruned_loss=0.007907, audio_tagging_loss=0.009189, over 14189.00 frames. ], tot_loss[loss=0.06496, simple_loss=0.08935, pruned_loss=0.01179, audio_tagging_loss=0.008493, over 3049968.33 frames. ], batch size: 54, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:37:05,804 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3679586.6666666665, ans=0.2 2023-11-28 21:37:15,933 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.28 vs. 
limit=15.0 2023-11-28 21:37:16,610 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 551950 2023-11-28 21:37:24,929 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3679653.3333333335, ans=0.125 2023-11-28 21:37:31,365 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3679720.0, ans=0.0 2023-11-28 21:37:50,018 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 21:37:53,407 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 10900, loss[loss=0.04274, simple_loss=0.05628, pruned_loss=0.006711, audio_tagging_loss=0.007889, over 14367.00 frames. ], tot_loss[loss=0.06472, simple_loss=0.08907, pruned_loss=0.01171, audio_tagging_loss=0.008482, over 3043890.96 frames. ], batch size: 53, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:38:05,962 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3679920.0, ans=0.125 2023-11-28 21:38:14,101 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3679920.0, ans=0.1 2023-11-28 21:38:18,037 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 552000 2023-11-28 21:38:19,486 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-552000.pt 2023-11-28 21:38:30,860 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.172e+01 9.027e+01 9.612e+01 1.023e+02 1.534e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-28 21:38:32,373 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3680053.3333333335, ans=0.0 2023-11-28 21:38:36,863 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.21 vs. limit=10.0 2023-11-28 21:38:39,363 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.05 vs. limit=12.0 2023-11-28 21:38:44,639 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3680120.0, ans=0.125 2023-11-28 21:38:54,663 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3680120.0, ans=0.125 2023-11-28 21:38:57,996 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 10950, loss[loss=0.05566, simple_loss=0.07543, pruned_loss=0.009163, audio_tagging_loss=0.008787, over 14722.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.08945, pruned_loss=0.01184, audio_tagging_loss=0.008453, over 3053918.91 frames. 
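The checkpoint.py record above writes checkpoint-552000.pt at a round global batch index, alongside the regular per-epoch saves, suggesting an every-N-batches save keyed to batch_idx_train. A minimal sketch under that assumption (hypothetical helper; the real icefall checkpoint also stores sampler, scheduler, and grad-scaler state):

import torch

def maybe_save_checkpoint(model, optimizer, batch_idx: int,
                          every_n: int, exp_dir: str):
    # Produces files named like the one in the record above.
    if every_n > 0 and batch_idx % every_n == 0:
        torch.save(
            {"model": model.state_dict(),
             "optimizer": optimizer.state_dict(),
             "batch_idx_train": batch_idx},
            f"{exp_dir}/checkpoint-{batch_idx}.pt",
        )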
2023-11-28 21:39:04,046 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3680186.6666666665, ans=0.125
2023-11-28 21:39:10,443 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3680253.3333333335, ans=0.125
2023-11-28 21:39:13,426 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3680253.3333333335, ans=0.0
2023-11-28 21:39:14,661 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3680253.3333333335, ans=0.125
2023-11-28 21:39:20,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3680253.3333333335, ans=0.125
2023-11-28 21:39:23,238 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 552050
2023-11-28 21:39:26,999 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3680320.0, ans=0.0
2023-11-28 21:39:30,314 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=3680320.0, ans=0.5
2023-11-28 21:39:33,827 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3680386.6666666665, ans=0.0
2023-11-28 21:39:40,357 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3680386.6666666665, ans=0.0
2023-11-28 21:39:44,845 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3680386.6666666665, ans=0.125
2023-11-28 21:39:55,814 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3680453.3333333335, ans=0.0
2023-11-28 21:39:59,059 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 11000, loss[loss=0.07164, simple_loss=0.09306, pruned_loss=0.01542, audio_tagging_loss=0.009693, over 14429.00 frames. ], tot_loss[loss=0.06494, simple_loss=0.08945, pruned_loss=0.01171, audio_tagging_loss=0.008498, over 3055415.49 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 16.0
2023-11-28 21:40:05,775 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3680520.0, ans=0.0
2023-11-28 21:40:09,726 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
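The WARNING above shows why some AudioSet clips cannot be used for the ASR branch: the transcript is a dummy placeholder, and a 1-second clip of 100 feature frames keeps only 23 frames after the roughly 4x convolutional subsampling, fewer than its 24 BPE tokens, so no transducer alignment exists. A sketch of such a guard; the subsampled-length formula is an assumption chosen to match the logged 100 -> 23 mapping:

# Sketch of the guard behind the "Exclude cut" warnings. The subsampled-length
# formula is an assumption that reproduces the logged 100 -> 23 frame counts.
def frames_after_subsampling(num_frames: int) -> int:
    return ((num_frames - 7) // 2 + 1) // 2

def should_exclude(num_frames: int, tokens: list) -> bool:
    # A transducer needs at least one output frame per token.
    return frames_after_subsampling(num_frames) < len(tokens)

tokens = ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁',
          'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this',
          '▁', 'if', '▁', 'possible']
print(frames_after_subsampling(100), len(tokens))  # 23 24 -> too short
print(should_exclude(100, tokens))                 # True: cut is dropped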
2023-11-28 21:40:09,780 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3680520.0, ans=0.1
2023-11-28 21:40:11,144 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3680586.6666666665, ans=0.0
2023-11-28 21:40:15,799 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3680586.6666666665, ans=0.04949747468305833
2023-11-28 21:40:20,286 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3680586.6666666665, ans=0.125
2023-11-28 21:40:20,427 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3680586.6666666665, ans=0.125
2023-11-28 21:40:23,655 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 552100
2023-11-28 21:40:29,233 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.69 vs. limit=15.0
2023-11-28 21:40:33,355 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.051e+01 8.817e+01 9.387e+01 9.947e+01 1.401e+02, threshold=1.877e+02, percent-clipped=0.0
2023-11-28 21:40:39,002 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3680720.0, ans=0.2
2023-11-28 21:40:43,901 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3680720.0, ans=0.07
2023-11-28 21:41:01,333 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 11050, loss[loss=0.08572, simple_loss=0.1189, pruned_loss=0.0179, audio_tagging_loss=0.008368, over 15014.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08951, pruned_loss=0.01187, audio_tagging_loss=0.008624, over 3051922.18 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 16.0
2023-11-28 21:41:04,041 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3680853.3333333335, ans=0.125
2023-11-28 21:41:22,235 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.85 vs. limit=15.0
2023-11-28 21:41:23,129 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3680920.0, ans=0.125
2023-11-28 21:41:25,808 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 552150
2023-11-28 21:41:28,331 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3680986.6666666665, ans=0.2
2023-11-28 21:41:30,614 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3680986.6666666665, ans=0.125
2023-11-28 21:41:30,903 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.58 vs. limit=15.0
2023-11-28 21:41:39,856 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.61 vs. limit=6.0
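The Whitening lines track how close a module's activations are to having a white (isotropic) covariance within each channel group; the metric grows as a few directions start to dominate and is compared against a limit. The sketch below is one plausible way to compute such a metric, the ratio of the mean squared covariance eigenvalue to the squared mean eigenvalue, which equals 1.0 for perfectly white features; it is a reading of the logged numbers, not necessarily the exact formula in scaling.py:

import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> float:
    # x: (num_frames, num_channels); channels split into contiguous groups
    num_frames, num_channels = x.shape
    x = x.reshape(num_frames, num_groups, num_channels // num_groups)
    x = x - x.mean(dim=0, keepdim=True)
    metrics = []
    for g in range(num_groups):
        cov = x[:, g, :].t() @ x[:, g, :] / num_frames
        eigs = torch.linalg.eigvalsh(cov)
        # E[lambda^2] / E[lambda]^2 == 1.0 iff all eigenvalues are equal (white)
        metrics.append((eigs.pow(2).mean() / eigs.mean().pow(2)).item())
    return sum(metrics) / num_groups

torch.manual_seed(0)
print(whitening_metric(torch.randn(1000, 128), num_groups=4))  # close to 1.0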
2023-11-28 21:41:46,117 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3681053.3333333335, ans=0.125
2023-11-28 21:41:50,780 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3681120.0, ans=0.04949747468305833
2023-11-28 21:41:51,914 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3681120.0, ans=0.0
2023-11-28 21:42:02,718 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 11100, loss[loss=0.06927, simple_loss=0.102, pruned_loss=0.01232, audio_tagging_loss=0.005977, over 16367.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08929, pruned_loss=0.01191, audio_tagging_loss=0.008701, over 3055093.81 frames. ], batch size: 63, lr: 1.46e-03, grad_scale: 16.0
2023-11-28 21:42:17,778 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.58 vs. limit=15.0
2023-11-28 21:42:19,000 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.59 vs. limit=6.0
2023-11-28 21:42:23,634 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3681253.3333333335, ans=0.0
2023-11-28 21:42:27,678 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 552200
2023-11-28 21:42:37,751 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.621e+01 8.965e+01 9.771e+01 1.048e+02 1.332e+02, threshold=1.954e+02, percent-clipped=0.0
2023-11-28 21:42:45,213 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3681386.6666666665, ans=0.125
2023-11-28 21:42:56,344 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3681453.3333333335, ans=0.125
2023-11-28 21:42:57,545 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3681453.3333333335, ans=0.125
2023-11-28 21:43:01,374 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3681453.3333333335, ans=0.125
2023-11-28 21:43:04,748 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 11150, loss[loss=0.09106, simple_loss=0.1261, pruned_loss=0.01931, audio_tagging_loss=0.008706, over 16295.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08977, pruned_loss=0.01201, audio_tagging_loss=0.008801, over 3053058.53 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 16.0
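The Clipping_scale entries from optim.py report the min/25%/50%/75%/max of recently observed gradient norms, and in every entry above the logged threshold equals 2.0 times the logged median (for example 2.0 * 9.771e+01 ~= 1.954e+02). A sketch of that bookkeeping, under the assumption that the threshold really is clipping_scale times a running median; the function and variable names are illustrative:

import torch

def clipping_stats(recent_grad_norms, clipping_scale=2.0):
    norms = torch.tensor(recent_grad_norms)
    # the five logged numbers: min, 25%, median, 75%, max
    quartiles = torch.quantile(norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * quartiles[2]  # scale * median
    percent_clipped = 100.0 * (norms > threshold).float().mean()
    return quartiles, threshold, percent_clipped

# illustrative norms, roughly matching the quartiles in the entry above
norms = [76.2, 85.1, 89.7, 93.5, 97.7, 101.9, 104.8, 112.4, 133.2]
quartiles, threshold, pct = clipping_stats(norms)
print(quartiles, threshold, pct)  # no norm exceeds 2x the median, so 0% clipped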
2023-11-28 21:43:08,565 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3681520.0, ans=0.125
2023-11-28 21:43:24,900 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3681586.6666666665, ans=0.0
2023-11-28 21:43:29,356 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 552250
2023-11-28 21:43:36,212 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten.whitening_limit, batch_count=3681653.3333333335, ans=15.0
2023-11-28 21:43:51,502 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3681720.0, ans=0.2
2023-11-28 21:44:06,242 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 11200, loss[loss=0.03609, simple_loss=0.04713, pruned_loss=0.005754, audio_tagging_loss=0.006774, over 13810.00 frames. ], tot_loss[loss=0.06478, simple_loss=0.0882, pruned_loss=0.01177, audio_tagging_loss=0.008915, over 3047869.94 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 32.0
2023-11-28 21:44:30,336 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 552300
2023-11-28 21:44:31,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3681986.6666666665, ans=0.0
2023-11-28 21:44:35,819 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3681986.6666666665, ans=0.125
2023-11-28 21:44:37,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3681986.6666666665, ans=0.1
2023-11-28 21:44:40,804 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.33 vs. limit=15.0
2023-11-28 21:44:41,353 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.336e+01 8.920e+01 9.511e+01 1.032e+02 1.205e+02, threshold=1.902e+02, percent-clipped=0.0
2023-11-28 21:44:49,383 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3682053.3333333335, ans=0.1
2023-11-28 21:45:02,154 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.87 vs. limit=10.0
2023-11-28 21:45:08,017 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 11250, loss[loss=0.06297, simple_loss=0.0864, pruned_loss=0.01147, audio_tagging_loss=0.008292, over 16579.00 frames. ], tot_loss[loss=0.06499, simple_loss=0.08856, pruned_loss=0.01181, audio_tagging_loss=0.0089, over 3044331.74 frames.
], batch size: 63, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:45:11,796 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3682186.6666666665, ans=0.125 2023-11-28 21:45:17,923 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3682186.6666666665, ans=0.125 2023-11-28 21:45:32,182 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 552350 2023-11-28 21:45:34,061 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.68 vs. limit=15.0 2023-11-28 21:45:36,885 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.26 vs. limit=15.0 2023-11-28 21:45:52,696 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3682386.6666666665, ans=0.0 2023-11-28 21:46:09,204 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 11300, loss[loss=0.05261, simple_loss=0.07531, pruned_loss=0.006843, audio_tagging_loss=0.008114, over 15109.00 frames. ], tot_loss[loss=0.06474, simple_loss=0.08834, pruned_loss=0.01177, audio_tagging_loss=0.008792, over 3043822.20 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:46:20,210 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3682586.6666666665, ans=0.1 2023-11-28 21:46:33,547 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3682653.3333333335, ans=0.125 2023-11-28 21:46:34,620 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 552400 2023-11-28 21:46:40,080 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3682653.3333333335, ans=0.1 2023-11-28 21:46:46,401 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.822e+01 8.990e+01 9.658e+01 1.057e+02 1.418e+02, threshold=1.932e+02, percent-clipped=0.0 2023-11-28 21:46:52,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3682720.0, ans=0.0 2023-11-28 21:47:12,720 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 11350, loss[loss=0.05841, simple_loss=0.07635, pruned_loss=0.01061, audio_tagging_loss=0.009627, over 13584.00 frames. ], tot_loss[loss=0.06465, simple_loss=0.08827, pruned_loss=0.01181, audio_tagging_loss=0.008702, over 3045586.85 frames. ], batch size: 54, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:47:24,180 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3682920.0, ans=0.1 2023-11-28 21:47:35,046 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3682920.0, ans=0.125 2023-11-28 21:47:37,353 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 552450 2023-11-28 21:47:38,673 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3682986.6666666665, ans=0.125 2023-11-28 21:47:46,538 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.94 vs. 
limit=22.5 2023-11-28 21:48:14,196 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 11400, loss[loss=0.06227, simple_loss=0.08405, pruned_loss=0.01164, audio_tagging_loss=0.008605, over 15071.00 frames. ], tot_loss[loss=0.06451, simple_loss=0.08804, pruned_loss=0.01184, audio_tagging_loss=0.008651, over 3037034.94 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:48:24,193 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3683186.6666666665, ans=0.2 2023-11-28 21:48:36,989 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.48 vs. limit=12.0 2023-11-28 21:48:38,652 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 552500 2023-11-28 21:48:38,778 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3683320.0, ans=0.125 2023-11-28 21:48:40,327 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.02 vs. limit=10.0 2023-11-28 21:48:48,946 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3683320.0, ans=0.125 2023-11-28 21:48:49,679 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.359e+01 8.845e+01 9.711e+01 1.056e+02 1.187e+02, threshold=1.942e+02, percent-clipped=0.0 2023-11-28 21:48:51,086 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3683386.6666666665, ans=0.125 2023-11-28 21:49:11,148 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3683453.3333333335, ans=0.09899494936611666 2023-11-28 21:49:13,470 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3683453.3333333335, ans=0.125 2023-11-28 21:49:16,160 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 11450, loss[loss=0.07486, simple_loss=0.1093, pruned_loss=0.0139, audio_tagging_loss=0.006319, over 15048.00 frames. ], tot_loss[loss=0.06401, simple_loss=0.08752, pruned_loss=0.01163, audio_tagging_loss=0.008616, over 3039347.50 frames. ], batch size: 54, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:49:29,321 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3683586.6666666665, ans=0.0 2023-11-28 21:49:38,482 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.70 vs. limit=15.0 2023-11-28 21:49:40,334 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 552550 2023-11-28 21:50:12,657 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3683786.6666666665, ans=0.125 2023-11-28 21:50:17,012 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 11500, loss[loss=0.04501, simple_loss=0.04997, pruned_loss=0.006963, audio_tagging_loss=0.01307, over 16258.00 frames. ], tot_loss[loss=0.06451, simple_loss=0.08832, pruned_loss=0.01176, audio_tagging_loss=0.008595, over 3044431.38 frames. 
], batch size: 64, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:50:33,892 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3683920.0, ans=0.125 2023-11-28 21:50:40,231 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3683920.0, ans=0.125 2023-11-28 21:50:42,266 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 552600 2023-11-28 21:50:45,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3683986.6666666665, ans=0.2 2023-11-28 21:50:53,414 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.968e+01 8.799e+01 9.432e+01 1.014e+02 1.264e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-28 21:51:04,013 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3684053.3333333335, ans=0.125 2023-11-28 21:51:18,532 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 11550, loss[loss=0.09968, simple_loss=0.1445, pruned_loss=0.02298, audio_tagging_loss=0.004447, over 15829.00 frames. ], tot_loss[loss=0.06458, simple_loss=0.08844, pruned_loss=0.01177, audio_tagging_loss=0.008591, over 3043812.32 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:51:44,023 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 552650 2023-11-28 21:51:57,591 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 21:51:57,898 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3684386.6666666665, ans=0.0 2023-11-28 21:52:10,279 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.67 vs. limit=12.0 2023-11-28 21:52:13,678 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3684453.3333333335, ans=0.125 2023-11-28 21:52:20,656 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 11600, loss[loss=0.08184, simple_loss=0.1187, pruned_loss=0.01579, audio_tagging_loss=0.006719, over 15445.00 frames. ], tot_loss[loss=0.06505, simple_loss=0.08916, pruned_loss=0.01189, audio_tagging_loss=0.008572, over 3041973.24 frames. 
], batch size: 55, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 21:52:34,143 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3684586.6666666665, ans=0.1 2023-11-28 21:52:45,072 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 552700 2023-11-28 21:52:55,518 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.532e+01 9.024e+01 9.602e+01 1.030e+02 1.712e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-28 21:53:03,569 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3684720.0, ans=0.1 2023-11-28 21:53:14,617 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3684786.6666666665, ans=0.125 2023-11-28 21:53:17,223 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.53 vs. limit=15.0 2023-11-28 21:53:19,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3684786.6666666665, ans=0.0 2023-11-28 21:53:19,606 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.65 vs. limit=15.0 2023-11-28 21:53:20,588 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3684853.3333333335, ans=0.0 2023-11-28 21:53:21,384 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 11650, loss[loss=0.07192, simple_loss=0.09587, pruned_loss=0.01261, audio_tagging_loss=0.01138, over 14638.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08944, pruned_loss=0.01198, audio_tagging_loss=0.00869, over 3042605.24 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 21:53:21,803 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3684853.3333333335, ans=0.125 2023-11-28 21:53:37,533 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3684920.0, ans=0.125 2023-11-28 21:53:44,800 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3684920.0, ans=0.0 2023-11-28 21:53:46,698 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 552750 2023-11-28 21:53:59,548 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.70 vs. limit=22.5 2023-11-28 21:54:02,478 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3685053.3333333335, ans=0.015 2023-11-28 21:54:02,904 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.08 vs. limit=6.0 2023-11-28 21:54:05,060 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff3.min_abs, batch_count=3685053.3333333335, ans=0.2 2023-11-28 21:54:22,840 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 11700, loss[loss=0.06642, simple_loss=0.08881, pruned_loss=0.01078, audio_tagging_loss=0.01123, over 16348.00 frames. 
], tot_loss[loss=0.06499, simple_loss=0.08881, pruned_loss=0.01185, audio_tagging_loss=0.008736, over 3042665.05 frames. ], batch size: 64, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 21:54:24,200 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3685186.6666666665, ans=0.125 2023-11-28 21:54:25,264 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3685186.6666666665, ans=0.04949747468305833 2023-11-28 21:54:41,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3685253.3333333335, ans=0.1 2023-11-28 21:54:48,048 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 552800 2023-11-28 21:54:56,689 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3685320.0, ans=0.0 2023-11-28 21:54:59,275 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.452e+01 9.206e+01 9.735e+01 1.055e+02 1.331e+02, threshold=1.947e+02, percent-clipped=0.0 2023-11-28 21:55:07,523 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3685386.6666666665, ans=0.125 2023-11-28 21:55:16,522 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3685453.3333333335, ans=0.0 2023-11-28 21:55:19,558 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.17 vs. limit=8.0 2023-11-28 21:55:22,087 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.86 vs. limit=22.5 2023-11-28 21:55:23,323 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.93 vs. limit=15.0 2023-11-28 21:55:24,987 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 11750, loss[loss=0.06035, simple_loss=0.07774, pruned_loss=0.01074, audio_tagging_loss=0.01074, over 14488.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08899, pruned_loss=0.01196, audio_tagging_loss=0.008676, over 3044378.01 frames. 
], batch size: 57, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 21:55:25,353 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3685520.0, ans=0.125 2023-11-28 21:55:42,126 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3685586.6666666665, ans=0.125 2023-11-28 21:55:49,566 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 552850 2023-11-28 21:55:50,233 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=3685653.3333333335, ans=15.0 2023-11-28 21:55:51,001 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3685653.3333333335, ans=0.125 2023-11-28 21:55:56,604 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3685653.3333333335, ans=0.125 2023-11-28 21:55:56,682 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3685653.3333333335, ans=0.125 2023-11-28 21:56:01,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3685720.0, ans=0.0 2023-11-28 21:56:01,686 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.43 vs. limit=10.0 2023-11-28 21:56:06,735 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3685720.0, ans=0.0 2023-11-28 21:56:10,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3685720.0, ans=0.2 2023-11-28 21:56:14,319 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3685786.6666666665, ans=0.125 2023-11-28 21:56:17,253 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3685786.6666666665, ans=0.2 2023-11-28 21:56:26,098 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 11800, loss[loss=0.07975, simple_loss=0.1146, pruned_loss=0.01328, audio_tagging_loss=0.009143, over 14829.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.0895, pruned_loss=0.01202, audio_tagging_loss=0.008627, over 3043270.20 frames. ], batch size: 54, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 21:56:41,114 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3685920.0, ans=0.125 2023-11-28 21:56:51,014 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 552900 2023-11-28 21:56:54,492 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3685986.6666666665, ans=0.2 2023-11-28 21:56:58,951 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3685986.6666666665, ans=0.2 2023-11-28 21:57:02,205 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.743e+01 8.813e+01 9.510e+01 1.037e+02 1.447e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-28 21:57:05,847 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.88 vs. 
limit=15.0 2023-11-28 21:57:10,325 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.57 vs. limit=22.5 2023-11-28 21:57:11,637 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.54 vs. limit=15.0 2023-11-28 21:57:14,117 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.50 vs. limit=10.0 2023-11-28 21:57:18,389 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3686120.0, ans=0.0 2023-11-28 21:57:28,220 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 11850, loss[loss=0.05263, simple_loss=0.07144, pruned_loss=0.006414, audio_tagging_loss=0.01049, over 14002.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.09042, pruned_loss=0.01228, audio_tagging_loss=0.008731, over 3050020.84 frames. ], batch size: 54, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 21:57:43,615 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.27 vs. limit=15.0 2023-11-28 21:57:51,317 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3686253.3333333335, ans=0.125 2023-11-28 21:57:53,517 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 552950 2023-11-28 21:57:58,481 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3686320.0, ans=0.05 2023-11-28 21:58:18,677 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3686453.3333333335, ans=0.125 2023-11-28 21:58:29,180 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 11900, loss[loss=0.04719, simple_loss=0.064, pruned_loss=0.00627, audio_tagging_loss=0.00892, over 16095.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.08976, pruned_loss=0.01231, audio_tagging_loss=0.008806, over 3045477.23 frames. ], batch size: 61, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 21:58:44,816 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 21:58:54,577 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 553000 2023-11-28 21:59:03,356 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3686653.3333333335, ans=0.0 2023-11-28 21:59:05,471 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.558e+01 8.697e+01 9.440e+01 1.029e+02 1.196e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-28 21:59:15,660 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3686720.0, ans=0.125 2023-11-28 21:59:32,225 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 11950, loss[loss=0.05659, simple_loss=0.07162, pruned_loss=0.01019, audio_tagging_loss=0.01059, over 15651.00 frames. ], tot_loss[loss=0.06501, simple_loss=0.08799, pruned_loss=0.01201, audio_tagging_loss=0.009004, over 3044213.10 frames. ], batch size: 61, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 21:59:36,312 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.38 vs. 
limit=12.0
2023-11-28 21:59:47,169 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3686920.0, ans=0.0
2023-11-28 21:59:50,523 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3686920.0, ans=0.2
2023-11-28 21:59:56,212 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 553050
2023-11-28 22:00:31,699 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 12000, loss[loss=0.07453, simple_loss=0.1072, pruned_loss=0.01318, audio_tagging_loss=0.007763, over 15105.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.08926, pruned_loss=0.01226, audio_tagging_loss=0.009015, over 3051842.48 frames. ], batch size: 54, lr: 1.46e-03, grad_scale: 32.0
2023-11-28 22:00:31,701 INFO [train_asr.py:1258] (0/4) Computing validation loss
2023-11-28 22:01:12,001 INFO [train_asr.py:1267] (0/4) Epoch 46, validation: loss=0.05835, simple_loss=0.05054, pruned_loss=0.005304, audio_tagging_loss=0.02778, over 4681554.00 frames.
2023-11-28 22:01:12,002 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB
2023-11-28 22:01:34,505 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 553100
2023-11-28 22:01:39,323 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/epoch-46.pt
2023-11-28 22:01:56,281 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 0, loss[loss=0.06724, simple_loss=0.09066, pruned_loss=0.00686, audio_tagging_loss=0.01505, over 15379.00 frames. ], tot_loss[loss=0.06724, simple_loss=0.09066, pruned_loss=0.00686, audio_tagging_loss=0.01505, over 15379.00 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 32.0
2023-11-28 22:01:56,286 INFO [train_asr.py:1258] (0/4) Computing validation loss
2023-11-28 22:02:32,339 INFO [train_asr.py:1267] (0/4) Epoch 47, validation: loss=0.05784, simple_loss=0.05051, pruned_loss=0.005299, audio_tagging_loss=0.02728, over 4681554.00 frames.
2023-11-28 22:02:32,340 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB
2023-11-28 22:02:33,945 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3687340.0, ans=0.125
2023-11-28 22:02:39,328 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.168e+01 9.135e+01 9.831e+01 1.074e+02 1.367e+02, threshold=1.966e+02, percent-clipped=0.0
2023-11-28 22:02:59,452 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3687473.3333333335, ans=0.125
2023-11-28 22:03:07,802 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3687540.0, ans=0.125
2023-11-28 22:03:11,201 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.92 vs. limit=10.0
2023-11-28 22:03:29,786 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3687606.6666666665, ans=0.125
2023-11-28 22:03:30,845 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 553150
2023-11-28 22:03:34,265 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 50, loss[loss=0.06837, simple_loss=0.08706, pruned_loss=0.009911, audio_tagging_loss=0.01493, over 14780.00 frames. ], tot_loss[loss=0.07103, simple_loss=0.08675, pruned_loss=0.01119, audio_tagging_loss=0.01647, over 689118.61 frames. ], batch size: 54, lr: 1.44e-03, grad_scale: 32.0
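This stretch shows the end-of-epoch bookkeeping: validation losses are computed on a fixed batch interval (at Epoch 46, batch 12000, and again at batch 0 of Epoch 47), epoch-46.pt is saved when the epoch finishes, and the learning rate steps from 1.46e-03 to 1.44e-03 at the boundary, consistent with an epoch-dependent decay term in the scheduler. A sketch of that cadence; the interval constants and helper names are placeholders, not values read from this run:

# Sketch of the validation/checkpoint cadence around an epoch boundary.
# valid_interval and save_every_n are assumed placeholder values.
def end_of_batch_actions(epoch: int, batch_idx: int, batch_idx_train: int,
                         epoch_finished: bool,
                         valid_interval: int = 3000,
                         save_every_n: int = 4000) -> list:
    actions = []
    if batch_idx % valid_interval == 0:
        actions.append("compute validation loss")
    if batch_idx_train % save_every_n == 0:
        # rolling checkpoint named by the global batch index,
        # like checkpoint-552000.pt earlier in this log
        actions.append(f"save checkpoint-{batch_idx_train}.pt")
    if epoch_finished:
        actions.append(f"save epoch-{epoch}.pt")
    return actions

# batch 12000 of epoch 46 triggers validation, then the epoch checkpoint
print(end_of_batch_actions(46, 12000, 553100, epoch_finished=True))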
2023-11-28 22:03:35,669 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3687673.3333333335, ans=0.0
2023-11-28 22:03:43,525 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3687673.3333333335, ans=0.0
2023-11-28 22:03:55,556 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.99 vs. limit=15.0
2023-11-28 22:03:59,165 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=3687806.6666666665, ans=0.05
2023-11-28 22:04:15,210 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.63 vs. limit=15.0
2023-11-28 22:04:17,352 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2023-11-28 22:04:33,355 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 553200
2023-11-28 22:04:37,301 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 100, loss[loss=0.09036, simple_loss=0.1257, pruned_loss=0.0172, audio_tagging_loss=0.01031, over 15711.00 frames. ], tot_loss[loss=0.07055, simple_loss=0.0866, pruned_loss=0.01133, audio_tagging_loss=0.01592, over 1208745.64 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 32.0
2023-11-28 22:04:44,875 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.649e+01 9.823e+01 1.051e+02 1.142e+02 1.295e+02, threshold=2.102e+02, percent-clipped=0.0
2023-11-28 22:04:51,572 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3688073.3333333335, ans=0.0
2023-11-28 22:04:56,703 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3688073.3333333335, ans=0.1
2023-11-28 22:05:02,248 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3688140.0, ans=0.015
2023-11-28 22:05:04,901 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3688140.0, ans=0.125
2023-11-28 22:05:15,922 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3688206.6666666665, ans=0.125
2023-11-28 22:05:16,517 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.57 vs. limit=15.0
2023-11-28 22:05:23,154 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3688206.6666666665, ans=0.125
2023-11-28 22:05:24,217 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3688206.6666666665, ans=0.125
2023-11-28 22:05:36,211 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 553250
2023-11-28 22:05:40,249 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 150, loss[loss=0.06629, simple_loss=0.08774, pruned_loss=0.01118, audio_tagging_loss=0.01124, over 14427.00 frames.
], tot_loss[loss=0.06893, simple_loss=0.08661, pruned_loss=0.01128, audio_tagging_loss=0.01435, over 1615332.33 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:05:50,809 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3688340.0, ans=0.1 2023-11-28 22:06:04,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3688473.3333333335, ans=0.125 2023-11-28 22:06:39,490 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 553300 2023-11-28 22:06:42,853 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 200, loss[loss=0.07634, simple_loss=0.1084, pruned_loss=0.01294, audio_tagging_loss=0.00918, over 15940.00 frames. ], tot_loss[loss=0.06735, simple_loss=0.08663, pruned_loss=0.01136, audio_tagging_loss=0.01268, over 1938973.26 frames. ], batch size: 54, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:06:47,922 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3688673.3333333335, ans=0.125 2023-11-28 22:06:49,631 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.35 vs. limit=15.0 2023-11-28 22:06:51,844 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.129e+01 9.056e+01 9.738e+01 1.064e+02 1.248e+02, threshold=1.948e+02, percent-clipped=0.0 2023-11-28 22:06:52,283 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3688673.3333333335, ans=0.125 2023-11-28 22:07:01,655 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.81 vs. limit=15.0 2023-11-28 22:07:07,050 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.48 vs. limit=10.0 2023-11-28 22:07:22,236 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3688873.3333333335, ans=0.125 2023-11-28 22:07:29,979 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3688873.3333333335, ans=0.0 2023-11-28 22:07:31,518 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.06 vs. limit=6.0 2023-11-28 22:07:41,070 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 553350 2023-11-28 22:07:44,534 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 250, loss[loss=0.05896, simple_loss=0.0871, pruned_loss=0.008705, audio_tagging_loss=0.006707, over 15182.00 frames. ], tot_loss[loss=0.06687, simple_loss=0.08773, pruned_loss=0.01163, audio_tagging_loss=0.01138, over 2184082.71 frames. 
], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:08:00,543 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3689073.3333333335, ans=0.125 2023-11-28 22:08:08,272 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3689140.0, ans=0.125 2023-11-28 22:08:12,333 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3689140.0, ans=0.125 2023-11-28 22:08:19,550 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.04 vs. limit=15.0 2023-11-28 22:08:29,238 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3689206.6666666665, ans=0.0 2023-11-28 22:08:36,311 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3689273.3333333335, ans=0.0 2023-11-28 22:08:42,604 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 553400 2023-11-28 22:08:46,418 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 300, loss[loss=0.06641, simple_loss=0.09107, pruned_loss=0.01332, audio_tagging_loss=0.007554, over 16318.00 frames. ], tot_loss[loss=0.06682, simple_loss=0.08868, pruned_loss=0.01191, audio_tagging_loss=0.01056, over 2378784.93 frames. ], batch size: 59, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:08:47,998 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3689340.0, ans=0.2 2023-11-28 22:08:55,133 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.602e+01 9.275e+01 9.937e+01 1.062e+02 1.967e+02, threshold=1.987e+02, percent-clipped=1.0 2023-11-28 22:09:00,918 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3689406.6666666665, ans=0.0 2023-11-28 22:09:12,698 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3689473.3333333335, ans=0.125 2023-11-28 22:09:17,732 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3689473.3333333335, ans=0.125 2023-11-28 22:09:39,788 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3689606.6666666665, ans=0.125 2023-11-28 22:09:44,108 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 553450 2023-11-28 22:09:48,026 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 350, loss[loss=0.07463, simple_loss=0.1053, pruned_loss=0.01207, audio_tagging_loss=0.009908, over 15591.00 frames. ], tot_loss[loss=0.06711, simple_loss=0.09028, pruned_loss=0.01208, audio_tagging_loss=0.009888, over 2529135.30 frames. 
], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:10:08,531 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3689740.0, ans=0.0 2023-11-28 22:10:23,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3689873.3333333335, ans=0.2 2023-11-28 22:10:30,190 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3689873.3333333335, ans=0.125 2023-11-28 22:10:30,237 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3689873.3333333335, ans=0.125 2023-11-28 22:10:36,817 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.43 vs. limit=15.0 2023-11-28 22:10:39,759 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3689940.0, ans=0.125 2023-11-28 22:10:44,912 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 553500 2023-11-28 22:10:48,621 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 400, loss[loss=0.05635, simple_loss=0.07394, pruned_loss=0.009685, audio_tagging_loss=0.009696, over 14660.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.08997, pruned_loss=0.01209, audio_tagging_loss=0.009615, over 2648295.22 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:10:56,881 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.564e+01 9.027e+01 9.535e+01 1.022e+02 1.341e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-28 22:11:04,044 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.63 vs. limit=15.0 2023-11-28 22:11:04,900 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3690073.3333333335, ans=0.125 2023-11-28 22:11:11,586 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3690073.3333333335, ans=0.1 2023-11-28 22:11:11,647 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3690073.3333333335, ans=0.125 2023-11-28 22:11:11,767 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3690073.3333333335, ans=0.125 2023-11-28 22:11:17,623 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3690140.0, ans=0.125 2023-11-28 22:11:29,891 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3690206.6666666665, ans=0.0 2023-11-28 22:11:41,651 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3690273.3333333335, ans=0.125 2023-11-28 22:11:42,082 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.12 vs. 
limit=15.0 2023-11-28 22:11:44,078 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3690273.3333333335, ans=0.2 2023-11-28 22:11:47,882 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 553550 2023-11-28 22:11:51,251 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 450, loss[loss=0.05336, simple_loss=0.07132, pruned_loss=0.00733, audio_tagging_loss=0.01038, over 13915.00 frames. ], tot_loss[loss=0.06668, simple_loss=0.09029, pruned_loss=0.0122, audio_tagging_loss=0.009338, over 2737602.97 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:11:53,134 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.68 vs. limit=15.0 2023-11-28 22:12:01,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3690340.0, ans=0.0 2023-11-28 22:12:05,690 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.76 vs. limit=15.0 2023-11-28 22:12:10,070 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=3690406.6666666665, ans=15.0 2023-11-28 22:12:38,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3690540.0, ans=0.125 2023-11-28 22:12:38,542 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.29 vs. limit=15.0 2023-11-28 22:12:49,006 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 553600 2023-11-28 22:12:52,925 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 500, loss[loss=0.06829, simple_loss=0.08743, pruned_loss=0.01509, audio_tagging_loss=0.009493, over 15450.00 frames. ], tot_loss[loss=0.06689, simple_loss=0.09085, pruned_loss=0.0124, audio_tagging_loss=0.00907, over 2810313.19 frames. ], batch size: 59, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:13:01,821 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.752e+01 8.926e+01 9.624e+01 1.054e+02 1.218e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-28 22:13:19,537 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.88 vs. limit=15.0 2023-11-28 22:13:35,741 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3690873.3333333335, ans=0.0 2023-11-28 22:13:38,410 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.35 vs. limit=10.0 2023-11-28 22:13:51,647 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 553650 2023-11-28 22:13:54,975 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.97 vs. limit=12.0 2023-11-28 22:13:55,648 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 550, loss[loss=0.07835, simple_loss=0.1106, pruned_loss=0.01557, audio_tagging_loss=0.00746, over 15324.00 frames. ], tot_loss[loss=0.06699, simple_loss=0.09137, pruned_loss=0.01243, audio_tagging_loss=0.008876, over 2864730.37 frames. 
], batch size: 56, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:14:06,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3691073.3333333335, ans=0.0 2023-11-28 22:14:36,276 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3691206.6666666665, ans=0.2 2023-11-28 22:14:40,562 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3691206.6666666665, ans=0.125 2023-11-28 22:14:53,467 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 553700 2023-11-28 22:14:57,473 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 600, loss[loss=0.08474, simple_loss=0.1109, pruned_loss=0.02044, audio_tagging_loss=0.008848, over 15696.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.09054, pruned_loss=0.0122, audio_tagging_loss=0.008789, over 2907935.92 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:15:06,288 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.361e+01 8.960e+01 9.634e+01 1.013e+02 1.210e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-28 22:15:16,998 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3691406.6666666665, ans=0.0 2023-11-28 22:15:24,914 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.47 vs. limit=15.0 2023-11-28 22:15:27,717 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3691473.3333333335, ans=0.125 2023-11-28 22:15:40,326 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3691540.0, ans=0.0 2023-11-28 22:15:40,511 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3691540.0, ans=0.125 2023-11-28 22:15:55,453 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 553750 2023-11-28 22:15:58,960 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 650, loss[loss=0.04212, simple_loss=0.05035, pruned_loss=0.007032, audio_tagging_loss=0.009911, over 14965.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.09019, pruned_loss=0.01211, audio_tagging_loss=0.008672, over 2946351.02 frames. ], batch size: 60, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:16:01,678 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3691673.3333333335, ans=0.125 2023-11-28 22:16:02,619 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3691673.3333333335, ans=0.125 2023-11-28 22:16:08,885 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.61 vs. 
limit=10.0 2023-11-28 22:16:45,973 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3691873.3333333335, ans=0.0 2023-11-28 22:16:50,460 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3691940.0, ans=0.0 2023-11-28 22:16:56,086 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 553800 2023-11-28 22:16:56,305 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3691940.0, ans=0.125 2023-11-28 22:17:00,545 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 700, loss[loss=0.06119, simple_loss=0.08393, pruned_loss=0.01293, audio_tagging_loss=0.006296, over 16194.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.09042, pruned_loss=0.01208, audio_tagging_loss=0.008638, over 2971880.80 frames. ], batch size: 62, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:17:02,566 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3692006.6666666665, ans=0.125 2023-11-28 22:17:09,310 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.371e+01 8.938e+01 9.507e+01 1.029e+02 1.273e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-28 22:17:17,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3692073.3333333335, ans=0.125 2023-11-28 22:17:21,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.out_whiten.whitening_limit, batch_count=3692073.3333333335, ans=8.0 2023-11-28 22:17:58,294 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 553850 2023-11-28 22:17:58,514 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3692273.3333333335, ans=0.125 2023-11-28 22:18:02,275 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 750, loss[loss=0.05646, simple_loss=0.08097, pruned_loss=0.006038, audio_tagging_loss=0.009934, over 14736.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.09054, pruned_loss=0.01206, audio_tagging_loss=0.008688, over 2995476.97 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:18:03,765 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3692340.0, ans=0.125 2023-11-28 22:18:05,909 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3692340.0, ans=0.015 2023-11-28 22:18:51,834 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.52 vs. limit=22.5 2023-11-28 22:18:52,793 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3692606.6666666665, ans=0.125 2023-11-28 22:18:57,980 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.14 vs. limit=15.0 2023-11-28 22:19:00,713 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 553900 2023-11-28 22:19:04,226 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 800, loss[loss=0.07107, simple_loss=0.1082, pruned_loss=0.01018, audio_tagging_loss=0.006792, over 15884.00 frames. 
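Each record pairs a per-batch loss "over N frames" with a tot_loss over a fractional frame count that climbs toward roughly 3.0e6 frames (2971880.80 and 2995476.97 above), about 200× the frames of one batch. That is the signature of an exponentially decayed running sum: the accumulated (loss·frames, frames) pairs are multiplied by a decay such as 1 − 1/200 before each new batch is added, and the printed tot_loss values are frame-weighted averages. A sketch under that assumption; the class name and decay constant are illustrative, not icefall's exact MetricsTracker:

```python
class RunningLoss:
    """Frame-weighted running averages with exponential decay (sketch).

    The 1 - 1/200 decay is an assumption inferred from the steady-state
    frame count (~200x one batch); the real tracker may differ in detail.
    """
    def __init__(self, decay: float = 1.0 - 1.0 / 200):
        self.decay = decay
        self.frames = 0.0
        self.sums: dict[str, float] = {}

    def update(self, frames: float, losses: dict[str, float]) -> None:
        self.frames = self.frames * self.decay + frames
        for name, value in losses.items():
            # store loss * frames so averages() is weighted per frame
            old = self.sums.get(name, 0.0)
            self.sums[name] = old * self.decay + value * frames

    def averages(self) -> dict[str, float]:
        return {name: s / self.frames for name, s in self.sums.items()}
```

With ~15k frames per batch, the steady-state frame count is frames/(1 − decay) = 200 × 15k ≈ 3.0e6, matching the log.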
], tot_loss[loss=0.06631, simple_loss=0.0909, pruned_loss=0.01215, audio_tagging_loss=0.008711, over 3018506.96 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:19:12,506 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.517e+01 8.995e+01 9.559e+01 1.026e+02 1.353e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-28 22:19:26,125 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3692740.0, ans=0.125 2023-11-28 22:19:40,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3692873.3333333335, ans=0.2 2023-11-28 22:19:40,616 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.11 vs. limit=22.5 2023-11-28 22:19:44,997 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3692873.3333333335, ans=0.125 2023-11-28 22:20:02,112 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 553950 2023-11-28 22:20:03,531 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-28 22:20:05,586 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 850, loss[loss=0.07806, simple_loss=0.1118, pruned_loss=0.01295, audio_tagging_loss=0.009209, over 15318.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.09037, pruned_loss=0.01207, audio_tagging_loss=0.008863, over 3023480.65 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:20:07,080 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3693006.6666666665, ans=0.1 2023-11-28 22:20:07,091 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3693006.6666666665, ans=0.0 2023-11-28 22:20:23,888 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=3693073.3333333335, ans=10.0 2023-11-28 22:20:32,502 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3693140.0, ans=10.0 2023-11-28 22:20:42,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3693206.6666666665, ans=0.125 2023-11-28 22:21:03,734 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 554000 2023-11-28 22:21:07,979 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 900, loss[loss=0.07216, simple_loss=0.104, pruned_loss=0.01328, audio_tagging_loss=0.006875, over 15102.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.08977, pruned_loss=0.01197, audio_tagging_loss=0.008859, over 3029504.55 frames. ], batch size: 53, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:21:16,634 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.576e+01 8.970e+01 9.672e+01 1.016e+02 1.262e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-28 22:21:18,002 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3693340.0, ans=10.0 2023-11-28 22:21:23,736 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.45 vs. 
limit=15.0 2023-11-28 22:21:35,919 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3693473.3333333335, ans=0.035 2023-11-28 22:21:43,885 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3693540.0, ans=0.1 2023-11-28 22:21:49,914 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3693540.0, ans=0.125 2023-11-28 22:21:51,008 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3693540.0, ans=0.125 2023-11-28 22:21:59,577 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3693606.6666666665, ans=0.125 2023-11-28 22:22:06,220 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 554050 2023-11-28 22:22:10,250 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 950, loss[loss=0.08303, simple_loss=0.1233, pruned_loss=0.01747, audio_tagging_loss=0.003934, over 15710.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08988, pruned_loss=0.01202, audio_tagging_loss=0.008773, over 3036113.28 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:22:40,359 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3693806.6666666665, ans=0.1 2023-11-28 22:22:48,719 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3693873.3333333335, ans=0.125 2023-11-28 22:23:07,960 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 554100 2023-11-28 22:23:11,528 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 1000, loss[loss=0.06329, simple_loss=0.08867, pruned_loss=0.009687, audio_tagging_loss=0.009265, over 15610.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.08983, pruned_loss=0.01201, audio_tagging_loss=0.008645, over 3042001.44 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:23:20,031 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.54 vs. limit=15.0 2023-11-28 22:23:20,409 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.560e+01 9.063e+01 9.775e+01 1.049e+02 2.458e+02, threshold=1.955e+02, percent-clipped=1.0 2023-11-28 22:23:22,955 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3694073.3333333335, ans=0.125 2023-11-28 22:23:28,245 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.96 vs. limit=10.0 2023-11-28 22:23:39,348 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 22:23:50,659 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3694206.6666666665, ans=0.05 2023-11-28 22:23:55,281 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=3694206.6666666665, ans=0.2 2023-11-28 22:24:09,519 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.03 vs. limit=15.0 2023-11-28 22:24:09,870 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 554150 2023-11-28 22:24:12,511 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3694340.0, ans=0.0 2023-11-28 22:24:13,300 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 1050, loss[loss=0.09597, simple_loss=0.1362, pruned_loss=0.02102, audio_tagging_loss=0.006841, over 15837.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08958, pruned_loss=0.01207, audio_tagging_loss=0.008526, over 3041134.02 frames. ], batch size: 59, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:24:50,344 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3694540.0, ans=0.125 2023-11-28 22:25:10,879 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3694606.6666666665, ans=0.125 2023-11-28 22:25:11,912 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 554200 2023-11-28 22:25:15,618 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 1100, loss[loss=0.06256, simple_loss=0.08823, pruned_loss=0.01023, audio_tagging_loss=0.008222, over 15140.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.08992, pruned_loss=0.01204, audio_tagging_loss=0.008481, over 3040921.77 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:25:15,910 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3694673.3333333335, ans=0.0 2023-11-28 22:25:19,641 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 22:25:24,304 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.729e+01 9.004e+01 9.578e+01 1.033e+02 1.285e+02, threshold=1.916e+02, percent-clipped=0.0 2023-11-28 22:25:33,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3694740.0, ans=0.0 2023-11-28 22:25:37,485 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3694740.0, ans=0.1 2023-11-28 22:25:41,700 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3694806.6666666665, ans=0.1 2023-11-28 22:25:42,990 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3694806.6666666665, ans=0.0 2023-11-28 22:25:51,690 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.47 vs. limit=15.0 2023-11-28 22:25:57,267 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3694873.3333333335, ans=0.0 2023-11-28 22:25:57,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3694873.3333333335, ans=0.125 2023-11-28 22:26:05,366 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3694940.0, ans=0.125 2023-11-28 22:26:13,900 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 554250 2023-11-28 22:26:16,307 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3695006.6666666665, ans=0.1 2023-11-28 22:26:17,357 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 1150, loss[loss=0.08455, simple_loss=0.1249, pruned_loss=0.01701, audio_tagging_loss=0.005117, over 16201.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.09006, pruned_loss=0.01209, audio_tagging_loss=0.008419, over 3043492.19 frames. 
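The two WARNING entries above show why the AudioSet placeholder cuts get dropped: a 1-second cut has 100 feature frames, the convolutional frontend's roughly 4× subsampling leaves 23, and the dummy transcript tokenizes to 24 BPE tokens, so the target would be longer than the encoder output. The 100 → 23 arithmetic matches the usual frontend formula T' = ((T − 7) // 2 + 1) // 2. A sketch of such a filter (function names hypothetical):

```python
def subsampled_frames(num_frames: int) -> int:
    """Frames after the Conv2d frontend; matches the log's 100 -> 23."""
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # The pruned transducer needs at least as many encoder frames as
    # target tokens, so 23 frames vs. 24 tokens is rejected.
    return subsampled_frames(num_frames) >= num_tokens

assert subsampled_frames(100) == 23
assert not keep_cut(100, 24)   # the dummy AudioSet transcripts above
```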
], batch size: 55, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:26:19,909 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3695006.6666666665, ans=0.125 2023-11-28 22:26:44,010 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3695140.0, ans=0.125 2023-11-28 22:26:48,702 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3695140.0, ans=0.1 2023-11-28 22:26:52,215 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3695140.0, ans=0.1 2023-11-28 22:26:54,089 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3695206.6666666665, ans=0.125 2023-11-28 22:27:13,887 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3695273.3333333335, ans=0.125 2023-11-28 22:27:14,987 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3695273.3333333335, ans=0.125 2023-11-28 22:27:15,952 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 554300 2023-11-28 22:27:19,252 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 1200, loss[loss=0.06675, simple_loss=0.08853, pruned_loss=0.0132, audio_tagging_loss=0.009276, over 15471.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.09043, pruned_loss=0.01221, audio_tagging_loss=0.00845, over 3042909.62 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:27:27,993 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.320e+01 8.745e+01 9.451e+01 1.036e+02 1.471e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-28 22:27:52,636 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-28 22:28:06,569 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.13 vs. limit=6.0 2023-11-28 22:28:17,002 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 554350 2023-11-28 22:28:20,975 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 1250, loss[loss=0.05828, simple_loss=0.08354, pruned_loss=0.01021, audio_tagging_loss=0.006306, over 16143.00 frames. ], tot_loss[loss=0.06496, simple_loss=0.08915, pruned_loss=0.01193, audio_tagging_loss=0.00846, over 3045599.81 frames. ], batch size: 62, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:28:23,621 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3695673.3333333335, ans=0.2 2023-11-28 22:28:28,933 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.07 vs. 
limit=12.0 2023-11-28 22:28:31,330 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3695673.3333333335, ans=0.0 2023-11-28 22:28:33,545 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3695740.0, ans=0.125 2023-11-28 22:28:40,827 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3695740.0, ans=0.125 2023-11-28 22:28:48,397 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3695806.6666666665, ans=0.0 2023-11-28 22:28:49,568 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3695806.6666666665, ans=0.1 2023-11-28 22:29:08,589 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.60 vs. limit=6.0 2023-11-28 22:29:14,459 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3695940.0, ans=0.0 2023-11-28 22:29:18,998 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 554400 2023-11-28 22:29:22,773 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 1300, loss[loss=0.06019, simple_loss=0.08725, pruned_loss=0.007712, audio_tagging_loss=0.008852, over 16408.00 frames. ], tot_loss[loss=0.065, simple_loss=0.08935, pruned_loss=0.01188, audio_tagging_loss=0.008444, over 3045199.98 frames. ], batch size: 62, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:29:26,406 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3696006.6666666665, ans=0.0 2023-11-28 22:29:30,744 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.187e+01 9.038e+01 9.627e+01 1.019e+02 1.676e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-28 22:29:57,842 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3696140.0, ans=0.125 2023-11-28 22:30:07,827 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3696206.6666666665, ans=0.125 2023-11-28 22:30:21,094 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 554450 2023-11-28 22:30:24,572 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 1350, loss[loss=0.04888, simple_loss=0.06016, pruned_loss=0.008541, audio_tagging_loss=0.01026, over 14777.00 frames. ], tot_loss[loss=0.06476, simple_loss=0.08893, pruned_loss=0.01185, audio_tagging_loss=0.008445, over 3042340.45 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:30:33,194 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3696340.0, ans=0.125 2023-11-28 22:31:08,336 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3696540.0, ans=0.125 2023-11-28 22:31:10,379 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 22:31:22,874 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 554500 2023-11-28 22:31:26,148 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 1400, loss[loss=0.06323, simple_loss=0.09749, pruned_loss=0.007911, audio_tagging_loss=0.006576, over 14933.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08923, pruned_loss=0.01189, audio_tagging_loss=0.008616, over 3046051.93 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:31:31,052 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3696673.3333333335, ans=0.0 2023-11-28 22:31:32,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3696673.3333333335, ans=0.2 2023-11-28 22:31:35,452 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.504e+01 8.987e+01 9.786e+01 1.046e+02 1.300e+02, threshold=1.957e+02, percent-clipped=0.0 2023-11-28 22:31:35,838 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3696673.3333333335, ans=0.2 2023-11-28 22:31:35,863 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3696673.3333333335, ans=0.125 2023-11-28 22:31:36,148 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=3696673.3333333335, ans=10.0 2023-11-28 22:31:57,649 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3696806.6666666665, ans=0.125 2023-11-28 22:32:07,065 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3696873.3333333335, ans=0.0 2023-11-28 22:32:09,884 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3696873.3333333335, ans=0.0 2023-11-28 22:32:24,798 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 554550 2023-11-28 22:32:27,405 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3697006.6666666665, ans=0.2 2023-11-28 22:32:28,225 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 1450, loss[loss=0.05211, simple_loss=0.06945, pruned_loss=0.007734, audio_tagging_loss=0.009646, over 14724.00 frames. ], tot_loss[loss=0.06444, simple_loss=0.08824, pruned_loss=0.01166, audio_tagging_loss=0.008654, over 3047778.97 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 8.0 2023-11-28 22:32:37,933 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3697006.6666666665, ans=0.125 2023-11-28 22:32:57,769 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3697140.0, ans=0.0 2023-11-28 22:33:25,738 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 554600 2023-11-28 22:33:29,719 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 1500, loss[loss=0.06695, simple_loss=0.08563, pruned_loss=0.01395, audio_tagging_loss=0.01019, over 15584.00 frames. 
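The optim.py:476 entries report five quantiles (min, 25%, median, 75%, max) of recent gradient norms together with Clipping_scale=2.0 and a threshold. In every record the threshold is almost exactly twice the median, e.g. 2.0 × 9.786e+01 ≈ 1.957e+02 in the entry above, so the clipping threshold appears to track the median of recent norms rather than a fixed constant, and percent-clipped counts how often it was exceeded. A hedged sketch of that bookkeeping, not the exact ScaledAdam internals:

```python
import torch

def clip_stats(grad_norms: torch.Tensor, clipping_scale: float = 2.0):
    """grad_norms: 1-D tensor of recent per-step gradient norms.

    Reproduces the bookkeeping in the optim.py entries: five quantiles,
    a threshold of clipping_scale x median, and the share of steps whose
    norm exceeded it. The real optimizer internals may differ.
    """
    q = torch.quantile(grad_norms,
                       torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]          # 2.0 x median, as logged
    percent_clipped = 100.0 * (grad_norms > threshold).float().mean()
    # A step above the threshold would have its gradient scaled by
    # threshold / norm before the parameter update.
    return q, threshold, percent_clipped
```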
], tot_loss[loss=0.06536, simple_loss=0.08949, pruned_loss=0.01193, audio_tagging_loss=0.008682, over 3047956.38 frames. ], batch size: 59, lr: 1.44e-03, grad_scale: 8.0 2023-11-28 22:33:35,923 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3697340.0, ans=0.125 2023-11-28 22:33:38,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_na.min_abs, batch_count=3697340.0, ans=0.02 2023-11-28 22:33:40,206 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.171e+01 8.916e+01 9.599e+01 1.025e+02 1.569e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-28 22:33:50,486 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3697406.6666666665, ans=0.025 2023-11-28 22:34:18,114 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3697606.6666666665, ans=0.025 2023-11-28 22:34:27,850 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 554650 2023-11-28 22:34:29,642 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.50 vs. limit=15.0 2023-11-28 22:34:31,287 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 1550, loss[loss=0.05967, simple_loss=0.082, pruned_loss=0.01078, audio_tagging_loss=0.007886, over 14598.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08983, pruned_loss=0.01194, audio_tagging_loss=0.008732, over 3044347.15 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 8.0 2023-11-28 22:34:36,227 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3697673.3333333335, ans=0.2 2023-11-28 22:34:38,776 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3697673.3333333335, ans=0.125 2023-11-28 22:34:40,106 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3697673.3333333335, ans=0.1 2023-11-28 22:35:09,593 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3697873.3333333335, ans=0.125 2023-11-28 22:35:25,202 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3697940.0, ans=0.0 2023-11-28 22:35:29,320 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 554700 2023-11-28 22:35:32,755 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 1600, loss[loss=0.05626, simple_loss=0.07233, pruned_loss=0.01221, audio_tagging_loss=0.007877, over 15055.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08928, pruned_loss=0.01187, audio_tagging_loss=0.008859, over 3046170.38 frames. 
], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:35:39,662 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3698006.6666666665, ans=0.0 2023-11-28 22:35:41,851 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3698006.6666666665, ans=0.035 2023-11-28 22:35:44,022 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.599e+01 8.984e+01 9.580e+01 1.035e+02 1.494e+02, threshold=1.916e+02, percent-clipped=0.0 2023-11-28 22:35:51,201 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3698073.3333333335, ans=0.1 2023-11-28 22:35:51,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3698073.3333333335, ans=0.2 2023-11-28 22:35:55,816 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3698073.3333333335, ans=0.0 2023-11-28 22:35:56,875 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3698140.0, ans=0.1 2023-11-28 22:36:02,288 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3698140.0, ans=10.0 2023-11-28 22:36:24,133 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3698273.3333333335, ans=0.1 2023-11-28 22:36:31,681 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 554750 2023-11-28 22:36:31,803 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3698273.3333333335, ans=0.125 2023-11-28 22:36:34,265 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-28 22:36:35,101 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 1650, loss[loss=0.06698, simple_loss=0.09062, pruned_loss=0.01311, audio_tagging_loss=0.008561, over 14949.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.0892, pruned_loss=0.01201, audio_tagging_loss=0.008763, over 3045475.08 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:36:42,832 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3698340.0, ans=0.0 2023-11-28 22:36:52,891 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3698406.6666666665, ans=0.1 2023-11-28 22:36:54,387 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.32 vs. limit=15.0 2023-11-28 22:36:58,576 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3698473.3333333335, ans=0.2 2023-11-28 22:36:58,578 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3698473.3333333335, ans=0.0 2023-11-28 22:37:01,476 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.78 vs. 
limit=15.0 2023-11-28 22:37:03,841 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3698473.3333333335, ans=0.0 2023-11-28 22:37:09,651 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3698473.3333333335, ans=0.2 2023-11-28 22:37:32,194 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3698606.6666666665, ans=0.0 2023-11-28 22:37:33,149 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 554800 2023-11-28 22:37:37,081 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 1700, loss[loss=0.07221, simple_loss=0.1019, pruned_loss=0.01303, audio_tagging_loss=0.008225, over 16458.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.0893, pruned_loss=0.01212, audio_tagging_loss=0.008812, over 3046103.54 frames. ], batch size: 62, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:37:47,559 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.735e+01 8.852e+01 9.479e+01 1.004e+02 1.252e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-28 22:37:49,301 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.13 vs. limit=10.0 2023-11-28 22:37:52,069 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3698740.0, ans=0.125 2023-11-28 22:38:16,126 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.17 vs. limit=15.0 2023-11-28 22:38:34,924 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 554850 2023-11-28 22:38:38,835 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 1750, loss[loss=0.05755, simple_loss=0.07658, pruned_loss=0.01073, audio_tagging_loss=0.008534, over 15952.00 frames. ], tot_loss[loss=0.06477, simple_loss=0.08806, pruned_loss=0.01194, audio_tagging_loss=0.008802, over 3055020.94 frames. ], batch size: 61, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:38:51,084 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3699073.3333333335, ans=0.0 2023-11-28 22:39:05,832 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3699140.0, ans=0.125 2023-11-28 22:39:20,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3699206.6666666665, ans=0.125 2023-11-28 22:39:35,445 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3699273.3333333335, ans=0.125 2023-11-28 22:39:36,328 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 554900 2023-11-28 22:39:40,439 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 1800, loss[loss=0.08742, simple_loss=0.1288, pruned_loss=0.01635, audio_tagging_loss=0.006653, over 15360.00 frames. ], tot_loss[loss=0.06431, simple_loss=0.08751, pruned_loss=0.01181, audio_tagging_loss=0.00874, over 3052217.00 frames. 
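The scaling.py:1118 WithLoss entries, such as the one above for encoder.encoders.4.encoder.layers.0.self_attn_weights, report the summed auxiliary loss currently attached to a tensor; loss-sum=0.000e+00 means the attached constraint is inactive at the moment. A generic sketch of a pass-through module with an attached penalty, offered as one plausible reading rather than the exact scaling.py mechanism (the limit value is hypothetical):

```python
import torch
import torch.nn as nn

class WithAuxLoss(nn.Module):
    """Pass-through module that attaches a penalty to a tensor and logs it."""
    def __init__(self, name: str, limit: float = 25.0):
        super().__init__()
        self.name = name
        self.limit = limit

    def forward(self, attn_weights: torch.Tensor) -> torch.Tensor:
        if self.training:
            # Penalize only the magnitude that exceeds the limit; zero
            # (as in most entries here) means the constraint is inactive.
            excess = (attn_weights.abs() - self.limit).clamp(min=0.0)
            loss = excess.sum()
            print(f"WithLoss: name={self.name}, loss-sum={loss.item():.3e}")
            # In a real setup this loss would be scaled and added to the
            # objective; the tensor itself passes through unchanged.
        return attn_weights
```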
], batch size: 55, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:39:44,173 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3699340.0, ans=0.125 2023-11-28 22:39:51,493 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.305e+01 9.095e+01 9.843e+01 1.068e+02 2.957e+02, threshold=1.969e+02, percent-clipped=2.0 2023-11-28 22:39:57,463 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.17 vs. limit=15.0 2023-11-28 22:39:58,047 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3699406.6666666665, ans=0.125 2023-11-28 22:40:06,489 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3699473.3333333335, ans=0.0 2023-11-28 22:40:29,737 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3699606.6666666665, ans=0.125 2023-11-28 22:40:38,715 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 554950 2023-11-28 22:40:42,253 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 1850, loss[loss=0.05787, simple_loss=0.08305, pruned_loss=0.009083, audio_tagging_loss=0.007261, over 15424.00 frames. ], tot_loss[loss=0.06423, simple_loss=0.08766, pruned_loss=0.01181, audio_tagging_loss=0.008596, over 3050840.15 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:40:54,351 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.62 vs. limit=15.0 2023-11-28 22:41:16,212 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3699806.6666666665, ans=0.125 2023-11-28 22:41:16,221 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 22:41:23,645 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3699873.3333333335, ans=10.0 2023-11-28 22:41:31,273 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3699940.0, ans=0.07 2023-11-28 22:41:37,174 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3699940.0, ans=0.125 2023-11-28 22:41:40,474 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 555000 2023-11-28 22:41:44,278 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 1900, loss[loss=0.07682, simple_loss=0.1042, pruned_loss=0.01627, audio_tagging_loss=0.008423, over 15531.00 frames. ], tot_loss[loss=0.06409, simple_loss=0.08742, pruned_loss=0.01185, audio_tagging_loss=0.008525, over 3051824.92 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:41:49,031 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.27 vs. 
limit=15.0 2023-11-28 22:41:49,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3700006.6666666665, ans=0.2 2023-11-28 22:41:55,306 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.187e+01 8.917e+01 9.676e+01 1.038e+02 1.630e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-28 22:42:37,482 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3700273.3333333335, ans=0.0 2023-11-28 22:42:41,999 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 555050 2023-11-28 22:42:44,597 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3700340.0, ans=0.07 2023-11-28 22:42:45,399 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 1950, loss[loss=0.04652, simple_loss=0.06099, pruned_loss=0.00381, audio_tagging_loss=0.01221, over 15989.00 frames. ], tot_loss[loss=0.06364, simple_loss=0.08682, pruned_loss=0.01169, audio_tagging_loss=0.008545, over 3054743.64 frames. ], batch size: 62, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:42:48,046 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3700340.0, ans=0.5 2023-11-28 22:42:57,253 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3700406.6666666665, ans=0.0 2023-11-28 22:43:09,371 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3700473.3333333335, ans=0.2 2023-11-28 22:43:14,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3700473.3333333335, ans=0.125 2023-11-28 22:43:19,592 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3700473.3333333335, ans=0.0 2023-11-28 22:43:39,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3700606.6666666665, ans=0.0 2023-11-28 22:43:43,628 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 555100 2023-11-28 22:43:43,903 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3700606.6666666665, ans=0.125 2023-11-28 22:43:46,074 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3700673.3333333335, ans=0.125 2023-11-28 22:43:46,978 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 2000, loss[loss=0.06943, simple_loss=0.08821, pruned_loss=0.01694, audio_tagging_loss=0.008389, over 16049.00 frames. ], tot_loss[loss=0.06389, simple_loss=0.0872, pruned_loss=0.01173, audio_tagging_loss=0.008561, over 3046380.91 frames. 
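The batch records are regular enough to extract mechanically when tracking a run; for instance, the batch 2000 entry just above yields the tot_loss tuple (0.06389, 0.0872, 0.01173, 0.008561). A small convenience parser, not part of icefall:

```python
import re

# Matches "Epoch E, batch B, ... tot_loss[loss=..., simple_loss=...,
# pruned_loss=..., audio_tagging_loss=...]" records in this log.
RECORD = re.compile(
    r"Epoch (\d+), batch (\d+), .*?"
    r"tot_loss\[loss=([\d.]+), simple_loss=([\d.]+), "
    r"pruned_loss=([\d.]+), audio_tagging_loss=([\d.]+)",
    re.S,
)

def parse_records(log_text: str):
    for m in RECORD.finditer(log_text):
        yield (int(m.group(1)), int(m.group(2)),
               tuple(float(m.group(i)) for i in range(3, 7)))

# e.g. the batch 2000 record above parses to
# (47, 2000, (0.06389, 0.0872, 0.01173, 0.008561))
```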
], batch size: 63, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:43:48,440 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3700673.3333333335, ans=0.07 2023-11-28 22:43:58,089 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.153e+01 8.916e+01 9.601e+01 1.024e+02 1.438e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-28 22:44:08,271 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 22:44:09,917 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.71 vs. limit=22.5 2023-11-28 22:44:45,216 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 555150 2023-11-28 22:44:48,544 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 2050, loss[loss=0.06264, simple_loss=0.08409, pruned_loss=0.01023, audio_tagging_loss=0.01037, over 13219.00 frames. ], tot_loss[loss=0.06439, simple_loss=0.0879, pruned_loss=0.01189, audio_tagging_loss=0.008541, over 3042640.23 frames. ], batch size: 52, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:44:58,831 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.77 vs. limit=15.0 2023-11-28 22:45:06,685 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3701073.3333333335, ans=0.1 2023-11-28 22:45:29,834 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3701206.6666666665, ans=0.2 2023-11-28 22:45:46,415 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 555200 2023-11-28 22:45:50,156 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 2100, loss[loss=0.06903, simple_loss=0.09168, pruned_loss=0.01458, audio_tagging_loss=0.008613, over 15028.00 frames. ], tot_loss[loss=0.06467, simple_loss=0.0883, pruned_loss=0.01197, audio_tagging_loss=0.008546, over 3046780.80 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:45:52,860 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3701340.0, ans=0.0 2023-11-28 22:46:02,567 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.190e+01 8.958e+01 9.568e+01 1.025e+02 1.229e+02, threshold=1.914e+02, percent-clipped=0.0 2023-11-28 22:46:48,001 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 555250 2023-11-28 22:46:52,253 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 2150, loss[loss=0.08315, simple_loss=0.1177, pruned_loss=0.01583, audio_tagging_loss=0.008475, over 15716.00 frames. ], tot_loss[loss=0.06505, simple_loss=0.08884, pruned_loss=0.01204, audio_tagging_loss=0.008584, over 3053760.96 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:47:00,034 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3701673.3333333335, ans=0.0 2023-11-28 22:47:30,136 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
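The grad_scale field tells the mixed-precision story: 32.0 at batch 1400 drops to 8.0 by batch 1450 (two overflow halvings), recovers to 16.0 at batch 1600 and 32.0 at batch 2000, halves again to 16.0 by batch 2050 above, and recovers later at batches 2400 and 2800. Halving on overflow is standard torch.cuda.amp.GradScaler behaviour; the recoveries landing on multiples of 400 suggest the scale is also doubled manually every 400 batches while below a 32.0 baseline, an inference from the log rather than confirmed code. A sketch:

```python
import torch

# init_scale chosen to match the log; the every-400-batches recovery is
# inferred from the 1600/2000/2400/2800 doublings, not confirmed code.
scaler = torch.cuda.amp.GradScaler(init_scale=32.0)

def fp16_step(batch_idx: int, loss: torch.Tensor, optimizer) -> None:
    optimizer.zero_grad()
    scaler.scale(loss).backward()
    scaler.step(optimizer)   # skipped when inf/nan gradients are found
    scaler.update()          # ...and the scale is halved in that case
    cur = scaler.get_scale()
    if batch_idx % 400 == 0 and cur < 32.0:
        scaler.update(cur * 2.0)   # periodic recovery toward the baseline
```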
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 22:47:34,544 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=3701873.3333333335, ans=0.1 2023-11-28 22:47:38,124 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3701873.3333333335, ans=0.0 2023-11-28 22:47:50,706 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 555300 2023-11-28 22:47:54,166 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 2200, loss[loss=0.06124, simple_loss=0.07792, pruned_loss=0.01255, audio_tagging_loss=0.009726, over 15818.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.09002, pruned_loss=0.01218, audio_tagging_loss=0.008462, over 3059038.75 frames. ], batch size: 60, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:47:56,906 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3702006.6666666665, ans=0.2 2023-11-28 22:48:06,445 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.601e+01 9.046e+01 9.585e+01 1.059e+02 1.446e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-28 22:48:12,577 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3702073.3333333335, ans=0.0 2023-11-28 22:48:13,737 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3702073.3333333335, ans=0.0 2023-11-28 22:48:19,723 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3702140.0, ans=0.125 2023-11-28 22:48:22,123 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.56 vs. limit=15.0 2023-11-28 22:48:47,021 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3702273.3333333335, ans=0.0 2023-11-28 22:48:52,159 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 555350 2023-11-28 22:48:55,607 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 2250, loss[loss=0.06535, simple_loss=0.0958, pruned_loss=0.007682, audio_tagging_loss=0.00977, over 14412.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08991, pruned_loss=0.01212, audio_tagging_loss=0.008535, over 3050473.63 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:48:56,336 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.57 vs. 
limit=15.0 2023-11-28 22:49:05,334 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3702340.0, ans=0.0 2023-11-28 22:49:24,376 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3702473.3333333335, ans=0.1 2023-11-28 22:49:36,456 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3702540.0, ans=0.125 2023-11-28 22:49:43,527 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3702606.6666666665, ans=0.125 2023-11-28 22:49:52,822 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 555400 2023-11-28 22:49:56,840 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 2300, loss[loss=0.08107, simple_loss=0.1131, pruned_loss=0.01466, audio_tagging_loss=0.009863, over 15178.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.09002, pruned_loss=0.01209, audio_tagging_loss=0.008619, over 3050602.64 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:50:09,260 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.723e+01 8.895e+01 9.474e+01 1.045e+02 1.271e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-28 22:50:30,179 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3702806.6666666665, ans=0.125 2023-11-28 22:50:32,575 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 22:50:51,516 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 22:50:51,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3702940.0, ans=0.0 2023-11-28 22:50:54,454 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3702940.0, ans=0.0 2023-11-28 22:50:54,602 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.56 vs. limit=22.5 2023-11-28 22:50:55,245 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 555450 2023-11-28 22:50:58,635 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 2350, loss[loss=0.06218, simple_loss=0.08446, pruned_loss=0.01002, audio_tagging_loss=0.009924, over 15207.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08962, pruned_loss=0.01211, audio_tagging_loss=0.008702, over 3051602.37 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:51:33,299 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.72 vs. 
limit=15.0 2023-11-28 22:51:35,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3703206.6666666665, ans=0.125 2023-11-28 22:51:56,488 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 555500 2023-11-28 22:51:59,838 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 2400, loss[loss=0.07315, simple_loss=0.09759, pruned_loss=0.01557, audio_tagging_loss=0.008785, over 15722.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.09019, pruned_loss=0.01228, audio_tagging_loss=0.008763, over 3041721.33 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:52:11,684 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.080e+01 8.887e+01 9.633e+01 1.018e+02 1.587e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-28 22:52:25,503 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3703473.3333333335, ans=0.1 2023-11-28 22:52:29,540 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3703473.3333333335, ans=0.125 2023-11-28 22:52:32,981 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.31 vs. limit=22.5 2023-11-28 22:52:57,501 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 555550 2023-11-28 22:53:01,552 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 2450, loss[loss=0.08883, simple_loss=0.1238, pruned_loss=0.02209, audio_tagging_loss=0.004839, over 16411.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.08984, pruned_loss=0.01214, audio_tagging_loss=0.008868, over 3040427.38 frames. ], batch size: 60, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:53:03,188 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.87 vs. limit=15.0 2023-11-28 22:53:09,944 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3703673.3333333335, ans=0.0 2023-11-28 22:53:10,366 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.44 vs. 
limit=10.0 2023-11-28 22:53:12,123 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3703740.0, ans=0.125 2023-11-28 22:53:13,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3703740.0, ans=0.0 2023-11-28 22:53:16,392 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3703740.0, ans=0.125 2023-11-28 22:53:40,170 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3703873.3333333335, ans=0.125 2023-11-28 22:53:42,617 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3703873.3333333335, ans=0.125 2023-11-28 22:53:52,849 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3703940.0, ans=0.125 2023-11-28 22:53:59,858 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 555600 2023-11-28 22:54:04,196 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 2500, loss[loss=0.0848, simple_loss=0.1228, pruned_loss=0.01775, audio_tagging_loss=0.005643, over 16460.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.09002, pruned_loss=0.01211, audio_tagging_loss=0.008873, over 3040396.23 frames. ], batch size: 59, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:54:16,419 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3704073.3333333335, ans=0.1 2023-11-28 22:54:17,247 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.711e+01 9.036e+01 9.436e+01 1.021e+02 1.491e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-28 22:54:18,935 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.05 vs. 
limit=15.0 2023-11-28 22:54:27,975 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3704140.0, ans=0.125 2023-11-28 22:54:30,475 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3704140.0, ans=0.125 2023-11-28 22:54:32,842 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3704140.0, ans=0.0 2023-11-28 22:54:33,880 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3704140.0, ans=0.0 2023-11-28 22:54:34,948 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff3.min_abs, batch_count=3704140.0, ans=0.2 2023-11-28 22:54:36,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3704140.0, ans=0.0 2023-11-28 22:54:49,212 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3704206.6666666665, ans=0.0 2023-11-28 22:54:53,862 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3704273.3333333335, ans=0.0 2023-11-28 22:54:55,135 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3704273.3333333335, ans=0.1 2023-11-28 22:54:58,176 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3704273.3333333335, ans=0.0 2023-11-28 22:55:03,022 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 555650 2023-11-28 22:55:06,525 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 2550, loss[loss=0.07267, simple_loss=0.1059, pruned_loss=0.01342, audio_tagging_loss=0.006304, over 15345.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.08956, pruned_loss=0.01219, audio_tagging_loss=0.008829, over 3045481.70 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:55:06,762 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3704340.0, ans=0.125 2023-11-28 22:55:08,388 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.88 vs. limit=15.0 2023-11-28 22:55:29,670 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3704473.3333333335, ans=0.0 2023-11-28 22:56:04,089 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 555700 2023-11-28 22:56:07,451 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 2600, loss[loss=0.05321, simple_loss=0.07175, pruned_loss=0.008132, audio_tagging_loss=0.0092, over 15693.00 frames. ], tot_loss[loss=0.06477, simple_loss=0.08849, pruned_loss=0.01194, audio_tagging_loss=0.008589, over 3045069.79 frames. 
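The scaling.py:1022 Whitening entries compare a per-module metric against a limit (15.0, 22.5, 12.0, 6.0, ...). The metric reads as a measure of how far the activation covariance is from a scaled identity: about 1.0 for perfectly white features, approaching num_channels when the variance collapses onto one direction, with a penalty engaging only when the limit is exceeded, which is why most entries just report "metric=... vs. limit=...". One standard way to compute such a metric, as a hedged reconstruction rather than the exact scaling.py code:

```python
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    """x: (num_frames, num_channels); returns ~1.0 for white features.

    Hedged reconstruction: mean diagonal of C @ C over the squared mean
    diagonal of C, with C the channel covariance. This is 1.0 when C is
    a multiple of the identity and approaches num_channels when all the
    variance sits in one direction; the exact formula (and its
    num_groups handling) may differ in scaling.py.
    """
    x = x - x.mean(dim=0, keepdim=True)
    c = (x.t() @ x) / x.shape[0]          # channel covariance matrix
    num = (c @ c).diagonal().mean()
    den = c.diagonal().mean() ** 2
    return num / (den + 1e-20)
```

Under this reading, values like metric=9.88 vs. limit=15.0 in the entry above mean the features are far from white but still inside the allowed band, so no whitening pressure is applied.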
], batch size: 60, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:56:20,807 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.191e+01 8.832e+01 9.497e+01 1.024e+02 1.176e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-28 22:56:28,488 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3704740.0, ans=0.2 2023-11-28 22:56:40,265 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3704806.6666666665, ans=0.125 2023-11-28 22:56:41,353 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3704806.6666666665, ans=0.1 2023-11-28 22:57:05,654 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 555750 2023-11-28 22:57:09,024 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 2650, loss[loss=0.04377, simple_loss=0.05528, pruned_loss=0.00557, audio_tagging_loss=0.01056, over 16973.00 frames. ], tot_loss[loss=0.06489, simple_loss=0.08872, pruned_loss=0.01196, audio_tagging_loss=0.008573, over 3046899.06 frames. ], batch size: 63, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:57:25,823 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3705073.3333333335, ans=0.125 2023-11-28 22:57:34,478 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3705140.0, ans=0.025 2023-11-28 22:57:37,720 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3705140.0, ans=0.1 2023-11-28 22:57:41,466 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3705140.0, ans=0.0 2023-11-28 22:57:52,527 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3705206.6666666665, ans=0.1 2023-11-28 22:58:07,094 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 555800 2023-11-28 22:58:11,703 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 2700, loss[loss=0.05395, simple_loss=0.06727, pruned_loss=0.01023, audio_tagging_loss=0.01009, over 14921.00 frames. ], tot_loss[loss=0.06451, simple_loss=0.08814, pruned_loss=0.01192, audio_tagging_loss=0.00852, over 3043132.36 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:58:24,444 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.962e+01 9.013e+01 9.562e+01 1.012e+02 1.188e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-28 22:58:35,011 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3705473.3333333335, ans=0.0 2023-11-28 22:58:38,542 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3705473.3333333335, ans=0.125 2023-11-28 22:59:03,854 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.26 vs. 
limit=15.0 2023-11-28 22:59:05,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3705606.6666666665, ans=0.125 2023-11-28 22:59:09,290 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 555850 2023-11-28 22:59:12,651 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 2750, loss[loss=0.0642, simple_loss=0.08955, pruned_loss=0.008408, audio_tagging_loss=0.01101, over 13835.00 frames. ], tot_loss[loss=0.06456, simple_loss=0.08807, pruned_loss=0.01199, audio_tagging_loss=0.008538, over 3038171.82 frames. ], batch size: 53, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:59:15,138 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3705673.3333333335, ans=0.125 2023-11-28 22:59:27,565 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.14 vs. limit=6.0 2023-11-28 22:59:34,257 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.83 vs. limit=15.0 2023-11-28 22:59:48,904 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.07 vs. limit=15.0 2023-11-28 22:59:49,603 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3705873.3333333335, ans=0.1 2023-11-28 22:59:54,059 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3705873.3333333335, ans=0.125 2023-11-28 23:00:07,646 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 23:00:10,027 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 555900 2023-11-28 23:00:13,458 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 2800, loss[loss=0.06066, simple_loss=0.08917, pruned_loss=0.009544, audio_tagging_loss=0.006538, over 16121.00 frames. ], tot_loss[loss=0.06463, simple_loss=0.08833, pruned_loss=0.01197, audio_tagging_loss=0.008488, over 3033659.99 frames. ], batch size: 62, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 23:00:26,773 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.71 vs. 
limit=15.0 2023-11-28 23:00:27,369 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.511e+01 8.973e+01 9.470e+01 1.013e+02 1.282e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-28 23:00:35,028 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3706073.3333333335, ans=0.125 2023-11-28 23:00:59,139 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3706206.6666666665, ans=0.125 2023-11-28 23:01:12,340 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 555950 2023-11-28 23:01:16,236 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 2850, loss[loss=0.05582, simple_loss=0.0782, pruned_loss=0.008323, audio_tagging_loss=0.008396, over 14815.00 frames. ], tot_loss[loss=0.06417, simple_loss=0.08767, pruned_loss=0.01186, audio_tagging_loss=0.008471, over 3035532.94 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 23:01:19,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3706340.0, ans=0.1 2023-11-28 23:01:25,466 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3706340.0, ans=0.125 2023-11-28 23:01:34,380 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.58 vs. limit=15.0 2023-11-28 23:01:42,628 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3706473.3333333335, ans=0.125 2023-11-28 23:02:00,101 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.80 vs. limit=22.5 2023-11-28 23:02:15,024 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 556000 2023-11-28 23:02:16,461 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-556000.pt 2023-11-28 23:02:21,371 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 2900, loss[loss=0.06471, simple_loss=0.09425, pruned_loss=0.009726, audio_tagging_loss=0.007857, over 15452.00 frames. ], tot_loss[loss=0.06442, simple_loss=0.08807, pruned_loss=0.01185, audio_tagging_loss=0.00854, over 3039559.89 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 23:02:26,438 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3706673.3333333335, ans=0.0 2023-11-28 23:02:34,745 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.253e+01 8.955e+01 9.573e+01 1.059e+02 1.416e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-28 23:02:35,473 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.43 vs. limit=15.0 2023-11-28 23:02:49,533 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3706806.6666666665, ans=0.125 2023-11-28 23:03:06,457 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.90 vs. 
limit=15.0 2023-11-28 23:03:09,534 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3706940.0, ans=0.0 2023-11-28 23:03:10,794 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3706940.0, ans=0.125 2023-11-28 23:03:19,306 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 556050 2023-11-28 23:03:22,904 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 2950, loss[loss=0.07611, simple_loss=0.09885, pruned_loss=0.01639, audio_tagging_loss=0.01029, over 15736.00 frames. ], tot_loss[loss=0.06476, simple_loss=0.08863, pruned_loss=0.01191, audio_tagging_loss=0.008536, over 3042309.71 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 23:03:43,403 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.80 vs. limit=15.0 2023-11-28 23:03:45,482 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3707073.3333333335, ans=0.0 2023-11-28 23:04:05,696 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3707206.6666666665, ans=0.125 2023-11-28 23:04:19,467 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3707273.3333333335, ans=0.0 2023-11-28 23:04:21,486 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 556100 2023-11-28 23:04:24,921 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 3000, loss[loss=0.05659, simple_loss=0.07592, pruned_loss=0.009668, audio_tagging_loss=0.008962, over 15365.00 frames. ], tot_loss[loss=0.06481, simple_loss=0.08836, pruned_loss=0.01198, audio_tagging_loss=0.008653, over 3042848.46 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:04:24,924 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-28 23:04:41,685 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([1.9876, 3.2966, 3.4878, 2.8590, 3.7385, 3.7872, 3.8464, 3.7053], device='cuda:0') 2023-11-28 23:05:04,350 INFO [train_asr.py:1267] (0/4) Epoch 47, validation: loss=0.05749, simple_loss=0.05049, pruned_loss=0.005328, audio_tagging_loss=0.02692, over 4681554.00 frames. 
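A note on the loss figures in these records: each `loss[...]` entry reports a single batch, while `tot_loss[...]` is a running aggregate normalized by the number of frames seen so far, and the validation record above is the same aggregation taken over the whole dev set (4681554.00 frames here). A minimal sketch of that frame-weighted bookkeeping, using a plain dict-based tracker as a stand-in rather than icefall's actual MetricsTracker class:

```python
# Minimal sketch of frame-weighted loss aggregation, as suggested by the
# "loss[..., over N frames]" / "tot_loss[..., over M frames]" records above.
# This tracker is illustrative only; icefall's real implementation differs.
from collections import defaultdict


class LossTracker:
    def __init__(self):
        self.sums = defaultdict(float)  # per-component sum of loss * frames
        self.frames = 0.0

    def update(self, losses: dict, num_frames: float) -> None:
        # Accumulate each loss component weighted by the batch's frame count.
        for name, value in losses.items():
            self.sums[name] += value * num_frames
        self.frames += num_frames

    def averages(self) -> dict:
        # Normalize by total frames: these are the values printed as tot_loss[...].
        return {name: s / self.frames for name, s in self.sums.items()}


tracker = LossTracker()
# Values taken from the batch 2800 record above (over 16121 frames).
tracker.update({"simple_loss": 0.08917, "pruned_loss": 0.009544,
                "audio_tagging_loss": 0.006538}, num_frames=16121)
print(tracker.averages())  # equals the single batch here; converges as batches accumulate
```

With this weighting, long batches move `tot_loss` in proportion to the frames they contribute, which is why the per-batch `loss[...]` values jump around while `tot_loss[...]` changes smoothly.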
2023-11-28 23:05:04,351 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-28 23:05:07,011 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3707340.0, ans=0.1 2023-11-28 23:05:09,974 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3707340.0, ans=0.125 2023-11-28 23:05:17,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=3707406.6666666665, ans=22.5 2023-11-28 23:05:20,049 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.840e+01 9.232e+01 9.628e+01 1.042e+02 1.260e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-28 23:05:34,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3707473.3333333335, ans=0.2 2023-11-28 23:05:42,721 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3707540.0, ans=0.125 2023-11-28 23:05:50,543 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3707540.0, ans=0.2 2023-11-28 23:06:01,467 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3707606.6666666665, ans=0.125 2023-11-28 23:06:02,538 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 556150 2023-11-28 23:06:05,941 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 3050, loss[loss=0.0663, simple_loss=0.08401, pruned_loss=0.01204, audio_tagging_loss=0.01225, over 13787.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.08914, pruned_loss=0.01213, audio_tagging_loss=0.008717, over 3045694.90 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 8.0 2023-11-28 23:06:29,681 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3707806.6666666665, ans=0.0 2023-11-28 23:06:44,955 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 23:06:45,591 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.23 vs. limit=10.0 2023-11-28 23:06:59,841 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3707940.0, ans=0.0 2023-11-28 23:06:59,902 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3707940.0, ans=0.04949747468305833 2023-11-28 23:07:00,202 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.77 vs. 
limit=15.0 2023-11-28 23:07:04,305 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 556200 2023-11-28 23:07:04,426 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3707940.0, ans=0.1 2023-11-28 23:07:08,257 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 3100, loss[loss=0.1218, simple_loss=0.1604, pruned_loss=0.03498, audio_tagging_loss=0.006651, over 15660.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08942, pruned_loss=0.01217, audio_tagging_loss=0.008761, over 3044537.93 frames. ], batch size: 54, lr: 1.44e-03, grad_scale: 8.0 2023-11-28 23:07:23,339 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.723e+01 9.064e+01 9.672e+01 1.048e+02 1.274e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-28 23:07:34,019 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3708140.0, ans=0.125 2023-11-28 23:08:05,142 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 556250 2023-11-28 23:08:08,490 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 3150, loss[loss=0.079, simple_loss=0.1098, pruned_loss=0.01528, audio_tagging_loss=0.008799, over 14912.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.08885, pruned_loss=0.01205, audio_tagging_loss=0.008858, over 3040847.15 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 8.0 2023-11-28 23:08:18,691 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.86 vs. limit=15.0 2023-11-28 23:08:27,613 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3708406.6666666665, ans=0.0 2023-11-28 23:08:34,163 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3708473.3333333335, ans=0.0 2023-11-28 23:08:52,210 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3708540.0, ans=0.0 2023-11-28 23:08:54,520 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3708540.0, ans=0.0 2023-11-28 23:09:07,554 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 556300 2023-11-28 23:09:10,932 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 3200, loss[loss=0.06302, simple_loss=0.08602, pruned_loss=0.01128, audio_tagging_loss=0.008732, over 15868.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08887, pruned_loss=0.01208, audio_tagging_loss=0.008956, over 3043568.38 frames. ], batch size: 59, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:09:12,700 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.42 vs. limit=6.0 2023-11-28 23:09:15,843 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3708673.3333333335, ans=0.1 2023-11-28 23:09:20,362 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.59 vs. 
limit=15.0 2023-11-28 23:09:22,313 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3708740.0, ans=0.2 2023-11-28 23:09:26,539 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.302e+01 8.731e+01 9.590e+01 1.027e+02 1.409e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-28 23:09:59,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=3708940.0, ans=0.02 2023-11-28 23:10:09,019 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 556350 2023-11-28 23:10:11,661 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3709006.6666666665, ans=0.125 2023-11-28 23:10:12,498 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 3250, loss[loss=0.07759, simple_loss=0.1085, pruned_loss=0.01669, audio_tagging_loss=0.006659, over 14513.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.08936, pruned_loss=0.01217, audio_tagging_loss=0.009024, over 3044787.63 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:10:21,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3709006.6666666665, ans=0.0 2023-11-28 23:10:29,505 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3709073.3333333335, ans=0.125 2023-11-28 23:10:42,941 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.81 vs. limit=15.0 2023-11-28 23:10:55,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3709206.6666666665, ans=0.125 2023-11-28 23:11:03,585 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3709273.3333333335, ans=0.035 2023-11-28 23:11:09,493 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3709273.3333333335, ans=0.125 2023-11-28 23:11:10,684 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 556400 2023-11-28 23:11:14,550 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 3300, loss[loss=0.05611, simple_loss=0.07722, pruned_loss=0.009998, audio_tagging_loss=0.007506, over 14331.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.08954, pruned_loss=0.01216, audio_tagging_loss=0.009094, over 3040560.18 frames. 
], batch size: 54, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:11:18,227 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3709340.0, ans=0.1 2023-11-28 23:11:21,791 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3709340.0, ans=0.125 2023-11-28 23:11:31,277 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.994e+01 9.101e+01 9.601e+01 1.014e+02 1.380e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-28 23:11:32,710 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3709406.6666666665, ans=0.125 2023-11-28 23:11:38,304 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3709473.3333333335, ans=0.125 2023-11-28 23:11:46,018 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3709473.3333333335, ans=0.0 2023-11-28 23:12:12,417 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 556450 2023-11-28 23:12:16,479 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 3350, loss[loss=0.07296, simple_loss=0.09304, pruned_loss=0.0125, audio_tagging_loss=0.01394, over 15539.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.08999, pruned_loss=0.01222, audio_tagging_loss=0.008943, over 3049702.42 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:13:09,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3709940.0, ans=0.125 2023-11-28 23:13:10,832 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3709940.0, ans=0.1 2023-11-28 23:13:10,838 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3709940.0, ans=0.2 2023-11-28 23:13:14,752 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 556500 2023-11-28 23:13:18,080 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 3400, loss[loss=0.09664, simple_loss=0.1325, pruned_loss=0.0234, audio_tagging_loss=0.006973, over 15701.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.0904, pruned_loss=0.01224, audio_tagging_loss=0.008771, over 3045886.35 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:13:21,842 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3710006.6666666665, ans=0.125 2023-11-28 23:13:33,974 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.643e+01 8.795e+01 9.500e+01 1.053e+02 1.456e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-28 23:14:03,038 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3710206.6666666665, ans=0.125 2023-11-28 23:14:05,398 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3710206.6666666665, ans=0.0 2023-11-28 23:14:06,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3710273.3333333335, ans=0.1 2023-11-28 23:14:11,117 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.31 vs. 
limit=15.0 2023-11-28 23:14:12,820 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3710273.3333333335, ans=0.015 2023-11-28 23:14:16,450 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 556550 2023-11-28 23:14:19,893 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 3450, loss[loss=0.06442, simple_loss=0.08707, pruned_loss=0.009905, audio_tagging_loss=0.01098, over 15795.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.09037, pruned_loss=0.01217, audio_tagging_loss=0.008706, over 3052069.54 frames. ], batch size: 61, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:14:50,027 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3710473.3333333335, ans=0.0 2023-11-28 23:14:50,112 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3710473.3333333335, ans=0.0 2023-11-28 23:14:54,755 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3710473.3333333335, ans=0.0 2023-11-28 23:14:58,673 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3710540.0, ans=0.125 2023-11-28 23:15:17,592 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 556600 2023-11-28 23:15:19,979 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3710606.6666666665, ans=0.125 2023-11-28 23:15:21,987 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 3500, loss[loss=0.07388, simple_loss=0.09815, pruned_loss=0.01543, audio_tagging_loss=0.00938, over 14632.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.09026, pruned_loss=0.0121, audio_tagging_loss=0.008633, over 3047004.06 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:15:29,380 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3710673.3333333335, ans=0.125 2023-11-28 23:15:29,472 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3710673.3333333335, ans=0.125 2023-11-28 23:15:29,961 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.96 vs. limit=10.0 2023-11-28 23:15:38,388 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.452e+01 9.007e+01 9.535e+01 1.020e+02 1.277e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-28 23:15:54,022 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.67 vs. limit=22.5 2023-11-28 23:15:56,607 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 23:16:16,746 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.54 vs. 
limit=15.0 2023-11-28 23:16:20,365 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 556650 2023-11-28 23:16:24,406 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 3550, loss[loss=0.07838, simple_loss=0.1112, pruned_loss=0.01734, audio_tagging_loss=0.005419, over 15273.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08986, pruned_loss=0.01202, audio_tagging_loss=0.00864, over 3044688.70 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:16:25,972 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3711006.6666666665, ans=0.125 2023-11-28 23:16:32,527 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3711006.6666666665, ans=0.1 2023-11-28 23:16:37,256 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=3711073.3333333335, ans=0.5 2023-11-28 23:16:52,565 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3711140.0, ans=0.0 2023-11-28 23:17:01,345 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3711206.6666666665, ans=0.025 2023-11-28 23:17:03,764 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3711206.6666666665, ans=0.125 2023-11-28 23:17:22,360 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3711273.3333333335, ans=0.125 2023-11-28 23:17:23,287 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 556700 2023-11-28 23:17:26,728 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 3600, loss[loss=0.04583, simple_loss=0.05586, pruned_loss=0.006342, audio_tagging_loss=0.01155, over 15096.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08933, pruned_loss=0.01206, audio_tagging_loss=0.008564, over 3044243.92 frames. ], batch size: 60, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 23:17:34,445 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.31 vs. limit=6.0 2023-11-28 23:17:42,457 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.234e+01 8.750e+01 9.399e+01 1.010e+02 1.318e+02, threshold=1.880e+02, percent-clipped=0.0 2023-11-28 23:17:59,545 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3711473.3333333335, ans=0.0 2023-11-28 23:18:10,593 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3711540.0, ans=0.125 2023-11-28 23:18:23,479 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 556750 2023-11-28 23:18:24,827 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3711606.6666666665, ans=0.2 2023-11-28 23:18:27,673 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 3650, loss[loss=0.06902, simple_loss=0.09764, pruned_loss=0.0139, audio_tagging_loss=0.006305, over 14972.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.08979, pruned_loss=0.01217, audio_tagging_loss=0.008501, over 3045108.46 frames. 
], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:18:36,535 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.50 vs. limit=12.0 2023-11-28 23:18:53,507 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.28 vs. limit=12.0 2023-11-28 23:18:55,603 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3711806.6666666665, ans=0.0 2023-11-28 23:19:08,409 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3711873.3333333335, ans=0.1 2023-11-28 23:19:25,440 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 556800 2023-11-28 23:19:29,803 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 3700, loss[loss=0.0658, simple_loss=0.08881, pruned_loss=0.009185, audio_tagging_loss=0.01221, over 15534.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08987, pruned_loss=0.01223, audio_tagging_loss=0.008533, over 3046940.53 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:19:45,338 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3712073.3333333335, ans=0.125 2023-11-28 23:19:47,494 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.675e+01 8.931e+01 9.622e+01 1.040e+02 1.365e+02, threshold=1.924e+02, percent-clipped=0.0 2023-11-28 23:19:51,262 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3712073.3333333335, ans=0.2 2023-11-28 23:20:04,650 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3712140.0, ans=0.125 2023-11-28 23:20:10,609 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3712206.6666666665, ans=0.125 2023-11-28 23:20:28,763 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 556850 2023-11-28 23:20:31,336 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3712340.0, ans=0.0 2023-11-28 23:20:32,158 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 3750, loss[loss=0.0653, simple_loss=0.09379, pruned_loss=0.0115, audio_tagging_loss=0.006909, over 15070.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.09126, pruned_loss=0.0124, audio_tagging_loss=0.008476, over 3052870.61 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 8.0 2023-11-28 23:20:38,223 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3712340.0, ans=0.0 2023-11-28 23:20:40,403 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3712340.0, ans=0.2 2023-11-28 23:21:04,158 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3712473.3333333335, ans=0.2 2023-11-28 23:21:08,417 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3712540.0, ans=0.125 2023-11-28 23:21:16,897 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. 
Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 23:21:18,365 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3712540.0, ans=0.0 2023-11-28 23:21:30,110 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 556900 2023-11-28 23:21:33,581 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 3800, loss[loss=0.05989, simple_loss=0.082, pruned_loss=0.01148, audio_tagging_loss=0.007408, over 13931.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08992, pruned_loss=0.01215, audio_tagging_loss=0.008535, over 3057116.79 frames. ], batch size: 54, lr: 1.44e-03, grad_scale: 8.0 2023-11-28 23:21:45,636 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.23 vs. limit=15.0 2023-11-28 23:21:50,234 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.41 vs. limit=15.0 2023-11-28 23:21:52,346 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.666e+01 9.439e+01 1.001e+02 1.076e+02 2.686e+02, threshold=2.002e+02, percent-clipped=1.0 2023-11-28 23:21:54,763 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3712740.0, ans=0.015 2023-11-28 23:22:14,205 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.45 vs. limit=12.0 2023-11-28 23:22:19,034 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.16 vs. limit=15.0 2023-11-28 23:22:31,893 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 556950 2023-11-28 23:22:35,447 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 3850, loss[loss=0.07223, simple_loss=0.08773, pruned_loss=0.01832, audio_tagging_loss=0.01005, over 13828.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08978, pruned_loss=0.01208, audio_tagging_loss=0.008615, over 3051123.33 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 8.0 2023-11-28 23:22:35,760 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3713006.6666666665, ans=0.09899494936611666 2023-11-28 23:22:47,951 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.50 vs. limit=15.0 2023-11-28 23:22:58,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3713073.3333333335, ans=0.0 2023-11-28 23:23:04,667 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.31 vs. 
limit=15.0 2023-11-28 23:23:23,521 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3713273.3333333335, ans=0.1 2023-11-28 23:23:33,682 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 557000 2023-11-28 23:23:38,037 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 3900, loss[loss=0.06899, simple_loss=0.09717, pruned_loss=0.01382, audio_tagging_loss=0.006577, over 16086.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.0897, pruned_loss=0.01204, audio_tagging_loss=0.008626, over 3048948.36 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 8.0 2023-11-28 23:23:53,675 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3713406.6666666665, ans=0.0 2023-11-28 23:23:55,792 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.114e+01 8.938e+01 9.522e+01 1.035e+02 1.409e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-28 23:24:02,472 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3713473.3333333335, ans=0.2 2023-11-28 23:24:34,887 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 557050 2023-11-28 23:24:38,301 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 3950, loss[loss=0.05988, simple_loss=0.08513, pruned_loss=0.009781, audio_tagging_loss=0.007535, over 14659.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.08955, pruned_loss=0.0121, audio_tagging_loss=0.008725, over 3041567.45 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 8.0 2023-11-28 23:24:43,969 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3713673.3333333335, ans=0.2 2023-11-28 23:24:48,833 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 23:25:01,481 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3713740.0, ans=0.0 2023-11-28 23:25:01,738 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.41 vs. limit=15.0 2023-11-28 23:25:13,318 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3713806.6666666665, ans=0.1 2023-11-28 23:25:23,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3713873.3333333335, ans=0.0 2023-11-28 23:25:37,811 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 557100 2023-11-28 23:25:39,247 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3713940.0, ans=0.2 2023-11-28 23:25:41,363 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 4000, loss[loss=0.056, simple_loss=0.07276, pruned_loss=0.00775, audio_tagging_loss=0.01187, over 14652.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.0894, pruned_loss=0.01196, audio_tagging_loss=0.008826, over 3035470.65 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:25:46,922 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.85 vs. 
limit=15.0 2023-11-28 23:25:59,957 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.580e+01 8.920e+01 9.493e+01 1.035e+02 1.641e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-28 23:26:22,524 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3714206.6666666665, ans=0.2 2023-11-28 23:26:23,719 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 23:26:38,975 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 557150 2023-11-28 23:26:43,000 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 4050, loss[loss=0.06038, simple_loss=0.08493, pruned_loss=0.01126, audio_tagging_loss=0.006653, over 14941.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.08975, pruned_loss=0.01227, audio_tagging_loss=0.008834, over 3034578.62 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:26:47,694 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 23:26:50,979 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3714340.0, ans=0.1 2023-11-28 23:27:18,424 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3714540.0, ans=0.0 2023-11-28 23:27:22,552 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3714540.0, ans=0.0 2023-11-28 23:27:41,455 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 557200 2023-11-28 23:27:41,663 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3714606.6666666665, ans=0.0 2023-11-28 23:27:43,311 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3714606.6666666665, ans=0.0 2023-11-28 23:27:45,244 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 4100, loss[loss=0.05952, simple_loss=0.07364, pruned_loss=0.009855, audio_tagging_loss=0.01284, over 15511.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08933, pruned_loss=0.01211, audio_tagging_loss=0.008884, over 3039970.29 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:27:45,483 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3714673.3333333335, ans=0.1 2023-11-28 23:27:59,604 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=14.19 vs. 
limit=15.0 2023-11-28 23:28:00,160 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3714740.0, ans=0.0 2023-11-28 23:28:03,239 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.823e+01 9.138e+01 9.541e+01 1.028e+02 1.498e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-28 23:28:18,788 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3714806.6666666665, ans=0.0 2023-11-28 23:28:21,111 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3714873.3333333335, ans=0.125 2023-11-28 23:28:29,185 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.80 vs. limit=15.0 2023-11-28 23:28:31,563 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.61 vs. limit=22.5 2023-11-28 23:28:43,514 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 557250 2023-11-28 23:28:46,805 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 4150, loss[loss=0.053, simple_loss=0.07426, pruned_loss=0.006001, audio_tagging_loss=0.00987, over 15245.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.08969, pruned_loss=0.01216, audio_tagging_loss=0.008734, over 3038187.86 frames. ], batch size: 59, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:28:48,177 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3715006.6666666665, ans=0.0 2023-11-28 23:29:02,859 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3715073.3333333335, ans=0.125 2023-11-28 23:29:03,003 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3715073.3333333335, ans=0.125 2023-11-28 23:29:26,239 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3715206.6666666665, ans=0.125 2023-11-28 23:29:33,575 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 23:29:36,553 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.12 vs. limit=22.5 2023-11-28 23:29:44,753 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 557300 2023-11-28 23:29:47,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3715340.0, ans=0.125 2023-11-28 23:29:48,186 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 4200, loss[loss=0.1153, simple_loss=0.1581, pruned_loss=0.02989, audio_tagging_loss=0.006345, over 14787.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08963, pruned_loss=0.01222, audio_tagging_loss=0.008609, over 3038722.04 frames. 
], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:29:55,892 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3715340.0, ans=0.015 2023-11-28 23:30:03,529 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3715406.6666666665, ans=0.125 2023-11-28 23:30:06,724 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.169e+01 8.859e+01 9.416e+01 1.036e+02 1.524e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-28 23:30:08,594 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.49 vs. limit=22.5 2023-11-28 23:30:09,643 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.08 vs. limit=12.0 2023-11-28 23:30:12,882 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3715473.3333333335, ans=0.1 2023-11-28 23:30:20,659 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3715473.3333333335, ans=0.0 2023-11-28 23:30:27,935 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.28 vs. limit=15.0 2023-11-28 23:30:36,577 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3715606.6666666665, ans=0.1 2023-11-28 23:30:46,490 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 557350 2023-11-28 23:30:49,989 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 4250, loss[loss=0.0857, simple_loss=0.1283, pruned_loss=0.01507, audio_tagging_loss=0.006485, over 15583.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08932, pruned_loss=0.01204, audio_tagging_loss=0.008587, over 3040125.87 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:30:57,804 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3715673.3333333335, ans=0.2 2023-11-28 23:31:11,193 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3715740.0, ans=0.1 2023-11-28 23:31:40,979 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3715940.0, ans=0.1 2023-11-28 23:31:47,708 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 557400 2023-11-28 23:31:50,713 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3716006.6666666665, ans=0.0 2023-11-28 23:31:51,616 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 4300, loss[loss=0.07321, simple_loss=0.09955, pruned_loss=0.01432, audio_tagging_loss=0.009116, over 15264.00 frames. ], tot_loss[loss=0.06503, simple_loss=0.0891, pruned_loss=0.01193, audio_tagging_loss=0.008547, over 3042537.71 frames. 
], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:32:09,694 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.291e+01 9.110e+01 9.607e+01 1.023e+02 1.243e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-28 23:32:09,949 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3716073.3333333335, ans=0.1 2023-11-28 23:32:14,183 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3716073.3333333335, ans=0.125 2023-11-28 23:32:23,989 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.30 vs. limit=12.0 2023-11-28 23:32:40,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3716273.3333333335, ans=0.0 2023-11-28 23:32:49,439 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 557450 2023-11-28 23:32:53,490 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 4350, loss[loss=0.0548, simple_loss=0.06591, pruned_loss=0.01095, audio_tagging_loss=0.01089, over 14361.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.09002, pruned_loss=0.01237, audio_tagging_loss=0.008573, over 3037916.67 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 8.0 2023-11-28 23:33:30,137 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.23 vs. limit=15.0 2023-11-28 23:33:43,562 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3716606.6666666665, ans=0.1 2023-11-28 23:33:47,584 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3716606.6666666665, ans=0.125 2023-11-28 23:33:52,135 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 557500 2023-11-28 23:33:55,552 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 4400, loss[loss=0.08177, simple_loss=0.1119, pruned_loss=0.01762, audio_tagging_loss=0.008218, over 15499.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.09036, pruned_loss=0.01243, audio_tagging_loss=0.008499, over 3040307.68 frames. 
], batch size: 58, lr: 1.44e-03, grad_scale: 16.0
2023-11-28 23:34:02,950 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3716673.3333333335, ans=0.125
2023-11-28 23:34:09,372 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3716740.0, ans=0.0
2023-11-28 23:34:10,507 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3716740.0, ans=0.1
2023-11-28 23:34:15,634 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.931e+01 9.001e+01 9.645e+01 1.064e+02 1.630e+02, threshold=1.929e+02, percent-clipped=0.0
2023-11-28 23:34:19,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3716806.6666666665, ans=0.2
2023-11-28 23:34:54,010 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 557550
2023-11-28 23:34:55,419 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3716940.0, ans=0.09899494936611666
2023-11-28 23:34:56,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3717006.6666666665, ans=0.125
2023-11-28 23:34:57,367 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 4450, loss[loss=0.06301, simple_loss=0.09347, pruned_loss=0.01018, audio_tagging_loss=0.006094, over 14576.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.08938, pruned_loss=0.01221, audio_tagging_loss=0.008498, over 3038872.46 frames. ], batch size: 54, lr: 1.44e-03, grad_scale: 16.0
2023-11-28 23:35:53,914 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.80 vs. limit=10.0
2023-11-28 23:35:55,835 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 557600
2023-11-28 23:36:00,236 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 4500, loss[loss=0.05595, simple_loss=0.08213, pruned_loss=0.006179, audio_tagging_loss=0.008713, over 15907.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.09003, pruned_loss=0.01227, audio_tagging_loss=0.008484, over 3044678.87 frames. ], batch size: 59, lr: 1.44e-03, grad_scale: 16.0
2023-11-28 23:36:03,162 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.75 vs. limit=15.0
2023-11-28 23:36:09,358 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3717340.0, ans=0.0
2023-11-28 23:36:19,791 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.691e+01 8.979e+01 9.760e+01 1.042e+02 1.445e+02, threshold=1.952e+02, percent-clipped=0.0
2023-11-28 23:36:25,481 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.67 vs. limit=15.0
2023-11-28 23:36:27,274 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3717473.3333333335, ans=0.0
2023-11-28 23:36:58,534 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 557650
2023-11-28 23:37:02,012 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 4550, loss[loss=0.04707, simple_loss=0.05507, pruned_loss=0.009703, audio_tagging_loss=0.009836, over 16375.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08998, pruned_loss=0.01219, audio_tagging_loss=0.00846, over 3042321.13 frames. ], batch size: 63, lr: 1.44e-03, grad_scale: 16.0
2023-11-28 23:37:23,435 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.14 vs. limit=10.0
2023-11-28 23:37:25,347 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3717806.6666666665, ans=0.2
2023-11-28 23:37:47,926 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.86 vs. limit=15.0
2023-11-28 23:37:48,993 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.02 vs. limit=15.0
2023-11-28 23:37:49,840 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3717940.0, ans=0.2
2023-11-28 23:37:49,960 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3717940.0, ans=0.0
2023-11-28 23:37:50,852 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-28 23:37:58,994 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 557700
2023-11-28 23:38:02,409 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 4600, loss[loss=0.08507, simple_loss=0.1184, pruned_loss=0.01792, audio_tagging_loss=0.007927, over 15158.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.09053, pruned_loss=0.01239, audio_tagging_loss=0.008379, over 3041729.31 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 16.0
2023-11-28 23:38:07,620 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.35 vs. limit=12.0
2023-11-28 23:38:09,597 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3718006.6666666665, ans=0.125
2023-11-28 23:38:22,950 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.797e+01 9.011e+01 9.487e+01 1.017e+02 1.254e+02, threshold=1.897e+02, percent-clipped=0.0
2023-11-28 23:38:24,585 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-28 23:38:31,947 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3718140.0, ans=0.125
2023-11-28 23:39:01,215 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 557750
2023-11-28 23:39:04,621 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 4650, loss[loss=0.071, simple_loss=0.09865, pruned_loss=0.01469, audio_tagging_loss=0.00699, over 15098.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.08997, pruned_loss=0.01231, audio_tagging_loss=0.008573, over 3046477.52 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 16.0
2023-11-28 23:39:10,504 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.38 vs. limit=15.0
2023-11-28 23:39:31,985 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3718473.3333333335, ans=0.125
2023-11-28 23:39:35,653 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3718473.3333333335, ans=0.125
2023-11-28 23:39:36,781 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3718473.3333333335, ans=0.125
2023-11-28 23:39:43,472 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3718540.0, ans=0.0
2023-11-28 23:39:47,910 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.25 vs. limit=15.0
2023-11-28 23:39:57,387 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3718606.6666666665, ans=0.2
2023-11-28 23:39:58,825 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.97 vs. limit=10.0
2023-11-28 23:40:02,797 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3718606.6666666665, ans=0.125
2023-11-28 23:40:03,810 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 557800
2023-11-28 23:40:07,644 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 4700, loss[loss=0.06302, simple_loss=0.08325, pruned_loss=0.01004, audio_tagging_loss=0.01136, over 15199.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.0897, pruned_loss=0.01231, audio_tagging_loss=0.008726, over 3044192.98 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0
2023-11-28 23:40:22,138 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.17 vs. limit=15.0
2023-11-28 23:40:25,991 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.766e+01 9.230e+01 9.778e+01 1.067e+02 1.457e+02, threshold=1.956e+02, percent-clipped=0.0
2023-11-28 23:40:29,364 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3718740.0, ans=0.0
2023-11-28 23:40:39,472 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3718806.6666666665, ans=0.0
2023-11-28 23:41:04,920 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 557850
2023-11-28 23:41:08,269 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 4750, loss[loss=0.04841, simple_loss=0.05928, pruned_loss=0.009055, audio_tagging_loss=0.009717, over 15002.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08924, pruned_loss=0.01217, audio_tagging_loss=0.008796, over 3051903.78 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 16.0
2023-11-28 23:41:11,897 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3719006.6666666665, ans=0.0
2023-11-28 23:41:29,081 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3719073.3333333335, ans=0.125
2023-11-28 23:41:29,495 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.83 vs. limit=15.0
2023-11-28 23:41:58,815 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3719273.3333333335, ans=0.2
2023-11-28 23:42:06,325 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 557900
2023-11-28 23:42:10,521 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 4800, loss[loss=0.06011, simple_loss=0.07991, pruned_loss=0.009473, audio_tagging_loss=0.01068, over 15449.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.0892, pruned_loss=0.01213, audio_tagging_loss=0.008918, over 3050927.59 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 32.0
2023-11-28 23:42:20,718 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3719340.0, ans=0.07
2023-11-28 23:42:29,523 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3719406.6666666665, ans=0.5
2023-11-28 23:42:30,262 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.346e+01 9.011e+01 9.522e+01 1.013e+02 1.336e+02, threshold=1.904e+02, percent-clipped=0.0
2023-11-28 23:42:40,314 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.72 vs. limit=15.0
2023-11-28 23:43:03,015 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3719606.6666666665, ans=0.0
2023-11-28 23:43:09,223 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 557950
2023-11-28 23:43:09,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3719606.6666666665, ans=0.0
2023-11-28 23:43:12,641 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 4850, loss[loss=0.06043, simple_loss=0.08346, pruned_loss=0.008101, audio_tagging_loss=0.0106, over 17022.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.09006, pruned_loss=0.0123, audio_tagging_loss=0.008955, over 3056220.17 frames. ], batch size: 66, lr: 1.44e-03, grad_scale: 32.0
2023-11-28 23:44:10,714 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 558000
2023-11-28 23:44:14,579 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 4900, loss[loss=0.05961, simple_loss=0.09043, pruned_loss=0.006311, audio_tagging_loss=0.00808, over 15834.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.09042, pruned_loss=0.01228, audio_tagging_loss=0.008784, over 3053123.65 frames. ], batch size: 61, lr: 1.44e-03, grad_scale: 16.0
2023-11-28 23:44:19,526 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3720006.6666666665, ans=0.125
2023-11-28 23:44:26,057 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3720073.3333333335, ans=0.0
2023-11-28 23:44:35,706 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.304e+01 8.818e+01 9.390e+01 1.021e+02 1.310e+02, threshold=1.878e+02, percent-clipped=0.0
2023-11-28 23:44:58,642 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.08 vs. limit=22.5
2023-11-28 23:45:00,672 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3720206.6666666665, ans=0.2
2023-11-28 23:45:02,951 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3720273.3333333335, ans=0.125
2023-11-28 23:45:05,373 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3720273.3333333335, ans=0.0
2023-11-28 23:45:12,788 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 558050
2023-11-28 23:45:16,102 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 4950, loss[loss=0.06969, simple_loss=0.1004, pruned_loss=0.01082, audio_tagging_loss=0.008696, over 14604.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.09046, pruned_loss=0.01215, audio_tagging_loss=0.008665, over 3054855.48 frames. ], batch size: 54, lr: 1.44e-03, grad_scale: 16.0
2023-11-28 23:45:28,151 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.49 vs. limit=15.0
2023-11-28 23:45:42,495 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3720473.3333333335, ans=0.0
2023-11-28 23:45:42,637 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.73 vs. limit=15.0
2023-11-28 23:46:12,194 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3720606.6666666665, ans=0.0
2023-11-28 23:46:14,406 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 558100
2023-11-28 23:46:17,695 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.62 vs. limit=15.0
2023-11-28 23:46:18,289 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 5000, loss[loss=0.05654, simple_loss=0.07988, pruned_loss=0.01006, audio_tagging_loss=0.006538, over 15074.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.09038, pruned_loss=0.01218, audio_tagging_loss=0.008533, over 3051774.68 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 16.0
2023-11-28 23:46:23,537 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.33 vs. limit=15.0
2023-11-28 23:46:29,072 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3720740.0, ans=0.1
2023-11-28 23:46:31,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3720740.0, ans=0.0
2023-11-28 23:46:38,214 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.642e+01 8.963e+01 9.566e+01 1.007e+02 2.358e+02, threshold=1.913e+02, percent-clipped=1.0
2023-11-28 23:46:42,084 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3720806.6666666665, ans=0.1
2023-11-28 23:46:50,895 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3720806.6666666665, ans=0.1
2023-11-28 23:46:55,470 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3720873.3333333335, ans=0.0
2023-11-28 23:47:02,833 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.04 vs. limit=15.0
2023-11-28 23:47:09,493 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.60 vs. limit=15.0
2023-11-28 23:47:15,726 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 558150
2023-11-28 23:47:19,190 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 5050, loss[loss=0.05911, simple_loss=0.08475, pruned_loss=0.01059, audio_tagging_loss=0.006138, over 16281.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08997, pruned_loss=0.01215, audio_tagging_loss=0.008461, over 3053149.74 frames. ], batch size: 63, lr: 1.44e-03, grad_scale: 8.0
2023-11-28 23:47:21,884 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3721006.6666666665, ans=0.125
2023-11-28 23:47:27,726 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3721006.6666666665, ans=0.0
2023-11-28 23:47:59,872 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.53 vs. limit=12.0
2023-11-28 23:48:13,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3721273.3333333335, ans=0.0
2023-11-28 23:48:14,849 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.28 vs. limit=15.0
2023-11-28 23:48:15,764 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3721273.3333333335, ans=0.125
2023-11-28 23:48:16,758 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 558200
2023-11-28 23:48:21,052 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 5100, loss[loss=0.06866, simple_loss=0.0997, pruned_loss=0.01294, audio_tagging_loss=0.005876, over 15882.00 frames. ], tot_loss[loss=0.06589, simple_loss=0.09027, pruned_loss=0.01234, audio_tagging_loss=0.008413, over 3056902.44 frames. ], batch size: 59, lr: 1.44e-03, grad_scale: 8.0
2023-11-28 23:48:21,808 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.60 vs. limit=22.5
2023-11-28 23:48:22,924 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.86 vs. limit=15.0
2023-11-28 23:48:24,133 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.24 vs. limit=15.0
2023-11-28 23:48:25,038 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.13 vs. limit=15.0
2023-11-28 23:48:39,688 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3721406.6666666665, ans=0.125
2023-11-28 23:48:44,100 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.467e+01 8.897e+01 9.648e+01 1.044e+02 1.353e+02, threshold=1.930e+02, percent-clipped=0.0
2023-11-28 23:48:45,640 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3721473.3333333335, ans=0.95
2023-11-28 23:48:49,084 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3721473.3333333335, ans=0.125
2023-11-28 23:48:49,348 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.55 vs. limit=22.5
2023-11-28 23:48:57,549 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.22 vs. limit=22.5
2023-11-28 23:49:16,986 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3721606.6666666665, ans=0.0
2023-11-28 23:49:18,562 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 558250
2023-11-28 23:49:21,924 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 5150, loss[loss=0.05419, simple_loss=0.07245, pruned_loss=0.009434, audio_tagging_loss=0.008535, over 15804.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08934, pruned_loss=0.01211, audio_tagging_loss=0.008398, over 3058736.40 frames. ], batch size: 61, lr: 1.44e-03, grad_scale: 8.0
2023-11-28 23:49:27,093 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3721673.3333333335, ans=0.09899494936611666
2023-11-28 23:49:46,213 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3721806.6666666665, ans=0.0
2023-11-28 23:50:00,841 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3721873.3333333335, ans=0.125
2023-11-28 23:50:21,707 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 558300
2023-11-28 23:50:23,038 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3721940.0, ans=0.125
2023-11-28 23:50:23,572 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.60 vs. limit=12.0
2023-11-28 23:50:25,159 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 5200, loss[loss=0.069, simple_loss=0.08382, pruned_loss=0.01547, audio_tagging_loss=0.01162, over 15425.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08928, pruned_loss=0.01208, audio_tagging_loss=0.008476, over 3056620.39 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0
2023-11-28 23:50:25,729 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2.whitening_limit, batch_count=3722006.6666666665, ans=15.0
2023-11-28 23:50:45,799 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=3722073.3333333335, ans=10.0
2023-11-28 23:50:46,816 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.382e+01 9.041e+01 9.653e+01 1.034e+02 1.419e+02, threshold=1.931e+02, percent-clipped=0.0
2023-11-28 23:51:02,687 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3722206.6666666665, ans=0.125
2023-11-28 23:51:20,962 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.83 vs. limit=15.0
2023-11-28 23:51:22,597 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 558350
2023-11-28 23:51:26,666 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 5250, loss[loss=0.05432, simple_loss=0.07452, pruned_loss=0.008113, audio_tagging_loss=0.008951, over 16511.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08972, pruned_loss=0.01218, audio_tagging_loss=0.00849, over 3057769.82 frames. ], batch size: 61, lr: 1.44e-03, grad_scale: 16.0
2023-11-28 23:51:52,682 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.95 vs. limit=10.0
2023-11-28 23:51:58,207 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3722473.3333333335, ans=0.125
2023-11-28 23:51:58,694 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn1.whiten.whitening_limit, batch_count=3722473.3333333335, ans=22.5
2023-11-28 23:52:06,105 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3722540.0, ans=0.125
2023-11-28 23:52:06,286 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3722540.0, ans=0.0
2023-11-28 23:52:12,127 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3722540.0, ans=0.125
2023-11-28 23:52:17,479 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3722606.6666666665, ans=0.1
2023-11-28 23:52:18,518 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3722606.6666666665, ans=0.0
2023-11-28 23:52:23,496 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3722606.6666666665, ans=0.2
2023-11-28 23:52:24,370 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 558400
2023-11-28 23:52:28,351 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 5300, loss[loss=0.09547, simple_loss=0.1405, pruned_loss=0.02072, audio_tagging_loss=0.004494, over 16671.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08913, pruned_loss=0.01208, audio_tagging_loss=0.008518, over 3048967.23 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0
2023-11-28 23:52:50,445 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.846e+01 9.120e+01 9.836e+01 1.047e+02 1.238e+02, threshold=1.967e+02, percent-clipped=0.0
2023-11-28 23:52:59,016 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3722806.6666666665, ans=0.0
2023-11-28 23:53:07,969 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3722873.3333333335, ans=10.0
2023-11-28 23:53:08,433 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.91 vs. limit=15.0
2023-11-28 23:53:26,212 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 558450
2023-11-28 23:53:27,135 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3722940.0, ans=0.125
2023-11-28 23:53:30,166 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 5350, loss[loss=0.05708, simple_loss=0.0738, pruned_loss=0.008983, audio_tagging_loss=0.0112, over 14647.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08925, pruned_loss=0.01215, audio_tagging_loss=0.008572, over 3043600.72 frames. ], batch size: 54, lr: 1.44e-03, grad_scale: 16.0
2023-11-28 23:53:33,933 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3723006.6666666665, ans=0.2
2023-11-28 23:53:39,248 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.85 vs. limit=12.0
2023-11-28 23:53:58,386 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3723140.0, ans=0.125
2023-11-28 23:54:05,450 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3723140.0, ans=0.0
2023-11-28 23:54:06,512 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3723206.6666666665, ans=0.1
2023-11-28 23:54:16,745 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3723206.6666666665, ans=0.125
2023-11-28 23:54:25,993 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3723273.3333333335, ans=0.1
2023-11-28 23:54:28,081 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 558500
2023-11-28 23:54:31,518 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 5400, loss[loss=0.07693, simple_loss=0.097, pruned_loss=0.01638, audio_tagging_loss=0.01205, over 15441.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.09047, pruned_loss=0.01217, audio_tagging_loss=0.008506, over 3041339.63 frames. ], batch size: 59, lr: 1.44e-03, grad_scale: 16.0
2023-11-28 23:54:54,655 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.310e+01 8.983e+01 9.673e+01 1.019e+02 1.246e+02, threshold=1.935e+02, percent-clipped=0.0
2023-11-28 23:55:00,882 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3723473.3333333335, ans=0.0
2023-11-28 23:55:15,289 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3723540.0, ans=0.1
2023-11-28 23:55:21,991 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3723606.6666666665, ans=0.125
2023-11-28 23:55:27,723 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3723606.6666666665, ans=0.125
2023-11-28 23:55:29,948 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 558550
2023-11-28 23:55:33,320 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 5450, loss[loss=0.07389, simple_loss=0.1008, pruned_loss=0.01756, audio_tagging_loss=0.005918, over 14336.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08933, pruned_loss=0.01209, audio_tagging_loss=0.008591, over 3037442.89 frames. ], batch size: 54, lr: 1.44e-03, grad_scale: 16.0
2023-11-28 23:55:47,460 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3723740.0, ans=0.1
2023-11-28 23:55:54,147 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.28 vs. limit=10.0
2023-11-28 23:56:05,658 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3723806.6666666665, ans=0.2
2023-11-28 23:56:05,930 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=3723806.6666666665, ans=15.0
2023-11-28 23:56:06,696 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3723806.6666666665, ans=0.125
2023-11-28 23:56:14,456 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3723873.3333333335, ans=0.05
2023-11-28 23:56:18,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3723873.3333333335, ans=0.125
2023-11-28 23:56:27,448 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3723940.0, ans=0.2
2023-11-28 23:56:31,925 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 558600
2023-11-28 23:56:35,657 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 5500, loss[loss=0.05952, simple_loss=0.08245, pruned_loss=0.008875, audio_tagging_loss=0.009418, over 14273.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.0896, pruned_loss=0.01228, audio_tagging_loss=0.00864, over 3039178.95 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 16.0
2023-11-28 23:56:49,563 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.92 vs. limit=15.0
2023-11-28 23:56:57,832 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.947e+01 8.978e+01 9.679e+01 1.033e+02 1.249e+02, threshold=1.936e+02, percent-clipped=0.0
2023-11-28 23:56:59,261 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3724140.0, ans=0.2
2023-11-28 23:57:07,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3724140.0, ans=0.125
2023-11-28 23:57:18,632 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.76 vs. limit=6.0
2023-11-28 23:57:20,317 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3724206.6666666665, ans=0.1
2023-11-28 23:57:32,566 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3724273.3333333335, ans=0.125
2023-11-28 23:57:33,473 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 558650
2023-11-28 23:57:34,700 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3724273.3333333335, ans=0.0
2023-11-28 23:57:36,766 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 5550, loss[loss=0.0732, simple_loss=0.1017, pruned_loss=0.0142, audio_tagging_loss=0.008143, over 14874.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.08926, pruned_loss=0.01235, audio_tagging_loss=0.008781, over 3043705.55 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 16.0
2023-11-28 23:57:36,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3724340.0, ans=0.1
2023-11-28 23:57:47,785 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.07 vs. limit=22.5
2023-11-28 23:57:54,090 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3724406.6666666665, ans=0.5
2023-11-28 23:58:16,731 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.51 vs. limit=22.5
2023-11-28 23:58:23,995 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-28 23:58:32,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3724606.6666666665, ans=0.0
2023-11-28 23:58:35,095 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 558700
2023-11-28 23:58:38,525 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 5600, loss[loss=0.04592, simple_loss=0.06053, pruned_loss=0.005908, audio_tagging_loss=0.009745, over 14335.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.089, pruned_loss=0.01225, audio_tagging_loss=0.008822, over 3053121.69 frames. ], batch size: 54, lr: 1.44e-03, grad_scale: 32.0
2023-11-28 23:58:39,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3724673.3333333335, ans=0.125
2023-11-28 23:58:45,618 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.42 vs. limit=15.0
2023-11-28 23:58:49,672 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.80 vs. limit=15.0
2023-11-28 23:59:00,583 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.68 vs. limit=15.0
2023-11-28 23:59:00,859 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.986e+01 9.219e+01 9.778e+01 1.037e+02 1.295e+02, threshold=1.956e+02, percent-clipped=0.0
2023-11-28 23:59:01,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3724740.0, ans=0.125
2023-11-28 23:59:14,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3724873.3333333335, ans=0.125
2023-11-28 23:59:23,625 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.47 vs. limit=10.0
2023-11-28 23:59:24,234 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-28 23:59:37,089 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 558750
2023-11-28 23:59:40,505 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 5650, loss[loss=0.0539, simple_loss=0.07055, pruned_loss=0.005761, audio_tagging_loss=0.01286, over 16151.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08875, pruned_loss=0.01215, audio_tagging_loss=0.008867, over 3047857.33 frames. ], batch size: 61, lr: 1.44e-03, grad_scale: 32.0
2023-11-28 23:59:53,681 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3725073.3333333335, ans=0.125
2023-11-28 23:59:54,852 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3725073.3333333335, ans=0.0
2023-11-29 00:00:05,174 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3725140.0, ans=0.125
2023-11-29 00:00:11,427 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.52 vs. limit=15.0
2023-11-29 00:00:26,443 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3725206.6666666665, ans=10.0
2023-11-29 00:00:37,934 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 558800
2023-11-29 00:00:41,904 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 5700, loss[loss=0.04929, simple_loss=0.06748, pruned_loss=0.005858, audio_tagging_loss=0.009697, over 14784.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.08898, pruned_loss=0.0122, audio_tagging_loss=0.008775, over 3051024.45 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 32.0
2023-11-29 00:00:45,064 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3725340.0, ans=0.0
2023-11-29 00:01:04,883 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.325e+01 8.841e+01 9.405e+01 1.014e+02 1.366e+02, threshold=1.881e+02, percent-clipped=0.0
2023-11-29 00:01:11,559 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3725473.3333333335, ans=0.125
2023-11-29 00:01:12,767 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3725473.3333333335, ans=0.0
2023-11-29 00:01:41,169 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 558850
2023-11-29 00:01:44,649 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 5750, loss[loss=0.07459, simple_loss=0.1027, pruned_loss=0.01398, audio_tagging_loss=0.009235, over 15422.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08895, pruned_loss=0.01222, audio_tagging_loss=0.008646, over 3046819.22 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 32.0
2023-11-29 00:01:45,269 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.70 vs. limit=22.5
2023-11-29 00:01:46,015 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3725673.3333333335, ans=0.0
2023-11-29 00:02:00,092 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3725740.0, ans=0.2
2023-11-29 00:02:05,832 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3725740.0, ans=0.125
2023-11-29 00:02:19,491 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3725873.3333333335, ans=0.1
2023-11-29 00:02:24,142 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3725873.3333333335, ans=0.0
2023-11-29 00:02:24,147 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3725873.3333333335, ans=0.125
2023-11-29 00:02:29,154 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3725873.3333333335, ans=0.125
2023-11-29 00:02:42,740 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 558900
2023-11-29 00:02:46,197 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 5800, loss[loss=0.05272, simple_loss=0.06325, pruned_loss=0.01138, audio_tagging_loss=0.009722, over 15287.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08941, pruned_loss=0.01218, audio_tagging_loss=0.008491, over 3045581.56 frames. ], batch size: 61, lr: 1.44e-03, grad_scale: 16.0
2023-11-29 00:02:58,290 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3726073.3333333335, ans=0.125
2023-11-29 00:03:08,467 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.900e+01 8.867e+01 9.470e+01 1.000e+02 1.681e+02, threshold=1.894e+02, percent-clipped=0.0
2023-11-29 00:03:09,622 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3726140.0, ans=0.0
2023-11-29 00:03:11,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3726140.0, ans=0.125
2023-11-29 00:03:20,425 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.25 vs. limit=12.0
2023-11-29 00:03:22,361 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3726206.6666666665, ans=0.0
2023-11-29 00:03:43,120 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 558950
2023-11-29 00:03:46,488 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 5850, loss[loss=0.06924, simple_loss=0.0961, pruned_loss=0.01445, audio_tagging_loss=0.006734, over 13900.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08931, pruned_loss=0.01221, audio_tagging_loss=0.008459, over 3038260.74 frames. ], batch size: 53, lr: 1.44e-03, grad_scale: 16.0
2023-11-29 00:04:28,841 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3726540.0, ans=0.1
2023-11-29 00:04:34,639 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3726606.6666666665, ans=0.0
2023-11-29 00:04:40,453 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3726606.6666666665, ans=0.04949747468305833
2023-11-29 00:04:44,586 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 559000
2023-11-29 00:04:49,133 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 5900, loss[loss=0.08113, simple_loss=0.114, pruned_loss=0.01493, audio_tagging_loss=0.009196, over 15362.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.08999, pruned_loss=0.01226, audio_tagging_loss=0.008442, over 3038513.66 frames. ], batch size: 54, lr: 1.44e-03, grad_scale: 16.0
2023-11-29 00:04:49,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3726673.3333333335, ans=0.0
2023-11-29 00:04:54,142 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3726673.3333333335, ans=0.09899494936611666
2023-11-29 00:04:56,394 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3726673.3333333335, ans=0.125
2023-11-29 00:05:12,432 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.646e+01 8.979e+01 9.571e+01 1.024e+02 1.288e+02, threshold=1.914e+02, percent-clipped=0.0
2023-11-29 00:05:21,185 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3726806.6666666665, ans=0.125
2023-11-29 00:05:28,496 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.80 vs. limit=22.5
2023-11-29 00:05:38,712 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3726940.0, ans=0.125
2023-11-29 00:05:47,313 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 559050
2023-11-29 00:05:51,243 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 5950, loss[loss=0.06124, simple_loss=0.08475, pruned_loss=0.008443, audio_tagging_loss=0.01042, over 15524.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.08982, pruned_loss=0.01207, audio_tagging_loss=0.008449, over 3047312.26 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 16.0
2023-11-29 00:06:03,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3727073.3333333335, ans=0.5
2023-11-29 00:06:13,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3727140.0, ans=0.125
2023-11-29 00:06:19,294 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.95 vs. limit=10.0
2023-11-29 00:06:22,552 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.73 vs. limit=22.5
2023-11-29 00:06:48,353 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 559100
2023-11-29 00:06:51,707 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 6000, loss[loss=0.05814, simple_loss=0.0841, pruned_loss=0.008926, audio_tagging_loss=0.007168, over 15443.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08963, pruned_loss=0.01196, audio_tagging_loss=0.008412, over 3043043.32 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 32.0
2023-11-29 00:06:51,710 INFO [train_asr.py:1258] (0/4) Computing validation loss
2023-11-29 00:07:15,019 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.7400, 4.3318, 4.6790, 4.0992], device='cuda:0')
2023-11-29 00:07:31,877 INFO [train_asr.py:1267] (0/4) Epoch 47, validation: loss=0.05752, simple_loss=0.05049, pruned_loss=0.005333, audio_tagging_loss=0.02694, over 4681554.00 frames.
2023-11-29 00:07:31,877 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB
2023-11-29 00:07:52,254 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.63 vs. limit=15.0
2023-11-29 00:07:56,059 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.777e+01 9.062e+01 9.671e+01 1.050e+02 2.392e+02, threshold=1.934e+02, percent-clipped=1.0
2023-11-29 00:08:09,141 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3727540.0, ans=0.0
2023-11-29 00:08:09,147 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3727540.0, ans=0.1
2023-11-29 00:08:17,558 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-29 00:08:31,009 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 559150
2023-11-29 00:08:34,404 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 6050, loss[loss=0.06577, simple_loss=0.08347, pruned_loss=0.01225, audio_tagging_loss=0.01179, over 15357.00 frames. ], tot_loss[loss=0.0651, simple_loss=0.08926, pruned_loss=0.01202, audio_tagging_loss=0.00845, over 3045803.57 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 32.0
2023-11-29 00:08:42,873 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3727673.3333333335, ans=0.125
2023-11-29 00:08:43,380 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.64 vs. limit=15.0
2023-11-29 00:08:53,293 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-29 00:09:10,930 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3727873.3333333335, ans=0.125
2023-11-29 00:09:29,957 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3727940.0, ans=0.125
2023-11-29 00:09:30,172 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3727940.0, ans=0.0
2023-11-29 00:09:31,134 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 559200
2023-11-29 00:09:34,940 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 6100, loss[loss=0.07186, simple_loss=0.09453, pruned_loss=0.01226, audio_tagging_loss=0.01234, over 16046.00 frames. ], tot_loss[loss=0.06459, simple_loss=0.08845, pruned_loss=0.01188, audio_tagging_loss=0.00848, over 3053315.05 frames. ], batch size: 61, lr: 1.44e-03, grad_scale: 32.0
2023-11-29 00:09:35,182 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3728006.6666666665, ans=0.125
2023-11-29 00:09:42,434 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.33 vs. limit=15.0
2023-11-29 00:09:45,541 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3728073.3333333335, ans=0.0
2023-11-29 00:09:49,935 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3728073.3333333335, ans=0.0
2023-11-29 00:09:57,535 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3728073.3333333335, ans=0.125
2023-11-29 00:09:57,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3728073.3333333335, ans=10.0
2023-11-29 00:09:58,312 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.264e+01 8.989e+01 9.555e+01 1.035e+02 1.326e+02, threshold=1.911e+02, percent-clipped=0.0
2023-11-29 00:10:02,620 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3728140.0, ans=0.2
2023-11-29 00:10:20,562 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.25 vs. limit=5.0
2023-11-29 00:10:31,670 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 559250
2023-11-29 00:10:35,630 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 6150, loss[loss=0.07708, simple_loss=0.1057, pruned_loss=0.01723, audio_tagging_loss=0.006977, over 14535.00 frames. ], tot_loss[loss=0.06489, simple_loss=0.08887, pruned_loss=0.01196, audio_tagging_loss=0.0085, over 3054605.92 frames. ], batch size: 53, lr: 1.44e-03, grad_scale: 16.0
2023-11-29 00:10:44,922 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.05 vs. limit=12.0
2023-11-29 00:11:05,333 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3728473.3333333335, ans=0.0
2023-11-29 00:11:19,281 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3728540.0, ans=0.125
2023-11-29 00:11:31,049 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3728606.6666666665, ans=0.0
2023-11-29 00:11:33,948 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 559300
2023-11-29 00:11:38,027 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 6200, loss[loss=0.07824, simple_loss=0.1151, pruned_loss=0.01324, audio_tagging_loss=0.007472, over 15566.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.08999, pruned_loss=0.01237, audio_tagging_loss=0.008568, over 3054491.23 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 16.0
2023-11-29 00:11:47,506 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3728673.3333333335, ans=0.0
2023-11-29 00:11:53,716 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.10 vs. limit=6.0
2023-11-29 00:11:55,862 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3728740.0, ans=0.0
2023-11-29 00:12:01,264 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.926e+01 8.945e+01 9.631e+01 1.031e+02 1.323e+02, threshold=1.926e+02, percent-clipped=0.0
2023-11-29 00:12:18,368 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.06 vs. limit=15.0
2023-11-29 00:12:35,574 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 559350
2023-11-29 00:12:39,050 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 6250, loss[loss=0.06653, simple_loss=0.09445, pruned_loss=0.00979, audio_tagging_loss=0.00952, over 14926.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.08955, pruned_loss=0.01229, audio_tagging_loss=0.008628, over 3052893.64 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0
2023-11-29 00:12:39,459 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3729006.6666666665, ans=0.0
2023-11-29 00:12:54,667 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3729073.3333333335, ans=0.0
2023-11-29 00:13:05,274 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3729140.0, ans=0.1
2023-11-29 00:13:27,505 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3729273.3333333335, ans=0.125
2023-11-29 00:13:36,063 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 559400
2023-11-29 00:13:37,754 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3729273.3333333335, ans=0.0
2023-11-29 00:13:39,777 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 6300, loss[loss=0.07872, simple_loss=0.1187, pruned_loss=0.01252, audio_tagging_loss=0.006828, over 16135.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.09038, pruned_loss=0.01229, audio_tagging_loss=0.00874, over 3054439.78 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 16.0
2023-11-29 00:13:41,373 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.70 vs. limit=15.0
2023-11-29 00:14:02,173 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.13 vs. limit=15.0
2023-11-29 00:14:06,088 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.394e+01 8.910e+01 9.740e+01 1.040e+02 1.205e+02, threshold=1.948e+02, percent-clipped=0.0
2023-11-29 00:14:19,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3729540.0, ans=0.2
2023-11-29 00:14:24,474 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3729540.0, ans=0.125
2023-11-29 00:14:39,755 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 559450
2023-11-29 00:14:41,361 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3729606.6666666665, ans=0.125
2023-11-29 00:14:43,943 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 6350, loss[loss=0.03868, simple_loss=0.04779, pruned_loss=0.003828, audio_tagging_loss=0.01096, over 14168.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.09055, pruned_loss=0.01225, audio_tagging_loss=0.008756, over 3058348.41 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0
2023-11-29 00:14:56,100 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.96 vs. limit=15.0
2023-11-29 00:15:15,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3729806.6666666665, ans=0.125
2023-11-29 00:15:19,715 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3729873.3333333335, ans=0.2
2023-11-29 00:15:24,263 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3729873.3333333335, ans=0.2
2023-11-29 00:15:25,938 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=3729873.3333333335, ans=0.5
2023-11-29 00:15:30,127 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3729873.3333333335, ans=0.05
2023-11-29 00:15:38,113 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3729940.0, ans=0.0
2023-11-29 00:15:42,321 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 559500
2023-11-29 00:15:45,792 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 6400, loss[loss=0.06591, simple_loss=0.08958, pruned_loss=0.01246, audio_tagging_loss=0.008661, over 14911.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.08961, pruned_loss=0.01204, audio_tagging_loss=0.008941, over 3048937.97 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 32.0
2023-11-29 00:15:53,055 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3730006.6666666665, ans=0.0
2023-11-29 00:16:03,371 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3730073.3333333335, ans=0.125
2023-11-29 00:16:10,786 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.783e+01 8.936e+01 9.646e+01 1.038e+02 1.369e+02, threshold=1.929e+02, percent-clipped=0.0
2023-11-29 00:16:43,599 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 559550
2023-11-29 00:16:47,003 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 6450, loss[loss=0.06685, simple_loss=0.0912, pruned_loss=0.01229, audio_tagging_loss=0.008963, over 16299.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.08982, pruned_loss=0.0121, audio_tagging_loss=0.009016, over 3049667.68 frames. ], batch size: 61, lr: 1.43e-03, grad_scale: 16.0
2023-11-29 00:16:49,883 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.30 vs. limit=22.5
2023-11-29 00:17:11,684 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3730473.3333333335, ans=0.04949747468305833
2023-11-29 00:17:25,586 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3730540.0, ans=0.2
2023-11-29 00:17:30,678 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.53 vs. limit=6.0
2023-11-29 00:17:31,534 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3730540.0, ans=0.0
2023-11-29 00:17:38,563 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3730606.6666666665, ans=0.1
2023-11-29 00:17:46,294 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 559600
2023-11-29 00:17:50,023 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.43 vs. limit=15.0
2023-11-29 00:17:50,658 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 6500, loss[loss=0.06977, simple_loss=0.09314, pruned_loss=0.0116, audio_tagging_loss=0.0116, over 14959.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.08981, pruned_loss=0.01216, audio_tagging_loss=0.008961, over 3047916.23 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 16.0
2023-11-29 00:18:05,576 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3730740.0, ans=0.125
2023-11-29 00:18:16,527 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.806e+01 9.149e+01 9.988e+01 1.072e+02 1.426e+02, threshold=1.998e+02, percent-clipped=0.0
2023-11-29 00:18:33,677 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3730873.3333333335, ans=0.1
2023-11-29 00:18:49,105 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 559650
2023-11-29 00:18:52,570 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 6550, loss[loss=0.09215, simple_loss=0.121, pruned_loss=0.02553, audio_tagging_loss=0.006106, over 15001.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.0896, pruned_loss=0.01213, audio_tagging_loss=0.00886, over 3049453.41 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0
2023-11-29 00:18:52,819 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3731006.6666666665, ans=0.125
2023-11-29 00:18:53,386 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.26 vs. limit=22.5
2023-11-29 00:19:03,062 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3731006.6666666665, ans=0.2
2023-11-29 00:19:10,235 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3731073.3333333335, ans=0.125
2023-11-29 00:19:27,083 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3731140.0, ans=0.125
2023-11-29 00:19:28,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3731140.0, ans=0.0
2023-11-29 00:19:41,352 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3731273.3333333335, ans=0.0
2023-11-29 00:19:51,039 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 559700
2023-11-29 00:19:54,508 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 6600, loss[loss=0.06787, simple_loss=0.08781, pruned_loss=0.01509, audio_tagging_loss=0.008872, over 14840.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.08961, pruned_loss=0.01206, audio_tagging_loss=0.008706, over 3047063.17 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0
2023-11-29 00:20:18,537 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3731473.3333333335, ans=0.015
2023-11-29 00:20:19,792 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3731473.3333333335, ans=0.1
2023-11-29 00:20:20,705 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.550e+01 8.919e+01 9.465e+01 1.014e+02 1.286e+02, threshold=1.893e+02, percent-clipped=0.0
2023-11-29 00:20:49,104 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3731606.6666666665, ans=0.125
2023-11-29 00:20:52,602 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 559750
2023-11-29 00:20:52,780 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3731606.6666666665, ans=0.125
2023-11-29 00:20:56,672 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 6650, loss[loss=0.06896, simple_loss=0.09863, pruned_loss=0.01114, audio_tagging_loss=0.008502, over 15259.00 frames. ], tot_loss[loss=0.06506, simple_loss=0.08899, pruned_loss=0.01186, audio_tagging_loss=0.008705, over 3050246.77 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 16.0
2023-11-29 00:21:04,324 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3731673.3333333335, ans=0.125
2023-11-29 00:21:09,624 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.44 vs. limit=12.0
2023-11-29 00:21:13,875 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-29 00:21:33,183 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3731873.3333333335, ans=0.125
2023-11-29 00:21:42,791 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.89 vs. limit=22.5
2023-11-29 00:21:46,129 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3731940.0, ans=0.125
2023-11-29 00:21:54,818 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 559800
2023-11-29 00:21:58,731 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 6700, loss[loss=0.06847, simple_loss=0.08895, pruned_loss=0.0159, audio_tagging_loss=0.008092, over 15235.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08964, pruned_loss=0.01203, audio_tagging_loss=0.00858, over 3048025.82 frames. ], batch size: 60, lr: 1.43e-03, grad_scale: 16.0
2023-11-29 00:22:24,739 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.843e+01 9.131e+01 9.664e+01 1.036e+02 1.396e+02, threshold=1.933e+02, percent-clipped=0.0
2023-11-29 00:22:53,040 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=3732273.3333333335, ans=10.0
2023-11-29 00:22:56,152 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 559850
2023-11-29 00:22:59,297 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.94 vs. limit=22.5
2023-11-29 00:22:59,576 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 6750, loss[loss=0.0755, simple_loss=0.1006, pruned_loss=0.01447, audio_tagging_loss=0.01071, over 15681.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.08913, pruned_loss=0.01192, audio_tagging_loss=0.008558, over 3038457.75 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 16.0
2023-11-29 00:23:10,718 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.29 vs. limit=15.0
2023-11-29 00:23:23,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3732406.6666666665, ans=0.0
2023-11-29 00:23:27,904 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3732473.3333333335, ans=0.2
2023-11-29 00:23:39,988 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.07 vs.
limit=15.0 2023-11-29 00:23:45,446 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3732540.0, ans=0.0 2023-11-29 00:23:46,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3732540.0, ans=0.125 2023-11-29 00:23:47,963 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3732606.6666666665, ans=0.0 2023-11-29 00:23:58,414 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 559900 2023-11-29 00:24:01,816 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 6800, loss[loss=0.07271, simple_loss=0.1065, pruned_loss=0.01357, audio_tagging_loss=0.005877, over 15770.00 frames. ], tot_loss[loss=0.06423, simple_loss=0.08823, pruned_loss=0.01161, audio_tagging_loss=0.008506, over 3038173.84 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 00:24:07,289 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3732673.3333333335, ans=0.1 2023-11-29 00:24:20,643 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3732740.0, ans=0.1 2023-11-29 00:24:27,575 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.798e+01 9.091e+01 9.725e+01 1.038e+02 3.036e+02, threshold=1.945e+02, percent-clipped=1.0 2023-11-29 00:24:36,029 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3732806.6666666665, ans=0.125 2023-11-29 00:24:40,290 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3732873.3333333335, ans=0.125 2023-11-29 00:24:45,591 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3732873.3333333335, ans=0.0 2023-11-29 00:24:55,063 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.19 vs. limit=6.0 2023-11-29 00:25:00,666 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 559950 2023-11-29 00:25:01,167 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.24 vs. limit=15.0 2023-11-29 00:25:04,111 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 6850, loss[loss=0.06778, simple_loss=0.09807, pruned_loss=0.01029, audio_tagging_loss=0.008455, over 16072.00 frames. ], tot_loss[loss=0.06415, simple_loss=0.0882, pruned_loss=0.01157, audio_tagging_loss=0.008483, over 3035886.72 frames. 
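Annotation: the scaling.py:213 lines print the current value ("ans") of a ScheduledFloat, a hyperparameter (dropout probability, skip rate, balancer bound, etc.) that varies with the global batch count instead of staying fixed. A minimal reimplementation of the idea; the breakpoints below are hypothetical, and at batch_count around 3.7M the real schedules have long since reached their final values:

    class ScheduledFloat:
        """Piecewise-linear schedule over the global batch count (a sketch)."""
        def __init__(self, *points):                # e.g. (0, 0.3), (20000, 0.125)
            self.points = sorted(points)
            self.batch_count = 0                    # advanced by the training loop
        def __float__(self) -> float:
            pts, b = self.points, self.batch_count
            if b <= pts[0][0]:
                return float(pts[0][1])
            if b >= pts[-1][0]:
                return float(pts[-1][1])
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if x0 <= b <= x1:
                    return float(y0 + (b - x0) / (x1 - x0) * (y1 - y0))

    prob = ScheduledFloat((0, 0.3), (20000, 0.125))  # hypothetical breakpoints
    prob.batch_count = 3729873
    assert float(prob) == 0.125                      # matches "ans=0.125" above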
], batch size: 60, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:25:09,308 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3733006.6666666665, ans=0.125 2023-11-29 00:25:11,577 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3733006.6666666665, ans=0.125 2023-11-29 00:25:15,165 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3733073.3333333335, ans=0.125 2023-11-29 00:25:17,386 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3733073.3333333335, ans=0.125 2023-11-29 00:25:24,306 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3733073.3333333335, ans=0.07 2023-11-29 00:25:36,200 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3733140.0, ans=0.125 2023-11-29 00:25:50,613 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3733206.6666666665, ans=0.125 2023-11-29 00:25:55,707 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.45 vs. limit=22.5 2023-11-29 00:26:02,176 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 560000 2023-11-29 00:26:02,799 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.41 vs. limit=6.0 2023-11-29 00:26:03,563 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-560000.pt 2023-11-29 00:26:08,443 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 6900, loss[loss=0.07038, simple_loss=0.09602, pruned_loss=0.0139, audio_tagging_loss=0.008475, over 14748.00 frames. ], tot_loss[loss=0.06483, simple_loss=0.08905, pruned_loss=0.01179, audio_tagging_loss=0.008515, over 3039591.95 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:26:20,332 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.26 vs. limit=6.0 2023-11-29 00:26:36,701 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.618e+01 8.734e+01 9.536e+01 1.026e+02 1.241e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-29 00:26:40,719 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3733473.3333333335, ans=0.125 2023-11-29 00:26:43,061 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3733473.3333333335, ans=0.2 2023-11-29 00:26:46,461 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3733540.0, ans=0.2 2023-11-29 00:26:49,024 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.55 vs. 
limit=15.0 2023-11-29 00:26:52,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3733540.0, ans=0.125 2023-11-29 00:26:57,773 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 00:26:59,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3733606.6666666665, ans=0.2 2023-11-29 00:27:05,730 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3733606.6666666665, ans=0.125 2023-11-29 00:27:06,920 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 560050 2023-11-29 00:27:10,746 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 6950, loss[loss=0.06319, simple_loss=0.0885, pruned_loss=0.01291, audio_tagging_loss=0.006039, over 15052.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.09014, pruned_loss=0.01187, audio_tagging_loss=0.008466, over 3038608.62 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:27:31,242 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.59 vs. limit=15.0 2023-11-29 00:27:42,626 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3733806.6666666665, ans=0.0 2023-11-29 00:27:47,697 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.28 vs. limit=6.0 2023-11-29 00:28:01,592 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3733940.0, ans=0.1 2023-11-29 00:28:09,837 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 560100 2023-11-29 00:28:13,165 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 7000, loss[loss=0.05763, simple_loss=0.07629, pruned_loss=0.0113, audio_tagging_loss=0.008184, over 14902.00 frames. ], tot_loss[loss=0.06486, simple_loss=0.08936, pruned_loss=0.0117, audio_tagging_loss=0.008482, over 3043166.21 frames. 
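Annotation: the WARNING a few entries above drops an AudioSet cut whose transcript is the dummy placeholder text: after the frontend's 4x subsampling a 100-frame cut keeps only 23 frames, fewer than its 24 tokens, so the transducer has no valid alignment. One subsampling arithmetic that reproduces exactly the logged numbers (an assumption, chosen because it maps 100 to 23):

    def frames_after_subsampling(num_frames: int) -> int:
        # Two stride-2 convolutions (overall 4x); reproduces 100 -> 23.
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # Exclude cuts whose subsampled length cannot cover the token sequence.
        return frames_after_subsampling(num_frames) > num_tokens

    assert frames_after_subsampling(100) == 23
    assert not keep_cut(100, 24)    # the cut in the warning above is excluded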
], batch size: 54, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:28:20,576 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 00:28:38,959 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.539e+01 8.937e+01 9.480e+01 1.049e+02 1.230e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-29 00:28:42,862 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 00:28:45,229 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3734140.0, ans=0.0 2023-11-29 00:28:50,203 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3734206.6666666665, ans=0.125 2023-11-29 00:29:10,528 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 560150 2023-11-29 00:29:13,830 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 7050, loss[loss=0.06137, simple_loss=0.08126, pruned_loss=0.01305, audio_tagging_loss=0.007697, over 14600.00 frames. ], tot_loss[loss=0.0651, simple_loss=0.08928, pruned_loss=0.01191, audio_tagging_loss=0.008548, over 3035957.41 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:29:21,367 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.98 vs. limit=15.0 2023-11-29 00:29:22,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3734340.0, ans=0.0 2023-11-29 00:29:40,132 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3734473.3333333335, ans=0.125 2023-11-29 00:29:44,362 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3734473.3333333335, ans=0.125 2023-11-29 00:29:54,085 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.61 vs. limit=15.0 2023-11-29 00:30:07,300 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3734606.6666666665, ans=0.0 2023-11-29 00:30:11,688 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 560200 2023-11-29 00:30:16,172 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 7100, loss[loss=0.05882, simple_loss=0.07362, pruned_loss=0.007246, audio_tagging_loss=0.01477, over 14326.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.08982, pruned_loss=0.012, audio_tagging_loss=0.008642, over 3045721.05 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:30:27,394 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.03 vs. 
limit=10.0 2023-11-29 00:30:43,660 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.839e+01 8.863e+01 9.578e+01 1.032e+02 1.275e+02, threshold=1.916e+02, percent-clipped=0.0 2023-11-29 00:30:44,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3734806.6666666665, ans=0.0 2023-11-29 00:30:46,367 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3734806.6666666665, ans=0.0 2023-11-29 00:30:48,543 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3734806.6666666665, ans=0.0 2023-11-29 00:30:51,956 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3734873.3333333335, ans=0.125 2023-11-29 00:30:53,201 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3734873.3333333335, ans=0.125 2023-11-29 00:30:56,655 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3734873.3333333335, ans=0.0 2023-11-29 00:31:11,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3734940.0, ans=0.125 2023-11-29 00:31:14,679 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 560250 2023-11-29 00:31:18,602 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 7150, loss[loss=0.06127, simple_loss=0.08888, pruned_loss=0.00726, audio_tagging_loss=0.009573, over 14477.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08947, pruned_loss=0.01183, audio_tagging_loss=0.00871, over 3043373.76 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:32:02,540 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=8.06 vs. limit=8.0 2023-11-29 00:32:16,499 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 560300 2023-11-29 00:32:19,878 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 7200, loss[loss=0.08062, simple_loss=0.1138, pruned_loss=0.01679, audio_tagging_loss=0.006933, over 15250.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.08952, pruned_loss=0.01185, audio_tagging_loss=0.008869, over 3043701.06 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 00:32:22,497 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3735340.0, ans=0.0 2023-11-29 00:32:35,659 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3735406.6666666665, ans=0.0 2023-11-29 00:32:47,173 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.965e+01 8.965e+01 9.449e+01 1.037e+02 1.518e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-29 00:33:10,292 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3735606.6666666665, ans=0.0 2023-11-29 00:33:17,239 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 560350 2023-11-29 00:33:20,713 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 7250, loss[loss=0.06335, simple_loss=0.09102, pruned_loss=0.0106, audio_tagging_loss=0.007237, over 16085.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.0898, pruned_loss=0.01198, audio_tagging_loss=0.008917, over 3047533.07 frames. 
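Annotation: the checkpoint-560000.pt save logged earlier (at 00:26:03) landed on a round batch index, which is what periodic intermediate checkpointing looks like: a save every fixed number of batches, independent of epoch boundaries. A sketch of that bookkeeping; the 4000-batch interval and the rank-0 gating are assumptions, consistent with 560000 being a multiple of 4000:

    from pathlib import Path
    import torch

    def maybe_save_checkpoint(model, optimizer, batch_idx_train: int,
                              exp_dir: Path, save_every_n: int = 4000,
                              rank: int = 0) -> None:
        # Save on rank 0 only, whenever the global batch index hits a
        # multiple of save_every_n (560000 = 140 * 4000).
        if rank != 0 or batch_idx_train % save_every_n != 0:
            return
        path = exp_dir / f"checkpoint-{batch_idx_train}.pt"
        torch.save({"model": model.state_dict(),
                    "optimizer": optimizer.state_dict(),
                    "batch_idx_train": batch_idx_train}, path)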
], batch size: 61, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 00:33:26,286 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3735673.3333333335, ans=0.09899494936611666 2023-11-29 00:33:34,487 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3735740.0, ans=0.125 2023-11-29 00:33:43,240 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3735740.0, ans=0.0 2023-11-29 00:33:50,290 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3735806.6666666665, ans=10.0 2023-11-29 00:33:59,993 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3735873.3333333335, ans=0.125 2023-11-29 00:34:01,057 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3735873.3333333335, ans=0.0 2023-11-29 00:34:19,901 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 560400 2023-11-29 00:34:23,673 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 7300, loss[loss=0.07363, simple_loss=0.09699, pruned_loss=0.01782, audio_tagging_loss=0.007318, over 15273.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08972, pruned_loss=0.01201, audio_tagging_loss=0.008782, over 3054976.01 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:34:32,799 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3736006.6666666665, ans=0.0 2023-11-29 00:34:37,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3736073.3333333335, ans=0.2 2023-11-29 00:34:37,452 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3736073.3333333335, ans=0.0 2023-11-29 00:34:42,222 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3736073.3333333335, ans=0.125 2023-11-29 00:34:43,221 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3736073.3333333335, ans=0.1 2023-11-29 00:34:45,584 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3736073.3333333335, ans=0.125 2023-11-29 00:34:46,959 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3736140.0, ans=10.0 2023-11-29 00:34:51,213 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.339e+01 8.804e+01 9.526e+01 1.009e+02 1.275e+02, threshold=1.905e+02, percent-clipped=0.0 2023-11-29 00:35:14,002 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3736273.3333333335, ans=0.2 2023-11-29 00:35:21,799 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 560450 2023-11-29 00:35:25,218 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 7350, loss[loss=0.06587, simple_loss=0.08649, pruned_loss=0.01465, audio_tagging_loss=0.007978, over 15393.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.09029, pruned_loss=0.0121, audio_tagging_loss=0.008657, over 3061339.31 frames. 
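Annotation: the scaling.py:1022 "Whitening" lines compare a measured statistic against a limit (e.g. metric=5.62 vs. limit=15.0 just below). A natural reading is an eigenvalue-dispersion measure on the covariance of a module's activations: it equals 1.0 when the features are perfectly "white" (isotropic) and grows as energy concentrates in a few directions, and a corrective penalty applies only when the metric exceeds the limit. A hedged sketch of such a metric, illustrating the idea rather than reproducing the exact code:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        """x: (num_frames, num_channels). Returns mean(eig^2) / mean(eig)^2
        of the per-group covariance; 1.0 means perfectly white activations."""
        metrics = []
        for c in x.chunk(num_groups, dim=-1):
            c = c - c.mean(dim=0, keepdim=True)
            cov = (c.T @ c) / c.shape[0]
            eig = torch.linalg.eigvalsh(cov)
            metrics.append((eig ** 2).mean() / eig.mean() ** 2)
        return float(torch.stack(metrics).mean())

    x = torch.randn(1000, 256) * torch.linspace(0.1, 2.0, 256)  # non-white input
    print(whitening_metric(x))   # > 1.0; compared against a limit such as 15.0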
], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:35:30,721 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.62 vs. limit=15.0 2023-11-29 00:35:58,962 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3736473.3333333335, ans=0.125 2023-11-29 00:36:07,782 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3736540.0, ans=0.0 2023-11-29 00:36:18,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3736606.6666666665, ans=0.0 2023-11-29 00:36:23,170 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 560500 2023-11-29 00:36:26,683 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 7400, loss[loss=0.06999, simple_loss=0.107, pruned_loss=0.01179, audio_tagging_loss=0.004726, over 15990.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.0892, pruned_loss=0.0119, audio_tagging_loss=0.008687, over 3060996.67 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:36:56,337 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.891e+01 9.052e+01 9.894e+01 1.069e+02 1.258e+02, threshold=1.979e+02, percent-clipped=0.0 2023-11-29 00:37:19,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3736940.0, ans=0.0 2023-11-29 00:37:25,145 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 560550 2023-11-29 00:37:25,276 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3736940.0, ans=0.0 2023-11-29 00:37:25,378 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3736940.0, ans=0.125 2023-11-29 00:37:29,092 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 7450, loss[loss=0.05254, simple_loss=0.06389, pruned_loss=0.01008, audio_tagging_loss=0.01051, over 15800.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.09009, pruned_loss=0.01215, audio_tagging_loss=0.008543, over 3059134.60 frames. ], batch size: 64, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:37:38,311 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.90 vs. limit=15.0 2023-11-29 00:37:55,616 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 00:38:00,175 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3737140.0, ans=0.0 2023-11-29 00:38:22,406 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.45 vs. limit=15.0 2023-11-29 00:38:26,418 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 560600 2023-11-29 00:38:30,329 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 7500, loss[loss=0.07404, simple_loss=0.1118, pruned_loss=0.01207, audio_tagging_loss=0.006046, over 16034.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.09052, pruned_loss=0.01224, audio_tagging_loss=0.008488, over 3061895.45 frames. 
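Annotation: the scaling.py:1118 WithLoss lines (loss-sum=0.000e+00 nearby) report the accumulated value of an auxiliary penalty attached to the attention weights; a sum of zero means the penalty is currently inactive. A generic sketch of the logging pattern only; the actual penalty on self_attn_weights is not reconstructed here, and the real module also backpropagates the penalty rather than merely tracking it:

    import torch

    class WithLoss(torch.nn.Module):
        """Pass-through module that accumulates a named auxiliary penalty."""
        def __init__(self, name: str, penalty_fn):
            super().__init__()
            self.name, self.penalty_fn = name, penalty_fn
            self.loss_sum = 0.0
        def forward(self, x: torch.Tensor) -> torch.Tensor:
            if self.training:
                self.loss_sum += float(self.penalty_fn(x.detach()))
            return x
        def log_and_reset(self) -> None:
            print(f"WithLoss: name={self.name}, loss-sum={self.loss_sum:.3e}")
            self.loss_sum = 0.0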
], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:38:37,867 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3737340.0, ans=0.0 2023-11-29 00:38:47,790 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.73 vs. limit=10.0 2023-11-29 00:38:49,787 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3737406.6666666665, ans=0.125 2023-11-29 00:38:58,470 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.542e+01 8.888e+01 9.650e+01 1.039e+02 1.258e+02, threshold=1.930e+02, percent-clipped=0.0 2023-11-29 00:39:03,006 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=3737473.3333333335, ans=15.0 2023-11-29 00:39:19,629 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 00:39:29,004 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 560650 2023-11-29 00:39:32,467 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 7550, loss[loss=0.06465, simple_loss=0.08001, pruned_loss=0.01393, audio_tagging_loss=0.01073, over 14229.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.09014, pruned_loss=0.01231, audio_tagging_loss=0.00847, over 3063605.52 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:39:42,121 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.25 vs. limit=10.0 2023-11-29 00:39:43,806 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3737740.0, ans=0.0 2023-11-29 00:39:43,876 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3737740.0, ans=0.125 2023-11-29 00:39:44,358 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.37 vs. limit=10.0 2023-11-29 00:40:00,851 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3737806.6666666665, ans=0.125 2023-11-29 00:40:09,804 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.46 vs. limit=15.0 2023-11-29 00:40:19,906 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3737873.3333333335, ans=0.125 2023-11-29 00:40:30,946 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 560700 2023-11-29 00:40:34,510 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 7600, loss[loss=0.05082, simple_loss=0.06922, pruned_loss=0.007632, audio_tagging_loss=0.00858, over 13949.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08904, pruned_loss=0.01211, audio_tagging_loss=0.008523, over 3046712.56 frames. 
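Annotation: every tot_loss entry carries the current learning rate (1.43e-03 around here). That value is roughly consistent with an Eden-style schedule, which decays smoothly in both the batch and epoch dimensions. A hedged reconstruction; the formula follows the published Eden scheduler, and the constants (base_lr 0.045, lr_batches 7500, lr_epochs 3.5) are assumptions about this run:

    def eden_lr(base_lr: float, batch: int, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        # lr = base_lr * ((batch/lr_batches)^2 + 1)^-0.25
        #              * ((epoch/lr_epochs)^2 + 1)^-0.25
        return (base_lr
                * ((batch / lr_batches) ** 2 + 1) ** -0.25
                * ((epoch / lr_epochs) ** 2 + 1) ** -0.25)

    # ~1.42e-03 at batch 560000, epoch 47; the log shows 1.43e-03, which is
    # consistent up to how the scheduler counts fractional epochs.
    print(eden_lr(0.045, batch=560000, epoch=47))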
], batch size: 53, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 00:40:46,132 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3738006.6666666665, ans=0.0 2023-11-29 00:40:47,138 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3738073.3333333335, ans=0.0 2023-11-29 00:40:48,705 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.83 vs. limit=10.0 2023-11-29 00:40:58,958 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=3738140.0, ans=22.5 2023-11-29 00:41:00,926 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3738140.0, ans=0.0 2023-11-29 00:41:03,019 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.868e+01 9.023e+01 9.623e+01 1.078e+02 1.517e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-29 00:41:04,490 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3738140.0, ans=0.1 2023-11-29 00:41:06,659 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3738140.0, ans=0.125 2023-11-29 00:41:06,918 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3738140.0, ans=0.125 2023-11-29 00:41:10,448 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3738206.6666666665, ans=0.0 2023-11-29 00:41:12,559 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3738206.6666666665, ans=0.125 2023-11-29 00:41:16,994 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3738206.6666666665, ans=0.125 2023-11-29 00:41:33,132 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 560750 2023-11-29 00:41:37,290 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 7650, loss[loss=0.0546, simple_loss=0.06678, pruned_loss=0.008616, audio_tagging_loss=0.01259, over 13437.00 frames. ], tot_loss[loss=0.06498, simple_loss=0.08882, pruned_loss=0.01213, audio_tagging_loss=0.008436, over 3038312.25 frames. ], batch size: 53, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 00:41:52,025 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.28 vs. limit=22.5 2023-11-29 00:41:55,401 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3738406.6666666665, ans=0.125 2023-11-29 00:41:58,157 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.40 vs. limit=12.0 2023-11-29 00:42:25,664 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3738606.6666666665, ans=0.125 2023-11-29 00:42:30,828 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.28 vs. 
limit=15.0 2023-11-29 00:42:32,709 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3738606.6666666665, ans=0.0 2023-11-29 00:42:34,820 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 560800 2023-11-29 00:42:34,962 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3738606.6666666665, ans=0.125 2023-11-29 00:42:38,565 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 7700, loss[loss=0.07901, simple_loss=0.09937, pruned_loss=0.01616, audio_tagging_loss=0.01316, over 14825.00 frames. ], tot_loss[loss=0.06486, simple_loss=0.08866, pruned_loss=0.01207, audio_tagging_loss=0.008462, over 3039590.69 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:42:44,635 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3738673.3333333335, ans=0.1 2023-11-29 00:43:08,243 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.715e+01 9.118e+01 9.854e+01 1.042e+02 1.331e+02, threshold=1.971e+02, percent-clipped=0.0 2023-11-29 00:43:20,885 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3738873.3333333335, ans=0.2 2023-11-29 00:43:25,672 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3738873.3333333335, ans=0.125 2023-11-29 00:43:34,061 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3738940.0, ans=0.0 2023-11-29 00:43:36,760 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 560850 2023-11-29 00:43:38,166 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3738940.0, ans=0.0 2023-11-29 00:43:40,126 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 7750, loss[loss=0.05851, simple_loss=0.0878, pruned_loss=0.007129, audio_tagging_loss=0.007482, over 15284.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.08943, pruned_loss=0.01217, audio_tagging_loss=0.008439, over 3043634.44 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:44:36,057 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3739273.3333333335, ans=0.2 2023-11-29 00:44:36,115 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3739273.3333333335, ans=0.125 2023-11-29 00:44:38,696 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 560900 2023-11-29 00:44:42,128 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 7800, loss[loss=0.06726, simple_loss=0.09695, pruned_loss=0.01333, audio_tagging_loss=0.005458, over 15024.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.09011, pruned_loss=0.01231, audio_tagging_loss=0.008484, over 3040307.35 frames. 
], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:45:07,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3739473.3333333335, ans=0.1 2023-11-29 00:45:11,337 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.680e+01 8.840e+01 9.485e+01 1.019e+02 1.348e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-29 00:45:13,876 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3739473.3333333335, ans=0.0 2023-11-29 00:45:20,314 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3739540.0, ans=0.125 2023-11-29 00:45:32,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3739606.6666666665, ans=0.125 2023-11-29 00:45:41,176 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 560950 2023-11-29 00:45:44,487 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 7850, loss[loss=0.06034, simple_loss=0.07569, pruned_loss=0.01198, audio_tagging_loss=0.01052, over 14833.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08971, pruned_loss=0.01216, audio_tagging_loss=0.008626, over 3045008.90 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:45:44,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3739673.3333333335, ans=0.0 2023-11-29 00:45:55,385 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3739740.0, ans=0.125 2023-11-29 00:45:58,197 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3739740.0, ans=0.2 2023-11-29 00:46:05,859 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3739740.0, ans=0.2 2023-11-29 00:46:07,337 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.14 vs. limit=22.5 2023-11-29 00:46:30,400 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3739873.3333333335, ans=0.125 2023-11-29 00:46:38,610 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3739940.0, ans=0.125 2023-11-29 00:46:38,938 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.98 vs. limit=10.0 2023-11-29 00:46:41,933 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 561000 2023-11-29 00:46:42,058 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3739940.0, ans=0.1 2023-11-29 00:46:46,337 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 7900, loss[loss=0.06098, simple_loss=0.07797, pruned_loss=0.01175, audio_tagging_loss=0.01024, over 16529.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08891, pruned_loss=0.01196, audio_tagging_loss=0.008764, over 3050616.13 frames. 
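Annotation: grad_scale in the loss entries keeps flipping between 16.0 and 32.0. That is the signature of dynamic fp16 loss scaling: the scale doubles after a run of overflow-free steps and halves as soon as an inf/nan gradient appears, so it oscillates around the largest safe value. The standard PyTorch mechanism, with the library's default growth/backoff constants (not values read from this run):

    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=16.0,      # matches the grad_scale seen in the log
        growth_factor=2.0,    # 16 -> 32 after growth_interval clean steps
        backoff_factor=0.5,   # 32 -> 16 again on the first overflow
        growth_interval=2000,
    )

    def training_step(model, optimizer, loss_fn, batch):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = loss_fn(model, batch)
        scaler.scale(loss).backward()
        scaler.step(optimizer)    # skipped internally if grads overflowed
        scaler.update()           # grows or backs off the scale
        return loss.detach()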
], batch size: 63, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:47:16,126 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.755e+01 9.161e+01 9.815e+01 1.045e+02 1.564e+02, threshold=1.963e+02, percent-clipped=0.0 2023-11-29 00:47:25,625 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3740206.6666666665, ans=0.0 2023-11-29 00:47:33,575 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.40 vs. limit=15.0 2023-11-29 00:47:40,907 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 00:47:44,242 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 561050 2023-11-29 00:47:48,108 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 7950, loss[loss=0.07472, simple_loss=0.09809, pruned_loss=0.01628, audio_tagging_loss=0.009395, over 14652.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08908, pruned_loss=0.01195, audio_tagging_loss=0.008847, over 3047004.57 frames. ], batch size: 53, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:47:55,496 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3740340.0, ans=0.125 2023-11-29 00:48:05,068 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 00:48:13,939 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.78 vs. limit=15.0 2023-11-29 00:48:22,306 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3740473.3333333335, ans=0.0 2023-11-29 00:48:45,486 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 561100 2023-11-29 00:48:48,865 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 8000, loss[loss=0.0804, simple_loss=0.1137, pruned_loss=0.01584, audio_tagging_loss=0.007712, over 14949.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.08898, pruned_loss=0.01193, audio_tagging_loss=0.008867, over 3043943.27 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 00:48:50,919 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3740673.3333333335, ans=0.0 2023-11-29 00:48:54,503 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3740673.3333333335, ans=0.1 2023-11-29 00:49:12,156 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.24 vs. 
limit=22.5 2023-11-29 00:49:18,852 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.404e+01 8.940e+01 9.510e+01 1.008e+02 1.278e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-29 00:49:28,888 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3740873.3333333335, ans=0.0 2023-11-29 00:49:35,943 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3740873.3333333335, ans=0.125 2023-11-29 00:49:46,792 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 561150 2023-11-29 00:49:51,191 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 8050, loss[loss=0.07988, simple_loss=0.1077, pruned_loss=0.01786, audio_tagging_loss=0.008173, over 14713.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.08834, pruned_loss=0.01199, audio_tagging_loss=0.008974, over 3047136.00 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 00:49:52,936 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.46 vs. limit=15.0 2023-11-29 00:50:03,764 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3741073.3333333335, ans=0.125 2023-11-29 00:50:35,131 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3741206.6666666665, ans=0.125 2023-11-29 00:50:41,507 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3741273.3333333335, ans=0.125 2023-11-29 00:50:42,596 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3741273.3333333335, ans=0.125 2023-11-29 00:50:48,775 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 561200 2023-11-29 00:50:52,567 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 8100, loss[loss=0.07937, simple_loss=0.1123, pruned_loss=0.016, audio_tagging_loss=0.007243, over 15510.00 frames. ], tot_loss[loss=0.0648, simple_loss=0.08785, pruned_loss=0.01198, audio_tagging_loss=0.00889, over 3054966.00 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:51:05,111 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3741406.6666666665, ans=0.1 2023-11-29 00:51:05,418 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.11 vs. limit=15.0 2023-11-29 00:51:15,060 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3741406.6666666665, ans=0.2 2023-11-29 00:51:17,743 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.53 vs. 
limit=6.0 2023-11-29 00:51:22,825 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.631e+01 9.038e+01 9.545e+01 1.047e+02 1.336e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-29 00:51:50,163 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 561250 2023-11-29 00:51:53,587 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 8150, loss[loss=0.08764, simple_loss=0.1239, pruned_loss=0.02011, audio_tagging_loss=0.005579, over 16001.00 frames. ], tot_loss[loss=0.06468, simple_loss=0.08804, pruned_loss=0.01196, audio_tagging_loss=0.008704, over 3050396.92 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:52:12,615 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3741740.0, ans=0.125 2023-11-29 00:52:34,822 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3741873.3333333335, ans=0.125 2023-11-29 00:52:34,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3741873.3333333335, ans=0.05 2023-11-29 00:52:51,073 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 561300 2023-11-29 00:52:55,175 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 8200, loss[loss=0.05653, simple_loss=0.06883, pruned_loss=0.009287, audio_tagging_loss=0.01282, over 14276.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08947, pruned_loss=0.01197, audio_tagging_loss=0.008547, over 3060812.23 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:52:56,981 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.01 vs. limit=15.0 2023-11-29 00:52:58,167 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 00:53:21,598 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3742140.0, ans=0.125 2023-11-29 00:53:25,830 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.695e+01 9.120e+01 9.757e+01 1.054e+02 1.290e+02, threshold=1.951e+02, percent-clipped=0.0 2023-11-29 00:53:48,897 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3742273.3333333335, ans=0.1 2023-11-29 00:53:53,522 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 561350 2023-11-29 00:53:57,515 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 8250, loss[loss=0.06177, simple_loss=0.08459, pruned_loss=0.01161, audio_tagging_loss=0.007866, over 15168.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08963, pruned_loss=0.01194, audio_tagging_loss=0.008506, over 3059981.80 frames. 
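Annotation: tot_loss is not the single-batch loss. Each entry averages over a nearly constant ~3.0-3.06M frames while loss=[...] covers the current batch alone, which points at a frame-weighted average over a bounded window of recent batches (or an accumulator reset on a fixed cadence). A sketch of the windowed variant; the ~3M-frame cap is a guess chosen to match the logged frame counts:

    import collections

    class WindowedLoss:
        """Frame-weighted average of recent batches, capped at ~3M frames."""
        def __init__(self, max_frames: float = 3.05e6):
            self.max_frames = max_frames
            self.buf = collections.deque()       # (loss * frames, frames)
        def update(self, loss: float, frames: float) -> None:
            self.buf.append((loss * frames, frames))
            while sum(f for _, f in self.buf) > self.max_frames:
                self.buf.popleft()
        def value(self) -> tuple:
            total = sum(f for _, f in self.buf)
            return sum(s for s, _ in self.buf) / total, total  # (tot_loss, frames)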
], batch size: 59, lr: 1.43e-03, grad_scale: 16.0
2023-11-29 00:53:59,008 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3742340.0, ans=0.125
2023-11-29 00:54:11,227 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3742406.6666666665, ans=0.5
2023-11-29 00:54:55,607 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 561400
2023-11-29 00:54:59,463 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 8300, loss[loss=0.07203, simple_loss=0.09466, pruned_loss=0.01703, audio_tagging_loss=0.00767, over 15123.00 frames. ], tot_loss[loss=0.06481, simple_loss=0.08887, pruned_loss=0.01184, audio_tagging_loss=0.008537, over 3057405.39 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 16.0
2023-11-29 00:55:19,027 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3742740.0, ans=0.125
2023-11-29 00:55:29,822 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.656e+01 9.057e+01 9.720e+01 1.032e+02 1.351e+02, threshold=1.944e+02, percent-clipped=0.0
2023-11-29 00:55:31,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3742806.6666666665, ans=0.0
2023-11-29 00:55:34,875 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3742873.3333333335, ans=0.0
2023-11-29 00:55:44,625 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3742873.3333333335, ans=0.025
2023-11-29 00:55:50,837 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.19 vs. limit=12.0
2023-11-29 00:55:56,179 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 561450
2023-11-29 00:55:59,621 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 8350, loss[loss=0.07187, simple_loss=0.1004, pruned_loss=0.01279, audio_tagging_loss=0.00889, over 15882.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08978, pruned_loss=0.01205, audio_tagging_loss=0.008481, over 3054560.34 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0
2023-11-29 00:56:20,051 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3743073.3333333335, ans=0.125
2023-11-29 00:56:22,429 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3743073.3333333335, ans=0.1
2023-11-29 00:56:43,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3743206.6666666665, ans=0.125
2023-11-29 00:56:57,129 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3743273.3333333335, ans=0.125
2023-11-29 00:56:58,011 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 561500
2023-11-29 00:56:58,445 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.26 vs. limit=15.0
2023-11-29 00:57:01,964 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 8400, loss[loss=0.07, simple_loss=0.09006, pruned_loss=0.01617, audio_tagging_loss=0.008802, over 15099.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.08995, pruned_loss=0.01222, audio_tagging_loss=0.008412, over 3056090.05 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 32.0
2023-11-29 00:57:03,417 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3743340.0, ans=0.125
2023-11-29 00:57:22,738 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3743406.6666666665, ans=0.1
2023-11-29 00:57:31,748 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.582e+01 8.851e+01 9.361e+01 9.943e+01 1.259e+02, threshold=1.872e+02, percent-clipped=0.0
2023-11-29 00:57:32,501 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.56 vs. limit=10.0
2023-11-29 00:57:33,489 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.13 vs. limit=15.0
2023-11-29 00:57:36,667 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3743540.0, ans=0.05
2023-11-29 00:57:42,489 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.74 vs. limit=22.5
2023-11-29 00:57:46,756 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.30 vs. limit=22.5
2023-11-29 00:57:47,410 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3743540.0, ans=0.0
2023-11-29 00:58:00,070 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 561550
2023-11-29 00:58:00,550 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.26 vs. limit=22.5
2023-11-29 00:58:03,565 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 8450, loss[loss=0.0679, simple_loss=0.08838, pruned_loss=0.01541, audio_tagging_loss=0.008304, over 14637.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.09027, pruned_loss=0.01235, audio_tagging_loss=0.008389, over 3052304.33 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 32.0
2023-11-29 00:58:24,150 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3743740.0, ans=0.125
2023-11-29 00:58:25,841 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3743740.0, ans=0.125
2023-11-29 00:58:48,746 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3743873.3333333335, ans=0.2
2023-11-29 00:59:01,294 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 561600
2023-11-29 00:59:04,924 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 8500, loss[loss=0.06832, simple_loss=0.09372, pruned_loss=0.01278, audio_tagging_loss=0.008681, over 15544.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.09035, pruned_loss=0.01234, audio_tagging_loss=0.008423, over 3064751.99 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0
2023-11-29 00:59:06,732 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.92 vs. limit=15.0
2023-11-29 00:59:38,010 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.884e+01 8.998e+01 9.683e+01 1.039e+02 1.237e+02, threshold=1.937e+02, percent-clipped=0.0
2023-11-29 00:59:49,728 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3744206.6666666665, ans=0.0
2023-11-29 01:00:03,114 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 561650
2023-11-29 01:00:06,556 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 8550, loss[loss=0.06506, simple_loss=0.09103, pruned_loss=0.01223, audio_tagging_loss=0.007315, over 14489.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.09016, pruned_loss=0.01225, audio_tagging_loss=0.008495, over 3064573.72 frames. ], batch size: 54, lr: 1.43e-03, grad_scale: 16.0
2023-11-29 01:00:11,615 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.57 vs. limit=15.0
2023-11-29 01:00:21,731 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.79 vs. limit=15.0
2023-11-29 01:00:30,659 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3744473.3333333335, ans=0.0
2023-11-29 01:00:38,027 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.33 vs. limit=15.0
2023-11-29 01:01:05,827 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 561700
2023-11-29 01:01:09,158 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 8600, loss[loss=0.05901, simple_loss=0.07946, pruned_loss=0.008557, audio_tagging_loss=0.01072, over 15102.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.09033, pruned_loss=0.01237, audio_tagging_loss=0.008542, over 3063319.61 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 16.0
2023-11-29 01:01:26,063 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3744740.0, ans=0.0
2023-11-29 01:01:40,228 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.679e+01 8.937e+01 9.624e+01 1.044e+02 4.545e+02, threshold=1.925e+02, percent-clipped=1.0
2023-11-29 01:01:55,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3744873.3333333335, ans=0.2
2023-11-29 01:02:06,553 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 561750
2023-11-29 01:02:10,006 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 8650, loss[loss=0.05463, simple_loss=0.06923, pruned_loss=0.01102, audio_tagging_loss=0.008998, over 13326.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.08991, pruned_loss=0.01226, audio_tagging_loss=0.008637, over 3058962.82 frames. ], batch size: 52, lr: 1.43e-03, grad_scale: 16.0
2023-11-29 01:02:39,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3745140.0, ans=0.125
2023-11-29 01:03:02,037 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3745273.3333333335, ans=0.0
2023-11-29 01:03:07,249 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 561800
2023-11-29 01:03:07,525 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3745273.3333333335, ans=0.2
2023-11-29 01:03:08,990 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3745273.3333333335, ans=0.07
2023-11-29 01:03:11,061 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 8700, loss[loss=0.0637, simple_loss=0.08603, pruned_loss=0.009853, audio_tagging_loss=0.01083, over 15478.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.0902, pruned_loss=0.0123, audio_tagging_loss=0.008658, over 3054974.18 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 16.0
2023-11-29 01:03:32,250 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3745406.6666666665, ans=0.125
2023-11-29 01:03:33,608 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3745406.6666666665, ans=0.1
2023-11-29 01:03:36,988 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3745473.3333333335, ans=0.0
2023-11-29 01:03:44,727 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.872e+01 8.978e+01 9.587e+01 1.045e+02 1.358e+02, threshold=1.917e+02, percent-clipped=0.0
2023-11-29 01:03:45,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3745473.3333333335, ans=0.2
2023-11-29 01:04:10,041 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 561850
2023-11-29 01:04:14,700 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 8750, loss[loss=0.06735, simple_loss=0.08738, pruned_loss=0.01333, audio_tagging_loss=0.01033, over 15940.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.09079, pruned_loss=0.01227, audio_tagging_loss=0.008638, over 3054764.70 frames. ], batch size: 60, lr: 1.43e-03, grad_scale: 8.0
2023-11-29 01:04:22,988 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3745673.3333333335, ans=0.125
2023-11-29 01:04:29,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3745740.0, ans=0.2
2023-11-29 01:04:39,173 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3745806.6666666665, ans=0.125
2023-11-29 01:04:42,179 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3745806.6666666665, ans=0.125
2023-11-29 01:04:49,460 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.06 vs. limit=15.0
2023-11-29 01:04:50,460 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3745873.3333333335, ans=0.0
2023-11-29 01:04:57,114 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.89 vs. limit=10.0
2023-11-29 01:05:11,968 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 561900
2023-11-29 01:05:15,321 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 8800, loss[loss=0.06454, simple_loss=0.09286, pruned_loss=0.009618, audio_tagging_loss=0.008489, over 15920.00 frames. ], tot_loss[loss=0.0667, simple_loss=0.09131, pruned_loss=0.01231, audio_tagging_loss=0.008733, over 3049462.17 frames. ], batch size: 60, lr: 1.43e-03, grad_scale: 16.0
2023-11-29 01:05:36,284 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3746073.3333333335, ans=0.125
2023-11-29 01:05:42,664 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3746140.0, ans=0.125
2023-11-29 01:05:46,747 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3746140.0, ans=0.0
2023-11-29 01:05:49,573 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.820e+01 9.250e+01 9.769e+01 1.064e+02 1.773e+02, threshold=1.954e+02, percent-clipped=0.0
2023-11-29 01:05:49,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3746140.0, ans=0.125
2023-11-29 01:05:53,295 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3746206.6666666665, ans=0.125
2023-11-29 01:06:12,870 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 561950
2023-11-29 01:06:16,801 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 8850, loss[loss=0.05985, simple_loss=0.07684, pruned_loss=0.0121, audio_tagging_loss=0.009331, over 14921.00 frames. ], tot_loss[loss=0.06644, simple_loss=0.09078, pruned_loss=0.01226, audio_tagging_loss=0.008797, over 3048603.57 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0
2023-11-29 01:06:30,559 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-29 01:06:32,914 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.25 vs. limit=22.5
2023-11-29 01:06:41,638 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.76 vs. limit=12.0
2023-11-29 01:06:44,955 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3746473.3333333335, ans=0.0
2023-11-29 01:07:05,245 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3746606.6666666665, ans=0.125
2023-11-29 01:07:14,959 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 562000
2023-11-29 01:07:15,107 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3746606.6666666665, ans=0.0
2023-11-29 01:07:19,284 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 8900, loss[loss=0.06282, simple_loss=0.08757, pruned_loss=0.01104, audio_tagging_loss=0.007997, over 14906.00 frames. ], tot_loss[loss=0.06671, simple_loss=0.09139, pruned_loss=0.01225, audio_tagging_loss=0.008768, over 3049808.28 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 16.0
2023-11-29 01:07:22,198 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3746673.3333333335, ans=0.0
2023-11-29 01:07:26,283 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3746673.3333333335, ans=0.0
2023-11-29 01:07:35,448 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3746740.0, ans=0.125
2023-11-29 01:07:42,659 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3746806.6666666665, ans=0.125
2023-11-29 01:07:52,109 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.613e+01 8.904e+01 9.438e+01 1.016e+02 1.537e+02, threshold=1.888e+02, percent-clipped=0.0
2023-11-29 01:07:57,508 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.26 vs. limit=15.0
2023-11-29 01:08:07,644 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3746940.0, ans=0.125
2023-11-29 01:08:17,344 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 562050
2023-11-29 01:08:20,662 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 8950, loss[loss=0.08298, simple_loss=0.1185, pruned_loss=0.01451, audio_tagging_loss=0.009198, over 15728.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.09146, pruned_loss=0.01229, audio_tagging_loss=0.008624, over 3046449.30 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 16.0
2023-11-29 01:08:40,005 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3747073.3333333335, ans=0.125
2023-11-29 01:08:49,138 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3747140.0, ans=0.2
2023-11-29 01:09:17,845 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 562100
2023-11-29 01:09:19,158 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3747273.3333333335, ans=0.125
2023-11-29 01:09:21,830 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 9000, loss[loss=0.05662, simple_loss=0.07676, pruned_loss=0.009672, audio_tagging_loss=0.008568, over 14507.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.09043, pruned_loss=0.01217, audio_tagging_loss=0.008605, over 3048938.13 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0
2023-11-29 01:09:21,832 INFO [train_asr.py:1258] (0/4) Computing validation loss
2023-11-29 01:09:47,664 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.3917, 2.9346, 3.2153, 2.9421, 3.6614, 3.7476, 3.2794, 3.2795], device='cuda:0')
2023-11-29 01:10:02,101 INFO [train_asr.py:1267] (0/4) Epoch 47, validation: loss=0.05855, simple_loss=0.05046, pruned_loss=0.005347, audio_tagging_loss=0.02798, over 4681554.00 frames.
2023-11-29 01:10:02,102 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB
2023-11-29 01:10:07,420 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.58 vs. limit=15.0
2023-11-29 01:10:34,516 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.174e+01 8.906e+01 9.620e+01 1.037e+02 1.250e+02, threshold=1.924e+02, percent-clipped=0.0
2023-11-29 01:10:53,492 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3747606.6666666665, ans=0.125
2023-11-29 01:10:54,732 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3747606.6666666665, ans=0.125
2023-11-29 01:10:59,355 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 562150
2023-11-29 01:11:03,480 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 9050, loss[loss=0.0706, simple_loss=0.09889, pruned_loss=0.01422, audio_tagging_loss=0.006938, over 15285.00 frames. ], tot_loss[loss=0.06667, simple_loss=0.0918, pruned_loss=0.01231, audio_tagging_loss=0.008458, over 3049489.95 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0
2023-11-29 01:11:22,329 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.54 vs. limit=15.0
2023-11-29 01:11:33,368 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.17 vs. limit=15.0
2023-11-29 01:11:57,952 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3747940.0, ans=0.0
2023-11-29 01:12:01,694 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 562200
2023-11-29 01:12:03,795 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.22 vs. limit=22.5
2023-11-29 01:12:05,342 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 9100, loss[loss=0.06245, simple_loss=0.08552, pruned_loss=0.01295, audio_tagging_loss=0.006736, over 16501.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.09054, pruned_loss=0.01211, audio_tagging_loss=0.008441, over 3050204.90 frames. ], batch size: 61, lr: 1.43e-03, grad_scale: 16.0
2023-11-29 01:12:23,151 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.10 vs. limit=15.0
2023-11-29 01:12:30,825 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3748140.0, ans=0.2
2023-11-29 01:12:34,991 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.59 vs. limit=22.5
2023-11-29 01:12:38,337 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.957e+01 8.933e+01 9.563e+01 1.020e+02 1.667e+02, threshold=1.913e+02, percent-clipped=0.0
2023-11-29 01:12:42,347 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3748206.6666666665, ans=0.5
2023-11-29 01:12:44,507 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-29 01:13:02,993 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 562250
2023-11-29 01:13:06,526 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 9150, loss[loss=0.08098, simple_loss=0.104, pruned_loss=0.01616, audio_tagging_loss=0.01281, over 15118.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.09052, pruned_loss=0.01228, audio_tagging_loss=0.008479, over 3049681.02 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0
2023-11-29 01:13:12,076 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3748340.0, ans=0.125
2023-11-29 01:13:30,274 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3748473.3333333335, ans=0.125
2023-11-29 01:13:37,131 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3748473.3333333335, ans=0.125
2023-11-29 01:13:38,236 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3748473.3333333335, ans=0.125
2023-11-29 01:13:45,152 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3748540.0, ans=0.125
2023-11-29 01:13:47,530 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3748540.0, ans=0.125
2023-11-29 01:13:52,726 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3748540.0, ans=10.0
2023-11-29 01:14:04,749 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 562300
2023-11-29 01:14:08,053 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 9200, loss[loss=0.07562, simple_loss=0.1067, pruned_loss=0.01501, audio_tagging_loss=0.007273, over 16183.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.09109, pruned_loss=0.0124, audio_tagging_loss=0.008457, over 3048616.30 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 32.0
2023-11-29 01:14:17,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3748673.3333333335, ans=0.125
2023-11-29 01:14:41,283 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.717e+01 9.164e+01 9.710e+01 1.033e+02 1.295e+02, threshold=1.942e+02, percent-clipped=0.0
2023-11-29 01:15:06,375 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 562350
2023-11-29 01:15:10,332 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 9250, loss[loss=0.07291, simple_loss=0.1014, pruned_loss=0.01414, audio_tagging_loss=0.008072, over 15243.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.09025, pruned_loss=0.01226, audio_tagging_loss=0.00845, over 3053648.32 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 32.0
2023-11-29 01:15:21,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3749073.3333333335, ans=0.1
2023-11-29 01:15:45,733 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.96 vs. limit=15.0
2023-11-29 01:15:45,826 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.54 vs. limit=10.0
2023-11-29 01:16:07,660 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 562400
2023-11-29 01:16:11,611 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 9300, loss[loss=0.07204, simple_loss=0.09852, pruned_loss=0.01274, audio_tagging_loss=0.01004, over 14789.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.08987, pruned_loss=0.01207, audio_tagging_loss=0.00854, over 3053471.27 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 32.0
2023-11-29 01:16:18,537 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.12 vs. limit=22.5
2023-11-29 01:16:45,406 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.732e+01 9.046e+01 9.645e+01 1.038e+02 1.624e+02, threshold=1.929e+02, percent-clipped=0.0
2023-11-29 01:17:02,716 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.02 vs. limit=15.0
2023-11-29 01:17:09,978 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 562450
2023-11-29 01:17:13,328 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 9350, loss[loss=0.05411, simple_loss=0.07406, pruned_loss=0.009579, audio_tagging_loss=0.007497, over 16708.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08993, pruned_loss=0.01214, audio_tagging_loss=0.008547, over 3054112.48 frames. ], batch size: 62, lr: 1.43e-03, grad_scale: 32.0
2023-11-29 01:17:18,106 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=3749673.3333333335, ans=0.5
2023-11-29 01:17:31,576 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.31 vs. limit=12.0
2023-11-29 01:17:53,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3749873.3333333335, ans=0.0
2023-11-29 01:17:56,787 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3749873.3333333335, ans=0.0
2023-11-29 01:18:10,402 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 562500
2023-11-29 01:18:15,244 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 9400, loss[loss=0.07261, simple_loss=0.09664, pruned_loss=0.017, audio_tagging_loss=0.007283, over 15362.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08936, pruned_loss=0.01216, audio_tagging_loss=0.008594, over 3050174.95 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 32.0
2023-11-29 01:18:15,521 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3750006.6666666665, ans=0.05
2023-11-29 01:18:48,145 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.900e+01 9.214e+01 9.788e+01 1.040e+02 1.202e+02, threshold=1.958e+02, percent-clipped=0.0
2023-11-29 01:18:53,338 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3750206.6666666665, ans=0.125
2023-11-29 01:19:12,630 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 562550
2023-11-29 01:19:12,782 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-29 01:19:14,415 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.06 vs. limit=15.0
2023-11-29 01:19:16,073 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 9450, loss[loss=0.07477, simple_loss=0.1004, pruned_loss=0.01782, audio_tagging_loss=0.006741, over 14342.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.08904, pruned_loss=0.01205, audio_tagging_loss=0.008714, over 3051022.49 frames. ], batch size: 53, lr: 1.43e-03, grad_scale: 32.0
2023-11-29 01:19:17,786 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-29 01:19:32,921 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3750406.6666666665, ans=0.0
2023-11-29 01:19:47,062 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3750473.3333333335, ans=0.0
2023-11-29 01:19:55,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3750540.0, ans=0.1
2023-11-29 01:20:15,322 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 562600
2023-11-29 01:20:18,986 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 9500, loss[loss=0.07721, simple_loss=0.099, pruned_loss=0.01978, audio_tagging_loss=0.007933, over 15289.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.08913, pruned_loss=0.0121, audio_tagging_loss=0.008744, over 3045495.33 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 16.0
2023-11-29 01:20:34,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3750740.0, ans=0.125
2023-11-29 01:20:37,452 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3750740.0, ans=0.0
2023-11-29 01:20:40,524 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3750740.0, ans=0.125
2023-11-29 01:20:47,894 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.99 vs. limit=15.0
2023-11-29 01:20:53,740 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.404e+01 8.884e+01 9.617e+01 1.043e+02 1.271e+02, threshold=1.923e+02, percent-clipped=0.0
2023-11-29 01:20:54,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3750806.6666666665, ans=0.125
2023-11-29 01:20:55,134 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3750873.3333333335, ans=0.1
2023-11-29 01:20:59,566 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3750873.3333333335, ans=0.125
2023-11-29 01:21:10,688 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.05 vs. limit=15.0
2023-11-29 01:21:17,138 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 562650
2023-11-29 01:21:20,500 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 9550, loss[loss=0.05357, simple_loss=0.07446, pruned_loss=0.006798, audio_tagging_loss=0.009542, over 14900.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08898, pruned_loss=0.01205, audio_tagging_loss=0.00879, over 3045965.76 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 16.0
2023-11-29 01:21:50,239 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3751140.0, ans=0.125
2023-11-29 01:21:56,285 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.12 vs. limit=15.0
2023-11-29 01:22:19,416 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 562700
2023-11-29 01:22:22,941 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 9600, loss[loss=0.07063, simple_loss=0.09442, pruned_loss=0.01337, audio_tagging_loss=0.01005, over 15504.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08866, pruned_loss=0.01205, audio_tagging_loss=0.00887, over 3053108.25 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 32.0
2023-11-29 01:22:29,783 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3751340.0, ans=0.1
2023-11-29 01:22:54,811 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3751473.3333333335, ans=0.0
2023-11-29 01:22:57,412 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.720e+01 8.993e+01 9.667e+01 1.037e+02 1.328e+02, threshold=1.933e+02, percent-clipped=0.0
2023-11-29 01:22:58,833 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3751540.0, ans=0.125
2023-11-29 01:23:12,647 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3751606.6666666665, ans=0.125
2023-11-29 01:23:14,855 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.64 vs. limit=22.5
2023-11-29 01:23:15,730 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3751606.6666666665, ans=0.125
2023-11-29 01:23:21,972 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 562750
2023-11-29 01:23:25,377 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 9650, loss[loss=0.05201, simple_loss=0.07044, pruned_loss=0.009073, audio_tagging_loss=0.007712, over 14664.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.08887, pruned_loss=0.01203, audio_tagging_loss=0.00891, over 3047909.38 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 32.0
2023-11-29 01:23:25,584 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3751673.3333333335, ans=0.0
2023-11-29 01:23:27,979 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3751673.3333333335, ans=0.1
2023-11-29 01:23:31,431 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3751673.3333333335, ans=0.125
2023-11-29 01:24:02,759 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3751873.3333333335, ans=0.125
2023-11-29 01:24:21,973 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-29 01:24:23,012 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 562800
2023-11-29 01:24:23,285 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3751940.0, ans=0.1
2023-11-29 01:24:23,545 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.43 vs. limit=15.0
2023-11-29 01:24:24,644 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3751940.0, ans=0.125
2023-11-29 01:24:26,724 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 9700, loss[loss=0.06001, simple_loss=0.08647, pruned_loss=0.01008, audio_tagging_loss=0.006702, over 15489.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08919, pruned_loss=0.01189, audio_tagging_loss=0.00866, over 3046207.83 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 32.0
2023-11-29 01:24:26,878 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3752006.6666666665, ans=0.0
2023-11-29 01:24:52,796 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3752140.0, ans=0.125
2023-11-29 01:25:01,799 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.640e+01 9.050e+01 9.542e+01 1.032e+02 1.533e+02, threshold=1.908e+02, percent-clipped=0.0
2023-11-29 01:25:03,615 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.45 vs. limit=15.0
2023-11-29 01:25:22,886 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3752273.3333333335, ans=0.125
2023-11-29 01:25:24,959 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 562850
2023-11-29 01:25:28,341 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 9750, loss[loss=0.07032, simple_loss=0.09439, pruned_loss=0.01399, audio_tagging_loss=0.009129, over 15186.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08972, pruned_loss=0.01191, audio_tagging_loss=0.008495, over 3041029.06 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 32.0
2023-11-29 01:25:36,829 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3752340.0, ans=0.0
2023-11-29 01:25:42,891 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.91 vs. limit=22.5
2023-11-29 01:26:19,950 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3752606.6666666665, ans=0.0
2023-11-29 01:26:28,032 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 562900
2023-11-29 01:26:31,450 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 9800, loss[loss=0.07652, simple_loss=0.1077, pruned_loss=0.01429, audio_tagging_loss=0.008366, over 15925.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.09033, pruned_loss=0.01202, audio_tagging_loss=0.008443, over 3043287.64 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 32.0
2023-11-29 01:26:36,523 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3752673.3333333335, ans=0.125
2023-11-29 01:26:40,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3752673.3333333335, ans=0.0
2023-11-29 01:26:55,935 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3752806.6666666665, ans=0.125
2023-11-29 01:27:04,747 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.514e+01 8.999e+01 9.540e+01 1.035e+02 1.290e+02, threshold=1.908e+02, percent-clipped=0.0
2023-11-29 01:27:12,688 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.31 vs. limit=10.0
2023-11-29 01:27:27,937 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-29 01:27:28,007 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 562950
2023-11-29 01:27:29,294 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3752940.0, ans=0.0
2023-11-29 01:27:31,229 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 9850, loss[loss=0.06372, simple_loss=0.09732, pruned_loss=0.007529, audio_tagging_loss=0.007535, over 15517.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.09044, pruned_loss=0.01196, audio_tagging_loss=0.008468, over 3043114.43 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 32.0
2023-11-29 01:27:56,463 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-29 01:27:56,487 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3753140.0, ans=0.125
2023-11-29 01:28:05,986 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.87 vs. limit=15.0
2023-11-29 01:28:06,725 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3753140.0, ans=0.0
2023-11-29 01:28:20,520 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3753273.3333333335, ans=0.125
2023-11-29 01:28:28,998 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.76 vs. limit=15.0
2023-11-29 01:28:29,706 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 563000
2023-11-29 01:28:33,630 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 9900, loss[loss=0.06197, simple_loss=0.0868, pruned_loss=0.0103, audio_tagging_loss=0.008264, over 15227.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.0903, pruned_loss=0.01213, audio_tagging_loss=0.008472, over 3042669.13 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 32.0
2023-11-29 01:28:54,351 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3753406.6666666665, ans=0.125
2023-11-29 01:29:09,359 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.807e+01 8.970e+01 9.713e+01 1.037e+02 1.358e+02, threshold=1.943e+02, percent-clipped=0.0
2023-11-29 01:29:31,834 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 563050
2023-11-29 01:29:35,975 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 9950, loss[loss=0.06902, simple_loss=0.1051, pruned_loss=0.01076, audio_tagging_loss=0.005712, over 15356.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.09061, pruned_loss=0.01218, audio_tagging_loss=0.008423, over 3045366.27 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0
2023-11-29 01:29:52,722 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3753740.0, ans=10.0
2023-11-29 01:30:05,015 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3753806.6666666665, ans=0.125
2023-11-29 01:30:13,314 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3753873.3333333335, ans=0.125
2023-11-29 01:30:33,912 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 563100
2023-11-29 01:30:37,325 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 10000, loss[loss=0.07841, simple_loss=0.1034, pruned_loss=0.02041, audio_tagging_loss=0.006306, over 14852.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.09113, pruned_loss=0.01223, audio_tagging_loss=0.008387, over 3042085.31 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 32.0
2023-11-29 01:30:58,491 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3754073.3333333335, ans=0.0
2023-11-29 01:30:59,580 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3754073.3333333335, ans=0.125
2023-11-29 01:31:05,464 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.38 vs. limit=15.0
2023-11-29 01:31:07,427 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3754140.0, ans=0.125
2023-11-29 01:31:07,444 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3754140.0, ans=0.125
2023-11-29 01:31:13,954 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.380e+01 9.037e+01 9.619e+01 1.035e+02 1.339e+02, threshold=1.924e+02, percent-clipped=0.0
2023-11-29 01:31:16,590 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3754206.6666666665, ans=0.1
2023-11-29 01:31:19,005 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3754206.6666666665, ans=0.125
2023-11-29 01:31:19,367 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.95 vs. limit=15.0
2023-11-29 01:31:26,064 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3754273.3333333335, ans=0.125
2023-11-29 01:31:26,385 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=3754273.3333333335, ans=22.5
2023-11-29 01:31:28,798 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.36 vs. limit=15.0
2023-11-29 01:31:31,253 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.72 vs. limit=15.0
2023-11-29 01:31:35,294 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 563150
2023-11-29 01:31:39,210 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 10050, loss[loss=0.0453, simple_loss=0.05685, pruned_loss=0.007082, audio_tagging_loss=0.009791, over 15795.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.09015, pruned_loss=0.01203, audio_tagging_loss=0.00841, over 3048004.03 frames. ], batch size: 61, lr: 1.43e-03, grad_scale: 32.0
2023-11-29 01:32:04,438 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3754473.3333333335, ans=0.025
2023-11-29 01:32:09,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3754473.3333333335, ans=0.125
2023-11-29 01:32:10,363 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3754473.3333333335, ans=0.0
2023-11-29 01:32:24,381 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.32 vs. limit=15.0
2023-11-29 01:32:31,226 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3754606.6666666665, ans=0.125
2023-11-29 01:32:34,707 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3754606.6666666665, ans=0.125
2023-11-29 01:32:37,306 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 563200
2023-11-29 01:32:41,636 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 10100, loss[loss=0.06449, simple_loss=0.09232, pruned_loss=0.009597, audio_tagging_loss=0.008738, over 15900.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.09051, pruned_loss=0.01203, audio_tagging_loss=0.008409, over 3056273.00 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 16.0
2023-11-29 01:32:54,348 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3754740.0, ans=0.2
2023-11-29 01:32:57,046 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.34 vs. limit=15.0
2023-11-29 01:33:11,506 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=3754806.6666666665, ans=15.0
2023-11-29 01:33:17,888 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.573e+01 9.162e+01 9.791e+01 1.075e+02 1.682e+02, threshold=1.958e+02, percent-clipped=0.0
2023-11-29 01:33:25,012 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.95 vs. limit=15.0
2023-11-29 01:33:33,319 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-29 01:33:36,551 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3754940.0, ans=0.2
2023-11-29 01:33:39,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3754940.0, ans=0.0
2023-11-29 01:33:39,943 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 563250
2023-11-29 01:33:43,348 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 10150, loss[loss=0.06444, simple_loss=0.08928, pruned_loss=0.01086, audio_tagging_loss=0.008949, over 14824.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.09085, pruned_loss=0.01202, audio_tagging_loss=0.008567, over 3056345.78 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 16.0
2023-11-29 01:33:51,643 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3755006.6666666665, ans=0.125
2023-11-29 01:33:51,730 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3755006.6666666665, ans=0.125
2023-11-29 01:33:55,263 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3755073.3333333335, ans=0.5
2023-11-29 01:34:00,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3755073.3333333335, ans=0.0
2023-11-29 01:34:06,269 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3755140.0, ans=0.07
2023-11-29 01:34:13,497 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-29 01:34:16,783 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3755140.0, ans=0.125
2023-11-29 01:34:30,130 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3755206.6666666665, ans=0.04949747468305833
2023-11-29 01:34:38,450 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3755273.3333333335, ans=0.0
2023-11-29 01:34:40,734 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 563300
2023-11-29 01:34:44,802 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 10200, loss[loss=0.05223, simple_loss=0.06789, pruned_loss=0.01002, audio_tagging_loss=0.008269, over 15284.00 frames. ], tot_loss[loss=0.06668, simple_loss=0.09159, pruned_loss=0.01234, audio_tagging_loss=0.008552, over 3057207.40 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 16.0
2023-11-29 01:34:50,794 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3755340.0, ans=0.1
2023-11-29 01:34:53,176 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=3755340.0, ans=10.0
2023-11-29 01:35:00,029 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3755406.6666666665, ans=0.125
2023-11-29 01:35:09,598 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-29 01:35:14,112 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3755473.3333333335, ans=0.125
2023-11-29 01:35:19,692 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3755473.3333333335, ans=0.125
2023-11-29 01:35:21,764 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.706e+01 9.157e+01 9.642e+01 1.029e+02 1.501e+02, threshold=1.928e+02, percent-clipped=0.0
2023-11-29 01:35:36,669 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3755606.6666666665, ans=0.125
2023-11-29 01:35:38,257 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.75 vs. limit=10.0
2023-11-29 01:35:42,286 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 563350
2023-11-29 01:35:46,282 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 10250, loss[loss=0.07176, simple_loss=0.1019, pruned_loss=0.01196, audio_tagging_loss=0.008862, over 16012.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.09086, pruned_loss=0.01222, audio_tagging_loss=0.008584, over 3059800.35 frames. ], batch size: 61, lr: 1.43e-03, grad_scale: 16.0
2023-11-29 01:36:39,452 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3755940.0, ans=0.125
2023-11-29 01:36:43,869 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 563400
2023-11-29 01:36:47,546 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 10300, loss[loss=0.07958, simple_loss=0.109, pruned_loss=0.01693, audio_tagging_loss=0.008169, over 14546.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08888, pruned_loss=0.0119, audio_tagging_loss=0.008814, over 3057770.84 frames. ], batch size: 54, lr: 1.43e-03, grad_scale: 16.0
2023-11-29 01:36:59,422 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.85 vs. limit=15.0
2023-11-29 01:37:02,696 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3756073.3333333335, ans=0.125
2023-11-29 01:37:02,901 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.99 vs. limit=15.0
2023-11-29 01:37:04,536 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3756073.3333333335, ans=0.125
2023-11-29 01:37:05,739 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3756073.3333333335, ans=0.0
2023-11-29 01:37:25,111 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.270e+01 9.072e+01 9.694e+01 1.048e+02 1.558e+02, threshold=1.939e+02, percent-clipped=0.0
2023-11-29 01:37:36,219 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3756273.3333333335, ans=0.2
2023-11-29 01:37:38,206 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3756273.3333333335, ans=0.0
2023-11-29 01:37:41,039 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.75 vs. limit=12.0
2023-11-29 01:37:46,230 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 563450
2023-11-29 01:37:46,236 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3756273.3333333335, ans=0.125
2023-11-29 01:37:49,664 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 10350, loss[loss=0.05708, simple_loss=0.07561, pruned_loss=0.00912, audio_tagging_loss=0.01016, over 15929.00 frames. ], tot_loss[loss=0.06498, simple_loss=0.08823, pruned_loss=0.01196, audio_tagging_loss=0.008901, over 3048519.21 frames. ], batch size: 60, lr: 1.43e-03, grad_scale: 16.0
2023-11-29 01:37:51,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3756340.0, ans=0.125
2023-11-29 01:37:56,532 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3756340.0, ans=0.1
2023-11-29 01:38:00,080 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3756340.0, ans=0.125
2023-11-29 01:38:06,406 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-29 01:38:10,872 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.43 vs. limit=22.5
2023-11-29 01:38:11,753 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3756406.6666666665, ans=0.09899494936611666
2023-11-29 01:38:21,616 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.48 vs. limit=6.0
2023-11-29 01:38:31,176 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-29 01:38:41,093 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3756606.6666666665, ans=0.125
2023-11-29 01:38:45,666 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3756606.6666666665, ans=0.0
2023-11-29 01:38:47,886 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 563500
2023-11-29 01:38:51,262 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 10400, loss[loss=0.04677, simple_loss=0.06571, pruned_loss=0.006498, audio_tagging_loss=0.00742, over 13892.00 frames. ], tot_loss[loss=0.06437, simple_loss=0.08739, pruned_loss=0.01174, audio_tagging_loss=0.008943, over 3050077.68 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 32.0
2023-11-29 01:38:51,441 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3756673.3333333335, ans=0.125
2023-11-29 01:38:51,516 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3756673.3333333335, ans=10.0
2023-11-29 01:39:05,438 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3756740.0, ans=0.125
2023-11-29 01:39:18,823 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3756806.6666666665, ans=0.125
2023-11-29 01:39:26,626 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.12 vs. limit=12.0
2023-11-29 01:39:28,506 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.657e+01 9.225e+01 9.627e+01 1.037e+02 1.431e+02, threshold=1.925e+02, percent-clipped=0.0
2023-11-29 01:39:37,134 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3756873.3333333335, ans=0.125
2023-11-29 01:39:49,680 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 563550
2023-11-29 01:39:50,301 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.75 vs. limit=15.0
2023-11-29 01:39:53,070 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 10450, loss[loss=0.07851, simple_loss=0.1058, pruned_loss=0.01598, audio_tagging_loss=0.009622, over 17012.00 frames. ], tot_loss[loss=0.06458, simple_loss=0.08766, pruned_loss=0.01179, audio_tagging_loss=0.008953, over 3045006.53 frames. ], batch size: 65, lr: 1.43e-03, grad_scale: 32.0
2023-11-29 01:40:04,400 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3757073.3333333335, ans=0.1
2023-11-29 01:40:05,763 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.94 vs. limit=6.0
2023-11-29 01:40:26,235 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3757140.0, ans=0.125
2023-11-29 01:40:43,234 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3757273.3333333335, ans=0.2
2023-11-29 01:40:50,099 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 563600
2023-11-29 01:40:51,406 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.23 vs. limit=22.5
2023-11-29 01:40:54,488 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 10500, loss[loss=0.06297, simple_loss=0.08583, pruned_loss=0.01225, audio_tagging_loss=0.007797, over 15260.00 frames. ], tot_loss[loss=0.06419, simple_loss=0.0875, pruned_loss=0.01165, audio_tagging_loss=0.008787, over 3041618.40 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 32.0
2023-11-29 01:40:58,827 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.20 vs. limit=12.0
2023-11-29 01:41:08,721 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.19 vs. limit=15.0
2023-11-29 01:41:14,682 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3757406.6666666665, ans=0.035
2023-11-29 01:41:14,879 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3757406.6666666665, ans=0.1
2023-11-29 01:41:22,294 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3757473.3333333335, ans=0.1
2023-11-29 01:41:31,333 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.549e+01 8.902e+01 9.602e+01 1.050e+02 1.360e+02, threshold=1.920e+02, percent-clipped=0.0
2023-11-29 01:41:52,574 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 563650
2023-11-29 01:41:55,913 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 10550, loss[loss=0.05297, simple_loss=0.06593, pruned_loss=0.01096, audio_tagging_loss=0.009048, over 15051.00 frames. ], tot_loss[loss=0.06451, simple_loss=0.08817, pruned_loss=0.01177, audio_tagging_loss=0.008655, over 3043612.26 frames. ], batch size: 60, lr: 1.43e-03, grad_scale: 32.0
2023-11-29 01:41:57,515 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3757673.3333333335, ans=0.0
2023-11-29 01:42:13,784 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3757740.0, ans=0.1
2023-11-29 01:42:21,067 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.77 vs. limit=10.0
2023-11-29 01:42:46,193 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3757940.0, ans=0.0
2023-11-29 01:42:49,641 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.52 vs. limit=15.0
2023-11-29 01:42:54,314 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 563700
2023-11-29 01:42:57,052 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.33 vs. limit=10.0
2023-11-29 01:42:57,612 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 10600, loss[loss=0.07229, simple_loss=0.1031, pruned_loss=0.01391, audio_tagging_loss=0.006814, over 15667.00 frames. ], tot_loss[loss=0.06506, simple_loss=0.08898, pruned_loss=0.012, audio_tagging_loss=0.008569, over 3050574.13 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 32.0
2023-11-29 01:43:33,213 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.80 vs. limit=15.0
2023-11-29 01:43:34,211 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.594e+01 9.127e+01 9.716e+01 1.043e+02 1.257e+02, threshold=1.943e+02, percent-clipped=0.0
2023-11-29 01:43:54,630 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 563750
2023-11-29 01:43:58,045 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 10650, loss[loss=0.06127, simple_loss=0.07862, pruned_loss=0.009127, audio_tagging_loss=0.01284, over 16114.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08936, pruned_loss=0.01209, audio_tagging_loss=0.008591, over 3046823.28 frames. ], batch size: 61, lr: 1.43e-03, grad_scale: 16.0
2023-11-29 01:44:05,959 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3758340.0, ans=0.0
2023-11-29 01:44:12,533 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3758406.6666666665, ans=0.0
2023-11-29 01:44:56,331 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 563800
2023-11-29 01:45:00,050 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 10700, loss[loss=0.05133, simple_loss=0.06844, pruned_loss=0.006952, audio_tagging_loss=0.01016, over 15594.00 frames. ], tot_loss[loss=0.06472, simple_loss=0.0888, pruned_loss=0.01178, audio_tagging_loss=0.00854, over 3054805.47 frames.
], batch size: 59, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:45:03,822 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3758673.3333333335, ans=0.07 2023-11-29 01:45:03,851 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3758673.3333333335, ans=0.125 2023-11-29 01:45:10,778 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3758673.3333333335, ans=0.0 2023-11-29 01:45:13,108 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3758740.0, ans=0.125 2023-11-29 01:45:26,014 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3758806.6666666665, ans=0.0 2023-11-29 01:45:26,161 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3758806.6666666665, ans=0.125 2023-11-29 01:45:37,222 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.299e+01 8.610e+01 9.369e+01 1.025e+02 1.277e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-29 01:45:37,637 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3758873.3333333335, ans=0.1 2023-11-29 01:45:57,556 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3758940.0, ans=0.1 2023-11-29 01:45:58,447 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 563850 2023-11-29 01:46:01,901 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 10750, loss[loss=0.05952, simple_loss=0.08141, pruned_loss=0.009244, audio_tagging_loss=0.009577, over 15842.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08919, pruned_loss=0.01183, audio_tagging_loss=0.008538, over 3055597.99 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:46:17,095 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3759073.3333333335, ans=0.125 2023-11-29 01:46:29,271 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.06 vs. limit=12.0 2023-11-29 01:46:35,909 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3759140.0, ans=0.125 2023-11-29 01:46:35,946 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3759140.0, ans=0.2 2023-11-29 01:46:41,192 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3759206.6666666665, ans=0.125 2023-11-29 01:46:54,539 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3759273.3333333335, ans=0.125 2023-11-29 01:46:59,055 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 563900 2023-11-29 01:47:02,485 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 10800, loss[loss=0.05994, simple_loss=0.08215, pruned_loss=0.00754, audio_tagging_loss=0.01133, over 14964.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08948, pruned_loss=0.01196, audio_tagging_loss=0.008552, over 3054076.20 frames. 
], batch size: 58, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:47:16,376 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3759406.6666666665, ans=0.2 2023-11-29 01:47:41,321 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.657e+01 9.091e+01 9.540e+01 1.017e+02 1.841e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-29 01:47:56,722 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3759606.6666666665, ans=0.1 2023-11-29 01:48:00,156 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 563950 2023-11-29 01:48:04,381 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 10850, loss[loss=0.0698, simple_loss=0.1052, pruned_loss=0.01061, audio_tagging_loss=0.006582, over 15890.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08953, pruned_loss=0.01194, audio_tagging_loss=0.008456, over 3054990.33 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:48:18,733 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3759740.0, ans=0.1 2023-11-29 01:48:27,991 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3759740.0, ans=0.0 2023-11-29 01:48:37,461 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3759806.6666666665, ans=0.125 2023-11-29 01:48:39,066 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.30 vs. limit=15.0 2023-11-29 01:48:44,343 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3759873.3333333335, ans=0.1 2023-11-29 01:48:53,280 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3759940.0, ans=0.0 2023-11-29 01:48:56,397 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3759940.0, ans=0.125 2023-11-29 01:48:59,868 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3759940.0, ans=0.125 2023-11-29 01:49:03,690 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 564000 2023-11-29 01:49:05,767 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-564000.pt 2023-11-29 01:49:09,379 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 01:49:10,548 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 10900, loss[loss=0.0665, simple_loss=0.09093, pruned_loss=0.01147, audio_tagging_loss=0.009563, over 15269.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08985, pruned_loss=0.0121, audio_tagging_loss=0.008472, over 3057056.56 frames. 
], batch size: 59, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:49:15,923 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.23 vs. limit=15.0 2023-11-29 01:49:30,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3760073.3333333335, ans=0.0 2023-11-29 01:49:30,428 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3760073.3333333335, ans=0.07 2023-11-29 01:49:33,920 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3760140.0, ans=0.1 2023-11-29 01:49:40,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3760140.0, ans=0.125 2023-11-29 01:49:48,902 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.916e+01 9.098e+01 9.720e+01 1.048e+02 1.470e+02, threshold=1.944e+02, percent-clipped=0.0 2023-11-29 01:49:52,499 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.78 vs. limit=15.0 2023-11-29 01:49:53,606 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.41 vs. limit=15.0 2023-11-29 01:50:04,497 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3760273.3333333335, ans=0.0 2023-11-29 01:50:07,926 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 564050 2023-11-29 01:50:11,369 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 10950, loss[loss=0.06508, simple_loss=0.09363, pruned_loss=0.01012, audio_tagging_loss=0.008143, over 15378.00 frames. ], tot_loss[loss=0.06505, simple_loss=0.08916, pruned_loss=0.01192, audio_tagging_loss=0.008552, over 3056931.05 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:50:11,587 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3760340.0, ans=0.1 2023-11-29 01:50:17,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3760340.0, ans=0.015 2023-11-29 01:50:19,114 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.99 vs. limit=15.0 2023-11-29 01:51:06,121 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3760606.6666666665, ans=0.125 2023-11-29 01:51:08,329 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 564100 2023-11-29 01:51:11,630 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.11 vs. limit=22.5 2023-11-29 01:51:12,321 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 11000, loss[loss=0.06056, simple_loss=0.08193, pruned_loss=0.01112, audio_tagging_loss=0.008473, over 16403.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.08867, pruned_loss=0.012, audio_tagging_loss=0.008612, over 3058562.23 frames. 
], batch size: 62, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:51:18,494 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3760673.3333333335, ans=0.125 2023-11-29 01:51:23,507 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 01:51:29,471 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3760740.0, ans=0.125 2023-11-29 01:51:37,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3760806.6666666665, ans=0.2 2023-11-29 01:51:49,291 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=3760873.3333333335, ans=0.05 2023-11-29 01:51:51,422 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.918e+01 9.080e+01 9.821e+01 1.045e+02 1.365e+02, threshold=1.964e+02, percent-clipped=0.0 2023-11-29 01:51:58,106 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3760873.3333333335, ans=0.0 2023-11-29 01:52:06,029 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3760940.0, ans=0.125 2023-11-29 01:52:10,033 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 564150 2023-11-29 01:52:14,034 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 11050, loss[loss=0.05857, simple_loss=0.07336, pruned_loss=0.01325, audio_tagging_loss=0.008635, over 14433.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.08907, pruned_loss=0.01192, audio_tagging_loss=0.008679, over 3060668.73 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:52:48,647 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.27 vs. limit=15.0 2023-11-29 01:52:54,035 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3761206.6666666665, ans=0.125 2023-11-29 01:53:12,847 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 564200 2023-11-29 01:53:16,679 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 11100, loss[loss=0.1089, simple_loss=0.1434, pruned_loss=0.0298, audio_tagging_loss=0.007388, over 15189.00 frames. ], tot_loss[loss=0.065, simple_loss=0.08872, pruned_loss=0.01189, audio_tagging_loss=0.008759, over 3058218.98 frames. 
], batch size: 53, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:53:35,033 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 01:53:56,286 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.539e+01 9.017e+01 9.701e+01 1.046e+02 1.396e+02, threshold=1.940e+02, percent-clipped=0.0 2023-11-29 01:54:13,913 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 564250 2023-11-29 01:54:17,350 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 11150, loss[loss=0.06688, simple_loss=0.09984, pruned_loss=0.01057, audio_tagging_loss=0.006389, over 16469.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08899, pruned_loss=0.01195, audio_tagging_loss=0.008809, over 3058903.91 frames. ], batch size: 62, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:54:17,593 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3761673.3333333335, ans=0.1 2023-11-29 01:54:48,360 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3761806.6666666665, ans=0.125 2023-11-29 01:54:55,273 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3761873.3333333335, ans=0.0 2023-11-29 01:54:58,748 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3761873.3333333335, ans=0.0 2023-11-29 01:55:01,773 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.21 vs. limit=15.0 2023-11-29 01:55:04,920 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.88 vs. limit=22.5 2023-11-29 01:55:11,841 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.72 vs. limit=15.0 2023-11-29 01:55:13,609 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3761940.0, ans=0.0 2023-11-29 01:55:15,804 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 564300 2023-11-29 01:55:19,846 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 11200, loss[loss=0.06573, simple_loss=0.09109, pruned_loss=0.01096, audio_tagging_loss=0.009225, over 16042.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.0892, pruned_loss=0.01188, audio_tagging_loss=0.008881, over 3062899.64 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:55:33,174 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.34 vs. limit=12.0 2023-11-29 01:55:49,387 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-29 01:55:52,093 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.11 vs. limit=10.0 2023-11-29 01:55:56,107 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.06 vs. 
limit=15.0 2023-11-29 01:55:58,720 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.132e+01 8.938e+01 9.585e+01 1.042e+02 1.236e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-29 01:55:58,996 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3762206.6666666665, ans=0.07 2023-11-29 01:55:59,632 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.39 vs. limit=22.5 2023-11-29 01:56:13,287 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3762273.3333333335, ans=0.1 2023-11-29 01:56:15,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3762273.3333333335, ans=0.1 2023-11-29 01:56:15,729 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3762273.3333333335, ans=0.2 2023-11-29 01:56:16,868 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3762273.3333333335, ans=0.0 2023-11-29 01:56:17,915 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 564350 2023-11-29 01:56:21,345 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 11250, loss[loss=0.05176, simple_loss=0.0709, pruned_loss=0.008231, audio_tagging_loss=0.008075, over 14824.00 frames. ], tot_loss[loss=0.0644, simple_loss=0.08775, pruned_loss=0.01162, audio_tagging_loss=0.008904, over 3064632.38 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:56:29,858 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3762340.0, ans=0.125 2023-11-29 01:56:31,034 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3762340.0, ans=0.0 2023-11-29 01:56:49,059 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3762473.3333333335, ans=0.1 2023-11-29 01:56:50,527 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.78 vs. limit=15.0 2023-11-29 01:57:08,898 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3762540.0, ans=0.0 2023-11-29 01:57:19,358 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 564400 2023-11-29 01:57:23,249 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 11300, loss[loss=0.06701, simple_loss=0.08769, pruned_loss=0.01426, audio_tagging_loss=0.008905, over 14320.00 frames. ], tot_loss[loss=0.06459, simple_loss=0.08841, pruned_loss=0.0117, audio_tagging_loss=0.008684, over 3058671.28 frames. 
], batch size: 55, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:57:28,963 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3762673.3333333335, ans=0.0 2023-11-29 01:57:31,203 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3762673.3333333335, ans=0.09899494936611666 2023-11-29 01:57:54,295 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3762806.6666666665, ans=0.125 2023-11-29 01:58:04,524 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.418e+01 9.082e+01 9.505e+01 1.023e+02 1.248e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-29 01:58:05,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3762873.3333333335, ans=0.2 2023-11-29 01:58:08,735 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.85 vs. limit=15.0 2023-11-29 01:58:21,602 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 564450 2023-11-29 01:58:24,969 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 11350, loss[loss=0.04378, simple_loss=0.06108, pruned_loss=0.004909, audio_tagging_loss=0.008326, over 14436.00 frames. ], tot_loss[loss=0.06507, simple_loss=0.08937, pruned_loss=0.01185, audio_tagging_loss=0.008536, over 3051129.78 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:58:27,559 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3763006.6666666665, ans=0.125 2023-11-29 01:58:33,961 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3763006.6666666665, ans=0.05 2023-11-29 01:58:47,564 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3763073.3333333335, ans=0.1 2023-11-29 01:58:53,696 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.50 vs. limit=15.0 2023-11-29 01:58:56,791 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3763140.0, ans=0.125 2023-11-29 01:59:19,523 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3763273.3333333335, ans=0.1 2023-11-29 01:59:20,671 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3763273.3333333335, ans=0.0 2023-11-29 01:59:22,758 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 564500 2023-11-29 01:59:26,231 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 11400, loss[loss=0.04438, simple_loss=0.05752, pruned_loss=0.007118, audio_tagging_loss=0.008499, over 15246.00 frames. ], tot_loss[loss=0.06531, simple_loss=0.08997, pruned_loss=0.01194, audio_tagging_loss=0.008386, over 3045536.30 frames. 
], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:59:36,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3763340.0, ans=0.1 2023-11-29 01:59:38,804 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3763406.6666666665, ans=0.125 2023-11-29 01:59:41,037 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3763406.6666666665, ans=0.1 2023-11-29 01:59:48,516 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3763406.6666666665, ans=0.1 2023-11-29 01:59:50,178 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.74 vs. limit=15.0 2023-11-29 01:59:53,288 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3763473.3333333335, ans=0.0 2023-11-29 01:59:59,338 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.38 vs. limit=10.0 2023-11-29 02:00:06,803 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.723e+01 8.845e+01 9.645e+01 1.031e+02 1.502e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-29 02:00:12,154 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.09 vs. limit=15.0 2023-11-29 02:00:17,482 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.33 vs. limit=15.0 2023-11-29 02:00:19,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3763606.6666666665, ans=0.125 2023-11-29 02:00:23,952 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 564550 2023-11-29 02:00:27,293 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 11450, loss[loss=0.0775, simple_loss=0.1101, pruned_loss=0.01594, audio_tagging_loss=0.006523, over 14642.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.09049, pruned_loss=0.01195, audio_tagging_loss=0.008325, over 3052716.19 frames. 
], batch size: 54, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 02:00:27,605 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3763673.3333333335, ans=0.2 2023-11-29 02:00:37,264 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3763673.3333333335, ans=0.015 2023-11-29 02:00:56,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3763806.6666666665, ans=0.125 2023-11-29 02:01:00,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3763806.6666666665, ans=0.1 2023-11-29 02:01:16,290 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3763940.0, ans=0.5 2023-11-29 02:01:24,803 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 564600 2023-11-29 02:01:28,657 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 11500, loss[loss=0.06735, simple_loss=0.09402, pruned_loss=0.0114, audio_tagging_loss=0.00894, over 15261.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.09027, pruned_loss=0.01193, audio_tagging_loss=0.008391, over 3054883.82 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 02:01:37,102 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 02:01:37,696 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.24 vs. limit=6.0 2023-11-29 02:01:44,883 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.93 vs. limit=22.5 2023-11-29 02:01:45,353 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3764073.3333333335, ans=0.125 2023-11-29 02:02:02,435 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3764140.0, ans=0.2 2023-11-29 02:02:09,659 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.803e+01 8.942e+01 9.581e+01 1.022e+02 1.259e+02, threshold=1.916e+02, percent-clipped=0.0 2023-11-29 02:02:13,565 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3764206.6666666665, ans=0.125 2023-11-29 02:02:19,498 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3764273.3333333335, ans=0.0 2023-11-29 02:02:23,237 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.94 vs. limit=15.0 2023-11-29 02:02:27,516 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 564650 2023-11-29 02:02:27,709 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3764273.3333333335, ans=0.125 2023-11-29 02:02:29,140 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.36 vs. 
limit=15.0 2023-11-29 02:02:30,877 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 11550, loss[loss=0.0646, simple_loss=0.09504, pruned_loss=0.009934, audio_tagging_loss=0.007145, over 16823.00 frames. ], tot_loss[loss=0.06505, simple_loss=0.08981, pruned_loss=0.01175, audio_tagging_loss=0.008392, over 3055166.14 frames. ], batch size: 62, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 02:02:35,883 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3764340.0, ans=0.125 2023-11-29 02:02:43,653 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3764406.6666666665, ans=0.0 2023-11-29 02:02:50,747 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3764406.6666666665, ans=0.0 2023-11-29 02:02:55,404 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.91 vs. limit=12.0 2023-11-29 02:02:57,086 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3764473.3333333335, ans=0.0 2023-11-29 02:02:58,378 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3764473.3333333335, ans=0.125 2023-11-29 02:03:01,664 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3764473.3333333335, ans=0.125 2023-11-29 02:03:10,491 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 02:03:16,692 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3764540.0, ans=0.2 2023-11-29 02:03:26,083 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.62 vs. limit=15.0 2023-11-29 02:03:28,079 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 564700 2023-11-29 02:03:31,553 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.31 vs. limit=15.0 2023-11-29 02:03:32,104 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 11600, loss[loss=0.07903, simple_loss=0.1082, pruned_loss=0.01477, audio_tagging_loss=0.01015, over 14957.00 frames. ], tot_loss[loss=0.06486, simple_loss=0.08945, pruned_loss=0.01175, audio_tagging_loss=0.008389, over 3056247.25 frames. 
], batch size: 55, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 02:03:33,602 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3764673.3333333335, ans=0.125 2023-11-29 02:03:50,486 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3764740.0, ans=0.2 2023-11-29 02:04:11,221 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3764873.3333333335, ans=0.0 2023-11-29 02:04:13,363 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.962e+01 8.912e+01 9.637e+01 1.036e+02 1.418e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-29 02:04:17,845 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3764873.3333333335, ans=0.5 2023-11-29 02:04:25,934 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.51 vs. limit=10.0 2023-11-29 02:04:29,824 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 564750 2023-11-29 02:04:31,471 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.19 vs. limit=22.5 2023-11-29 02:04:33,233 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 11650, loss[loss=0.06216, simple_loss=0.08933, pruned_loss=0.01005, audio_tagging_loss=0.007448, over 15637.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.08991, pruned_loss=0.01173, audio_tagging_loss=0.008336, over 3062174.73 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 02:04:39,276 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3765006.6666666665, ans=0.0 2023-11-29 02:05:06,918 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3765140.0, ans=0.2 2023-11-29 02:05:15,533 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3765206.6666666665, ans=0.1 2023-11-29 02:05:24,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3765273.3333333335, ans=0.0 2023-11-29 02:05:26,545 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.20 vs. limit=15.0 2023-11-29 02:05:31,042 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 564800 2023-11-29 02:05:34,764 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 11700, loss[loss=0.03384, simple_loss=0.04007, pruned_loss=0.003516, audio_tagging_loss=0.01028, over 15969.00 frames. ], tot_loss[loss=0.06449, simple_loss=0.08874, pruned_loss=0.01167, audio_tagging_loss=0.008446, over 3054609.21 frames. 
], batch size: 65, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 02:05:34,986 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3765340.0, ans=0.035 2023-11-29 02:05:35,027 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3765340.0, ans=0.125 2023-11-29 02:05:38,955 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.47 vs. limit=15.0 2023-11-29 02:05:40,920 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3765340.0, ans=0.2 2023-11-29 02:06:06,673 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3765473.3333333335, ans=0.09899494936611666 2023-11-29 02:06:06,781 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3765473.3333333335, ans=0.0 2023-11-29 02:06:16,824 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.400e+01 8.764e+01 9.502e+01 1.022e+02 1.260e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-29 02:06:25,391 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3765606.6666666665, ans=0.5 2023-11-29 02:06:32,047 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 564850 2023-11-29 02:06:35,552 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 11750, loss[loss=0.05444, simple_loss=0.0751, pruned_loss=0.00756, audio_tagging_loss=0.009333, over 15900.00 frames. ], tot_loss[loss=0.06413, simple_loss=0.08814, pruned_loss=0.01151, audio_tagging_loss=0.008542, over 3057336.02 frames. ], batch size: 62, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 02:06:43,385 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3765673.3333333335, ans=0.125 2023-11-29 02:06:49,178 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.46 vs. limit=15.0 2023-11-29 02:07:00,504 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3765806.6666666665, ans=0.125 2023-11-29 02:07:23,487 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3765940.0, ans=0.125 2023-11-29 02:07:24,629 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3765940.0, ans=0.125 2023-11-29 02:07:25,782 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3765940.0, ans=0.125 2023-11-29 02:07:33,497 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 564900 2023-11-29 02:07:37,661 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 11800, loss[loss=0.08636, simple_loss=0.1188, pruned_loss=0.01633, audio_tagging_loss=0.01061, over 15088.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08894, pruned_loss=0.01191, audio_tagging_loss=0.00853, over 3057768.73 frames. 
], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 02:07:40,318 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3766006.6666666665, ans=0.125 2023-11-29 02:07:53,051 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.88 vs. limit=22.5 2023-11-29 02:08:01,885 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3766140.0, ans=0.125 2023-11-29 02:08:02,141 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.89 vs. limit=6.0 2023-11-29 02:08:06,823 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.40 vs. limit=12.0 2023-11-29 02:08:17,934 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.655e+01 9.243e+01 9.839e+01 1.045e+02 1.336e+02, threshold=1.968e+02, percent-clipped=0.0 2023-11-29 02:08:26,443 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3766273.3333333335, ans=0.125 2023-11-29 02:08:31,329 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3766273.3333333335, ans=0.2 2023-11-29 02:08:33,029 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.89 vs. limit=15.0 2023-11-29 02:08:35,215 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 564950 2023-11-29 02:08:38,648 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 11850, loss[loss=0.06153, simple_loss=0.08432, pruned_loss=0.0113, audio_tagging_loss=0.008067, over 15752.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.089, pruned_loss=0.0119, audio_tagging_loss=0.008544, over 3051125.18 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 02:08:44,877 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3766340.0, ans=0.125 2023-11-29 02:08:58,891 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3766406.6666666665, ans=0.125 2023-11-29 02:09:01,535 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.73 vs. limit=15.0 2023-11-29 02:09:03,465 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3766473.3333333335, ans=0.125 2023-11-29 02:09:04,540 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3766473.3333333335, ans=0.0 2023-11-29 02:09:12,185 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff2.min_abs, batch_count=3766473.3333333335, ans=0.1 2023-11-29 02:09:35,021 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 565000 2023-11-29 02:09:38,746 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 11900, loss[loss=0.05763, simple_loss=0.07013, pruned_loss=0.01564, audio_tagging_loss=0.006933, over 15075.00 frames. ], tot_loss[loss=0.06501, simple_loss=0.08892, pruned_loss=0.01192, audio_tagging_loss=0.008624, over 3048632.58 frames. 
], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 02:09:45,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=3766673.3333333335, ans=15.0 2023-11-29 02:09:52,896 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3766740.0, ans=0.2 2023-11-29 02:10:19,337 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.802e+01 8.853e+01 9.543e+01 1.009e+02 1.340e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-29 02:10:30,104 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3766940.0, ans=0.125 2023-11-29 02:10:34,529 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 565050 2023-11-29 02:10:37,878 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 11950, loss[loss=0.07683, simple_loss=0.1088, pruned_loss=0.01588, audio_tagging_loss=0.006548, over 14905.00 frames. ], tot_loss[loss=0.0651, simple_loss=0.089, pruned_loss=0.01187, audio_tagging_loss=0.008728, over 3045706.26 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 02:10:38,606 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.21 vs. limit=15.0 2023-11-29 02:10:43,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3767006.6666666665, ans=0.2 2023-11-29 02:11:15,102 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.73 vs. limit=15.0 2023-11-29 02:11:15,919 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3767206.6666666665, ans=0.0 2023-11-29 02:11:28,997 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3767273.3333333335, ans=0.125 2023-11-29 02:11:29,172 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3767273.3333333335, ans=0.2 2023-11-29 02:11:32,309 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3767273.3333333335, ans=0.2 2023-11-29 02:11:32,429 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3767273.3333333335, ans=0.2 2023-11-29 02:11:33,324 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 565100 2023-11-29 02:11:36,597 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 12000, loss[loss=0.06364, simple_loss=0.08178, pruned_loss=0.01232, audio_tagging_loss=0.01044, over 15745.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08927, pruned_loss=0.01197, audio_tagging_loss=0.008891, over 3044526.41 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 02:11:36,599 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-29 02:12:15,271 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.8829, 5.7522, 5.6255, 5.4648], device='cuda:0') 2023-11-29 02:12:16,770 INFO [train_asr.py:1267] (0/4) Epoch 47, validation: loss=0.05799, simple_loss=0.0505, pruned_loss=0.005391, audio_tagging_loss=0.02735, over 4681554.00 frames. 
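Each training and validation record in this log reports a total loss alongside its components (simple_loss, pruned_loss, audio_tagging_loss), and the logged numbers are consistent with a fixed weighted sum, loss ≈ 0.5 · simple_loss + pruned_loss + 1.0 · audio_tagging_loss; for the Epoch 47 validation record above, 0.5 · 0.0505 + 0.005391 + 0.02735 ≈ 0.05799. Below is a minimal sketch of that recombination. The 0.5 and 1.0 weights are inferred from the logged values, not read from train_asr.py, and the function name and constants are illustrative assumptions only.

    # Sketch: recombine per-component losses into the "loss=" field of a record.
    # The weights are inferred from the logged numbers, not taken from the
    # training script; treat them as assumptions.
    SIMPLE_LOSS_SCALE = 0.5        # assumed weight on the simple transducer loss
    AUDIO_TAGGING_SCALE = 1.0      # assumed weight on the audio-tagging loss

    def combine_losses(simple_loss: float, pruned_loss: float,
                       audio_tagging_loss: float) -> float:
        """Total loss as it appears in the log's "loss=" field."""
        return (SIMPLE_LOSS_SCALE * simple_loss
                + pruned_loss
                + AUDIO_TAGGING_SCALE * audio_tagging_loss)

    # Epoch 47 validation record: loss=0.05799, simple_loss=0.0505,
    # pruned_loss=0.005391, audio_tagging_loss=0.02735
    assert abs(combine_losses(0.0505, 0.005391, 0.02735) - 0.05799) < 5e-4

The same relation holds for the per-batch and tot_loss fields throughout this section (e.g. batch 10350: 0.5 · 0.07561 + 0.00912 + 0.01016 ≈ 0.05708), which is a quick way to sanity-check any record in the log.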
2023-11-29 02:12:16,771 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-29 02:12:21,764 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.08 vs. limit=15.0 2023-11-29 02:12:31,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3767406.6666666665, ans=0.0 2023-11-29 02:12:43,294 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/epoch-47.pt 2023-11-29 02:13:00,464 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3767493.3333333335, ans=0.125 2023-11-29 02:13:01,543 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 0, loss[loss=0.06775, simple_loss=0.0873, pruned_loss=0.005115, audio_tagging_loss=0.01899, over 15082.00 frames. ], tot_loss[loss=0.06775, simple_loss=0.0873, pruned_loss=0.005115, audio_tagging_loss=0.01899, over 15082.00 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:13:01,547 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-29 02:13:22,586 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.1298, 2.4263, 5.0210, 3.0307], device='cuda:0') 2023-11-29 02:13:28,327 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.4299, 3.8415, 3.1873, 3.7901], device='cuda:0') 2023-11-29 02:13:33,938 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([3.9557, 3.1364, 2.9536, 3.1768, 3.3704, 2.7402, 3.4051, 2.5185], device='cuda:0') 2023-11-29 02:13:35,470 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.0936, 2.7581, 1.7978, 2.6391, 3.2116, 3.2879, 3.1448, 3.5394], device='cuda:0') 2023-11-29 02:13:36,869 INFO [train_asr.py:1267] (0/4) Epoch 48, validation: loss=0.05814, simple_loss=0.05045, pruned_loss=0.005317, audio_tagging_loss=0.02759, over 4681554.00 frames. 2023-11-29 02:13:36,870 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-29 02:13:43,682 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3767493.3333333335, ans=0.0 2023-11-29 02:13:46,790 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.55 vs. limit=22.5 2023-11-29 02:13:50,199 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.77 vs. 
limit=6.0 2023-11-29 02:13:50,680 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.137e+01 9.361e+01 1.012e+02 1.115e+02 1.422e+02, threshold=2.023e+02, percent-clipped=0.0 2023-11-29 02:13:58,109 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3767560.0, ans=0.0 2023-11-29 02:14:08,490 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 565150 2023-11-29 02:14:23,089 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3767693.3333333335, ans=0.125 2023-11-29 02:14:40,302 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 50, loss[loss=0.07378, simple_loss=0.09592, pruned_loss=0.01066, audio_tagging_loss=0.01516, over 15848.00 frames. ], tot_loss[loss=0.07359, simple_loss=0.0902, pruned_loss=0.01209, audio_tagging_loss=0.01641, over 683632.66 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:15:04,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3767960.0, ans=0.125 2023-11-29 02:15:09,183 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3767960.0, ans=0.2 2023-11-29 02:15:10,147 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 565200 2023-11-29 02:15:43,427 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 100, loss[loss=0.07273, simple_loss=0.09845, pruned_loss=0.01059, audio_tagging_loss=0.01291, over 14998.00 frames. ], tot_loss[loss=0.07251, simple_loss=0.08963, pruned_loss=0.01185, audio_tagging_loss=0.01584, over 1208858.82 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:15:44,783 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3768160.0, ans=0.125 2023-11-29 02:15:56,411 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.236e+01 9.896e+01 1.062e+02 1.155e+02 1.316e+02, threshold=2.123e+02, percent-clipped=0.0 2023-11-29 02:16:11,221 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3768293.3333333335, ans=0.0 2023-11-29 02:16:11,259 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3768293.3333333335, ans=0.07 2023-11-29 02:16:12,266 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 565250 2023-11-29 02:16:22,373 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3768360.0, ans=0.125 2023-11-29 02:16:33,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3768426.6666666665, ans=0.125 2023-11-29 02:16:36,174 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3768426.6666666665, ans=0.0 2023-11-29 02:16:36,350 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.05 vs. limit=10.0 2023-11-29 02:16:43,703 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 150, loss[loss=0.0682, simple_loss=0.0933, pruned_loss=0.01141, audio_tagging_loss=0.01015, over 16077.00 frames. 
], tot_loss[loss=0.07049, simple_loss=0.0886, pruned_loss=0.01194, audio_tagging_loss=0.01425, over 1614062.24 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:16:49,529 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3768493.3333333335, ans=0.0 2023-11-29 02:16:54,715 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3768560.0, ans=0.2 2023-11-29 02:17:02,193 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3768560.0, ans=0.0 2023-11-29 02:17:14,163 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 565300 2023-11-29 02:17:26,843 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3768693.3333333335, ans=0.0 2023-11-29 02:17:31,416 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3768693.3333333335, ans=0.0 2023-11-29 02:17:46,577 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 200, loss[loss=0.07483, simple_loss=0.09737, pruned_loss=0.01883, audio_tagging_loss=0.00731, over 15395.00 frames. ], tot_loss[loss=0.06833, simple_loss=0.08748, pruned_loss=0.01181, audio_tagging_loss=0.01278, over 1933361.88 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:17:59,098 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.12 vs. limit=15.0 2023-11-29 02:18:02,094 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.631e+01 9.150e+01 9.879e+01 1.074e+02 1.273e+02, threshold=1.976e+02, percent-clipped=0.0 2023-11-29 02:18:02,402 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3768893.3333333335, ans=0.0 2023-11-29 02:18:05,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3768893.3333333335, ans=0.125 2023-11-29 02:18:16,767 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 565350 2023-11-29 02:18:23,138 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.07 vs. limit=6.0 2023-11-29 02:18:39,886 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3769093.3333333335, ans=0.0 2023-11-29 02:18:39,923 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3769093.3333333335, ans=0.125 2023-11-29 02:18:49,030 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 250, loss[loss=0.06358, simple_loss=0.08602, pruned_loss=0.01058, audio_tagging_loss=0.009991, over 15515.00 frames. ], tot_loss[loss=0.06816, simple_loss=0.08887, pruned_loss=0.01213, audio_tagging_loss=0.0116, over 2173542.11 frames. 
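Note on the loss fields: every loss[...] record in this section decomposes the same way, with the headline loss equal to 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss, matching the run's simple_loss_scale and audio_tagging_loss_scale, and with the pruned term at full weight this late in training. A check against the epoch-48 batch-200 record above; warm-up scaling of the pruned loss is omitted since at batch idx ~565k it has long since reached 1.0.

```python
# The logged headline loss is consistent with this weighting; verified
# against the epoch-48 batch-200 record. Warm-up handling is omitted.
SIMPLE_LOSS_SCALE = 0.5
AUDIO_TAGGING_LOSS_SCALE = 1.0

def combine(simple_loss: float, pruned_loss: float,
            audio_tagging_loss: float) -> float:
    return (SIMPLE_LOSS_SCALE * simple_loss
            + pruned_loss
            + AUDIO_TAGGING_LOSS_SCALE * audio_tagging_loss)

# Epoch 48, batch 200: loss=0.07483 logged
print(combine(0.09737, 0.01883, 0.00731))  # -> 0.074825 ≈ 0.07483
```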
], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:18:52,273 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3769160.0, ans=0.125 2023-11-29 02:18:52,317 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3769160.0, ans=0.125 2023-11-29 02:19:12,441 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.31 vs. limit=15.0 2023-11-29 02:19:13,162 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3769293.3333333335, ans=0.125 2023-11-29 02:19:18,420 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 565400 2023-11-29 02:19:24,774 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3769360.0, ans=0.07 2023-11-29 02:19:27,487 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.14 vs. limit=15.0 2023-11-29 02:19:32,052 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.43 vs. limit=12.0 2023-11-29 02:19:34,855 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3769360.0, ans=0.0 2023-11-29 02:19:36,488 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3769360.0, ans=0.125 2023-11-29 02:19:46,672 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3769426.6666666665, ans=0.125 2023-11-29 02:19:51,091 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 300, loss[loss=0.06243, simple_loss=0.09392, pruned_loss=0.008389, audio_tagging_loss=0.007077, over 14961.00 frames. ], tot_loss[loss=0.06738, simple_loss=0.08918, pruned_loss=0.01209, audio_tagging_loss=0.0107, over 2360645.66 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:19:56,485 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.15 vs. limit=10.0 2023-11-29 02:20:05,682 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.057e+01 9.202e+01 9.824e+01 1.066e+02 1.297e+02, threshold=1.965e+02, percent-clipped=0.0 2023-11-29 02:20:19,758 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 565450 2023-11-29 02:20:52,716 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 350, loss[loss=0.07094, simple_loss=0.1039, pruned_loss=0.0101, audio_tagging_loss=0.0089, over 14792.00 frames. ], tot_loss[loss=0.06725, simple_loss=0.08991, pruned_loss=0.0122, audio_tagging_loss=0.0101, over 2519170.44 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:20:52,917 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3769826.6666666665, ans=0.125 2023-11-29 02:21:22,241 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 565500 2023-11-29 02:21:37,606 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.89 vs. 
limit=15.0 2023-11-29 02:21:44,353 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3770093.3333333335, ans=0.125 2023-11-29 02:21:45,469 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3770093.3333333335, ans=0.0 2023-11-29 02:21:53,373 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 400, loss[loss=0.04679, simple_loss=0.06138, pruned_loss=0.007493, audio_tagging_loss=0.008611, over 15080.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.08916, pruned_loss=0.012, audio_tagging_loss=0.009788, over 2636625.81 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:22:09,130 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.365e+01 8.956e+01 9.429e+01 1.009e+02 1.369e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-29 02:22:10,479 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3770226.6666666665, ans=0.04949747468305833 2023-11-29 02:22:10,494 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3770226.6666666665, ans=0.125 2023-11-29 02:22:10,632 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3770226.6666666665, ans=0.125 2023-11-29 02:22:23,115 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3770293.3333333335, ans=0.1 2023-11-29 02:22:23,897 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 565550 2023-11-29 02:22:26,272 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3770293.3333333335, ans=0.125 2023-11-29 02:22:30,245 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.15 vs. limit=6.0 2023-11-29 02:22:56,253 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 450, loss[loss=0.05011, simple_loss=0.06525, pruned_loss=0.009045, audio_tagging_loss=0.008441, over 15804.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.08965, pruned_loss=0.01208, audio_tagging_loss=0.00941, over 2728846.55 frames. ], batch size: 60, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:23:03,526 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3770493.3333333335, ans=0.125 2023-11-29 02:23:24,853 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 565600 2023-11-29 02:23:25,083 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3770626.6666666665, ans=0.125 2023-11-29 02:23:34,430 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.31 vs. limit=15.0 2023-11-29 02:23:47,580 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.89 vs. limit=15.0 2023-11-29 02:23:57,789 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 500, loss[loss=0.07336, simple_loss=0.1022, pruned_loss=0.01309, audio_tagging_loss=0.009173, over 14405.00 frames. 
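Note on the flat "lr: 1.41e-03": this is what icefall's Eden schedule (optim.py) yields at this point in training, and it moves too slowly to change between the records here. A sketch using this run's base_lr=0.045, lr_batches=7500, lr_epochs=3.5; feeding in the epoch as a whole number of finished epochs is a simplification of what the scheduler actually receives.

```python
# Hedged sketch of the Eden learning-rate schedule; reproduces the
# "lr: 1.41e-03" logged above at batch idx ~565e3 in epoch 48.
def eden_lr(base_lr: float, batch: float, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    batch_factor = ((batch**2 + lr_batches**2) / lr_batches**2) ** -0.25
    epoch_factor = ((epoch**2 + lr_epochs**2) / lr_epochs**2) ** -0.25
    return base_lr * batch_factor * epoch_factor

print(eden_lr(0.045, batch=565550, epoch=47))  # ≈ 1.41e-03
```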
], tot_loss[loss=0.06582, simple_loss=0.08914, pruned_loss=0.01198, audio_tagging_loss=0.00927, over 2799733.32 frames. ], batch size: 53, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:24:04,061 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3770826.6666666665, ans=0.0 2023-11-29 02:24:05,047 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3770826.6666666665, ans=0.1 2023-11-29 02:24:12,841 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.822e+01 8.944e+01 9.480e+01 1.026e+02 1.531e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-29 02:24:25,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3770960.0, ans=0.125 2023-11-29 02:24:26,917 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 565650 2023-11-29 02:24:29,987 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3770960.0, ans=0.125 2023-11-29 02:24:31,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3770960.0, ans=0.125 2023-11-29 02:24:33,629 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3771026.6666666665, ans=0.125 2023-11-29 02:24:58,605 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 550, loss[loss=0.06004, simple_loss=0.07851, pruned_loss=0.01015, audio_tagging_loss=0.01064, over 14309.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.08868, pruned_loss=0.01183, audio_tagging_loss=0.009051, over 2861160.64 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:25:12,023 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.98 vs. limit=10.0 2023-11-29 02:25:28,909 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 565700 2023-11-29 02:25:41,997 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3771360.0, ans=0.125 2023-11-29 02:25:58,318 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3771426.6666666665, ans=0.125 2023-11-29 02:26:00,441 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 600, loss[loss=0.06326, simple_loss=0.08452, pruned_loss=0.01223, audio_tagging_loss=0.008766, over 15277.00 frames. ], tot_loss[loss=0.06484, simple_loss=0.08824, pruned_loss=0.01174, audio_tagging_loss=0.008984, over 2913123.40 frames. 
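Note on grad_scale: the field flipping between 32.0 and 16.0 across these records is ordinary fp16 dynamic loss scaling (the run has use_fp16 enabled): the scaler halves its scale when scaled gradients overflow and doubles it back after a run of clean steps. A minimal sketch with torch.cuda.amp; the model, optimizer, and loss plumbing are stand-ins, not the recipe's code.

```python
# Hedged sketch of the fp16 loss-scaling loop behind the grad_scale
# field: torch.cuda.amp.GradScaler halves its scale on overflow and
# doubles it after growth_interval clean steps.
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)

def train_step(model, optimizer, batch, compute_loss):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = compute_loss(model, batch)
    scaler.scale(loss).backward()
    scaler.step(optimizer)       # skipped if grads contain inf/nan
    scaler.update()              # halves or (eventually) doubles the scale
    return scaler.get_scale()    # the value logged as grad_scale
```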
], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:26:16,978 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.722e+01 8.960e+01 9.657e+01 1.065e+02 1.783e+02, threshold=1.931e+02, percent-clipped=0.0 2023-11-29 02:26:26,865 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 02:26:30,143 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 565750 2023-11-29 02:26:33,647 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 02:27:02,381 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 650, loss[loss=0.06211, simple_loss=0.08642, pruned_loss=0.009258, audio_tagging_loss=0.009637, over 14951.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.08817, pruned_loss=0.01188, audio_tagging_loss=0.009056, over 2940545.94 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:27:17,035 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.70 vs. limit=6.0 2023-11-29 02:27:20,197 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3771893.3333333335, ans=0.125 2023-11-29 02:27:31,265 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 565800 2023-11-29 02:27:38,803 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3772026.6666666665, ans=0.2 2023-11-29 02:27:40,014 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3772026.6666666665, ans=0.0 2023-11-29 02:27:49,828 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.78 vs. limit=15.0 2023-11-29 02:28:03,593 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 700, loss[loss=0.06685, simple_loss=0.09299, pruned_loss=0.01133, audio_tagging_loss=0.009018, over 15258.00 frames. ], tot_loss[loss=0.06503, simple_loss=0.08839, pruned_loss=0.01187, audio_tagging_loss=0.008967, over 2965957.75 frames. 
], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:28:17,253 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3772226.6666666665, ans=0.125 2023-11-29 02:28:19,247 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.220e+01 8.910e+01 9.498e+01 1.006e+02 1.347e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-29 02:28:20,814 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3772226.6666666665, ans=0.125 2023-11-29 02:28:32,952 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 565850 2023-11-29 02:28:34,164 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3772293.3333333335, ans=0.04949747468305833 2023-11-29 02:28:44,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3772360.0, ans=0.125 2023-11-29 02:28:44,114 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3772360.0, ans=0.1 2023-11-29 02:28:47,629 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3772360.0, ans=0.125 2023-11-29 02:28:59,036 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3772426.6666666665, ans=0.125 2023-11-29 02:29:01,217 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 02:29:04,561 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 750, loss[loss=0.0661, simple_loss=0.09722, pruned_loss=0.005589, audio_tagging_loss=0.0119, over 15475.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.08961, pruned_loss=0.01213, audio_tagging_loss=0.008935, over 2992217.68 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:29:29,422 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.27 vs. limit=15.0 2023-11-29 02:29:33,938 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 565900 2023-11-29 02:29:37,665 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3772626.6666666665, ans=0.0 2023-11-29 02:29:51,124 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3772693.3333333335, ans=0.125 2023-11-29 02:29:52,631 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.25 vs. 
limit=12.0 2023-11-29 02:29:53,332 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3772760.0, ans=0.125 2023-11-29 02:29:54,546 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3772760.0, ans=0.125 2023-11-29 02:30:03,987 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3772826.6666666665, ans=0.125 2023-11-29 02:30:05,606 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 800, loss[loss=0.08014, simple_loss=0.1071, pruned_loss=0.01667, audio_tagging_loss=0.009948, over 15365.00 frames. ], tot_loss[loss=0.06667, simple_loss=0.09061, pruned_loss=0.01242, audio_tagging_loss=0.008946, over 3004120.08 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:30:21,583 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.416e+01 9.034e+01 9.772e+01 1.029e+02 1.331e+02, threshold=1.954e+02, percent-clipped=0.0 2023-11-29 02:30:30,014 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3772960.0, ans=0.04949747468305833 2023-11-29 02:30:35,171 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 565950 2023-11-29 02:30:44,771 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3773026.6666666665, ans=0.1 2023-11-29 02:31:06,986 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 850, loss[loss=0.05368, simple_loss=0.07358, pruned_loss=0.009445, audio_tagging_loss=0.007444, over 15517.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.09022, pruned_loss=0.01229, audio_tagging_loss=0.008958, over 3011736.23 frames. ], batch size: 63, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:31:14,289 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3773160.0, ans=0.0 2023-11-29 02:31:36,535 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 566000 2023-11-29 02:31:50,192 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3773360.0, ans=0.025 2023-11-29 02:31:52,635 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3773360.0, ans=0.125 2023-11-29 02:31:58,316 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3773426.6666666665, ans=0.0 2023-11-29 02:32:06,352 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3773426.6666666665, ans=0.1 2023-11-29 02:32:06,471 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3773426.6666666665, ans=0.05 2023-11-29 02:32:08,832 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3773493.3333333335, ans=0.1 2023-11-29 02:32:09,893 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 900, loss[loss=0.06438, simple_loss=0.09019, pruned_loss=0.01277, audio_tagging_loss=0.006515, over 15598.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.08924, pruned_loss=0.01201, audio_tagging_loss=0.009074, over 3023119.49 frames. 
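Note on the scaling.py:213 records: they print ScheduledFloat values, module hyper-parameters (balancer probs, skip rates, dropout p) defined as piecewise-linear functions of batch_count rather than constants, which is why each record carries both a batch_count and an ans. By batch_count ≈ 3.77e6 they have all reached their final breakpoints. A sketch of such a schedule; the breakpoints below are illustrative, not the recipe's.

```python
# Hedged sketch of a ScheduledFloat-style hyper-parameter: a list of
# (batch_count, value) breakpoints, linearly interpolated between them
# and held constant past the ends.
def scheduled_float(batch_count: float,
                    schedule: list[tuple[float, float]]) -> float:
    points = sorted(schedule)
    if batch_count <= points[0][0]:
        return points[0][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if batch_count <= x1:
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
    return points[-1][1]

# e.g. a balancer prob annealed 0.5 -> 0.125 over the first 8000 batches:
prob = [(0.0, 0.5), (8000.0, 0.125)]
print(scheduled_float(3771426.0, prob))  # 0.125, as in the records above
```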
], batch size: 58, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:32:10,531 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.10 vs. limit=6.0 2023-11-29 02:32:17,149 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3773493.3333333335, ans=0.2 2023-11-29 02:32:19,446 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3773493.3333333335, ans=0.1 2023-11-29 02:32:26,360 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.248e+01 9.124e+01 9.810e+01 1.032e+02 1.259e+02, threshold=1.962e+02, percent-clipped=0.0 2023-11-29 02:32:35,021 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3773626.6666666665, ans=0.05 2023-11-29 02:32:39,667 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 566050 2023-11-29 02:32:41,545 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.41 vs. limit=22.5 2023-11-29 02:32:49,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3773693.3333333335, ans=0.2 2023-11-29 02:33:04,664 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3773760.0, ans=0.1 2023-11-29 02:33:11,553 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 950, loss[loss=0.06635, simple_loss=0.09165, pruned_loss=0.01413, audio_tagging_loss=0.006402, over 15409.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08949, pruned_loss=0.01197, audio_tagging_loss=0.008971, over 3032515.72 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:33:26,011 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3773893.3333333335, ans=0.125 2023-11-29 02:33:27,181 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3773893.3333333335, ans=0.125 2023-11-29 02:33:28,279 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3773893.3333333335, ans=0.0 2023-11-29 02:33:41,036 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3773960.0, ans=0.125 2023-11-29 02:33:42,064 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 566100 2023-11-29 02:33:43,343 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3773960.0, ans=0.1 2023-11-29 02:34:07,883 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3774093.3333333335, ans=0.2 2023-11-29 02:34:13,551 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 1000, loss[loss=0.0555, simple_loss=0.0703, pruned_loss=0.009525, audio_tagging_loss=0.01083, over 15266.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.0906, pruned_loss=0.01219, audio_tagging_loss=0.00874, over 3039052.06 frames. 
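Note on the tot_loss frame counts: they grow from 683632.66 at batch 50 toward a plateau near 3.04e6, which is about 200 times the average batch size in frames, exactly what a per-batch decay of (1 - 1/200) produces (the run's reset_interval is 200). A sketch of that bookkeeping:

```python
# Hedged sketch of the tot_loss accumulation: an exponentially decayed,
# frame-weighted running sum with per-batch decay (1 - 1/200). Its
# effective window saturates at ~200 batches, matching the "over
# ~3.0e6 frames" plateau above (batches here average ~15.2e3 frames).
DECAY = 1.0 - 1.0 / 200

def update(tot_frames, tot_loss_sum, batch_frames, batch_loss):
    tot_frames = DECAY * tot_frames + batch_frames
    tot_loss_sum = DECAY * tot_loss_sum + batch_frames * batch_loss
    return tot_frames, tot_loss_sum  # report tot_loss_sum / tot_frames

f = s = 0.0
for _ in range(50):                       # 50 batches into the epoch
    f, s = update(f, s, 15200.0, 0.065)
print(f)                                  # ≈ 6.7e5, cf. "over 683632.66 frames"
for _ in range(2000):
    f, s = update(f, s, 15200.0, 0.065)
print(f)                                  # ≈ 3.04e6, the plateau
```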
], batch size: 60, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:34:19,674 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3774160.0, ans=0.2 2023-11-29 02:34:30,803 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.986e+01 9.000e+01 9.678e+01 1.023e+02 1.395e+02, threshold=1.936e+02, percent-clipped=0.0 2023-11-29 02:34:31,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3774226.6666666665, ans=0.125 2023-11-29 02:34:41,619 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 02:34:42,811 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 566150 2023-11-29 02:35:03,968 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3774426.6666666665, ans=0.2 2023-11-29 02:35:14,495 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3774493.3333333335, ans=0.0 2023-11-29 02:35:15,277 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 1050, loss[loss=0.07538, simple_loss=0.1017, pruned_loss=0.01722, audio_tagging_loss=0.007309, over 14373.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.0896, pruned_loss=0.01195, audio_tagging_loss=0.00868, over 3036181.40 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:35:20,540 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.46 vs. limit=22.5 2023-11-29 02:35:27,316 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3774560.0, ans=0.125 2023-11-29 02:35:30,820 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3774560.0, ans=0.1 2023-11-29 02:35:39,641 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3774626.6666666665, ans=0.0 2023-11-29 02:35:44,907 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 566200 2023-11-29 02:35:59,025 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3774693.3333333335, ans=0.125 2023-11-29 02:36:11,337 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3774760.0, ans=0.125 2023-11-29 02:36:11,420 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3774760.0, ans=0.125 2023-11-29 02:36:16,942 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 1100, loss[loss=0.04973, simple_loss=0.07071, pruned_loss=0.006191, audio_tagging_loss=0.008182, over 16253.00 frames. ], tot_loss[loss=0.0648, simple_loss=0.08869, pruned_loss=0.01187, audio_tagging_loss=0.008586, over 3036990.83 frames. 
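Note on the WARNING records: AudioSet cuts carry a dummy transcript, and the cut above is excluded because its 100 input frames shrink to 23 after subsampling, fewer than its 24 BPE tokens, leaving the transducer loss with no valid alignment. A sketch of such a filter; the exact subsampling arithmetic is an assumption that happens to reproduce 100 -> 23.

```python
# Hedged sketch of the train_asr.py:1481 exclusion: drop any cut whose
# post-subsampling frame count is smaller than its token count. The
# frames_after_subsampling formula (overall factor ~4 with edge loss)
# is an assumption consistent with the logged 100 -> 23.
def frames_after_subsampling(num_frames: int) -> int:
    return (num_frames - 7) // 2 // 2   # 100 -> 23, as in the warning

def keep_cut(num_frames: int, tokens: list) -> bool:
    T = frames_after_subsampling(num_frames)
    return T >= len(tokens)

print(frames_after_subsampling(100))    # 23
print(keep_cut(100, ["tok"] * 24))      # False -> "Exclude cut ..."
```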
], batch size: 62, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:36:18,780 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.42 vs. limit=15.0 2023-11-29 02:36:20,976 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.56 vs. limit=22.5 2023-11-29 02:36:21,703 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 02:36:28,783 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.76 vs. limit=15.0 2023-11-29 02:36:34,686 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.373e+01 8.921e+01 9.429e+01 9.964e+01 1.346e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-29 02:36:47,028 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 566250 2023-11-29 02:36:50,053 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3774960.0, ans=0.1 2023-11-29 02:36:51,479 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3774960.0, ans=0.0 2023-11-29 02:36:53,748 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3775026.6666666665, ans=0.0 2023-11-29 02:36:59,878 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3775026.6666666665, ans=0.125 2023-11-29 02:37:15,369 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.22 vs. limit=12.0 2023-11-29 02:37:19,251 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 1150, loss[loss=0.07477, simple_loss=0.1069, pruned_loss=0.01416, audio_tagging_loss=0.007159, over 14892.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.09001, pruned_loss=0.01209, audio_tagging_loss=0.008447, over 3040242.40 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:37:23,758 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3775160.0, ans=0.125 2023-11-29 02:37:27,277 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3775160.0, ans=0.125 2023-11-29 02:37:49,339 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 566300 2023-11-29 02:38:21,968 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 1200, loss[loss=0.06062, simple_loss=0.0853, pruned_loss=0.008929, audio_tagging_loss=0.009034, over 15591.00 frames. ], tot_loss[loss=0.06482, simple_loss=0.08867, pruned_loss=0.01198, audio_tagging_loss=0.00851, over 3037154.07 frames. 
], batch size: 56, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:38:39,030 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.292e+01 9.106e+01 9.655e+01 1.032e+02 1.347e+02, threshold=1.931e+02, percent-clipped=0.0 2023-11-29 02:38:42,869 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3775560.0, ans=0.0 2023-11-29 02:38:51,474 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 566350 2023-11-29 02:39:02,700 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 02:39:09,233 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3775693.3333333335, ans=0.0 2023-11-29 02:39:14,285 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3775760.0, ans=0.125 2023-11-29 02:39:23,526 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 1250, loss[loss=0.06366, simple_loss=0.08358, pruned_loss=0.01328, audio_tagging_loss=0.008599, over 14823.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.0893, pruned_loss=0.01207, audio_tagging_loss=0.008485, over 3037798.87 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:39:27,848 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.65 vs. limit=15.0 2023-11-29 02:39:47,782 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.36 vs. limit=15.0 2023-11-29 02:39:53,014 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 566400 2023-11-29 02:39:53,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3775960.0, ans=0.125 2023-11-29 02:39:56,530 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3775960.0, ans=0.125 2023-11-29 02:39:58,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3775960.0, ans=0.125 2023-11-29 02:40:14,854 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.13 vs. limit=22.5 2023-11-29 02:40:20,044 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3776093.3333333335, ans=0.0 2023-11-29 02:40:25,340 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 1300, loss[loss=0.07099, simple_loss=0.1024, pruned_loss=0.01123, audio_tagging_loss=0.008583, over 16118.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08951, pruned_loss=0.01195, audio_tagging_loss=0.00845, over 3035580.06 frames. 
], batch size: 58, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:40:26,753 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3776160.0, ans=0.2 2023-11-29 02:40:44,004 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.682e+01 8.895e+01 9.443e+01 1.023e+02 1.246e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-29 02:40:44,383 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3776226.6666666665, ans=0.125 2023-11-29 02:40:48,372 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3776226.6666666665, ans=0.0 2023-11-29 02:40:49,487 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3776293.3333333335, ans=0.1 2023-11-29 02:40:55,088 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 566450 2023-11-29 02:41:09,762 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 02:41:26,043 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 1350, loss[loss=0.0796, simple_loss=0.1152, pruned_loss=0.01534, audio_tagging_loss=0.00666, over 15514.00 frames. ], tot_loss[loss=0.06433, simple_loss=0.08826, pruned_loss=0.01169, audio_tagging_loss=0.008509, over 3043241.50 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:41:26,288 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3776493.3333333335, ans=0.125 2023-11-29 02:41:57,008 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 566500 2023-11-29 02:42:07,960 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.95 vs. limit=22.5 2023-11-29 02:42:10,277 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.34 vs. limit=15.0 2023-11-29 02:42:13,013 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3776693.3333333335, ans=0.125 2023-11-29 02:42:13,952 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 02:42:17,822 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.95 vs. limit=15.0 2023-11-29 02:42:25,364 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3776760.0, ans=0.04949747468305833 2023-11-29 02:42:29,846 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 1400, loss[loss=0.06254, simple_loss=0.08453, pruned_loss=0.01012, audio_tagging_loss=0.01015, over 16731.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08952, pruned_loss=0.01197, audio_tagging_loss=0.00854, over 3046463.41 frames. 
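Note on the scaling.py:1022 records: each compares a per-module whitening metric against a limit, and the limit is itself a ScheduledFloat (see the whiten.whitening_limit record further below). One plausible reading, sketched here, is a covariance-anisotropy statistic that equals 1.0 for perfectly white features, with the module applying its corrective gradient only while metric > limit; the actual definition in scaling.py may differ.

```python
# Hedged sketch of a whitening metric in the spirit of the Whiten logs:
# for feature covariance C, num_channels * trace(C @ C) / trace(C)**2
# is 1.0 when C is proportional to the identity and grows with
# anisotropy. This is an assumed stand-in, not scaling.py's formula.
import torch

def whitening_metric(x: torch.Tensor) -> float:
    """x: (num_frames, num_channels) activations."""
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]
    n = cov.shape[0]
    return (n * torch.trace(cov @ cov) / torch.trace(cov) ** 2).item()

x = torch.randn(1000, 512) * torch.linspace(0.1, 2.0, 512)  # anisotropic
metric, limit = whitening_metric(x), 15.0
if metric > limit:
    pass  # the Whiten module would push gradients toward whiter features
print(f"metric={metric:.2f} vs. limit={limit}")
```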
], batch size: 62, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:42:32,704 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.75 vs. limit=15.0 2023-11-29 02:42:43,344 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3776893.3333333335, ans=0.125 2023-11-29 02:42:45,877 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3776893.3333333335, ans=0.125 2023-11-29 02:42:47,819 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.652e+01 8.922e+01 9.372e+01 1.016e+02 1.403e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-29 02:42:58,401 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 566550 2023-11-29 02:43:02,980 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.85 vs. limit=15.0 2023-11-29 02:43:05,105 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3777026.6666666665, ans=0.125 2023-11-29 02:43:05,129 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3777026.6666666665, ans=0.125 2023-11-29 02:43:28,589 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3777093.3333333335, ans=0.2 2023-11-29 02:43:30,457 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 1450, loss[loss=0.07017, simple_loss=0.1044, pruned_loss=0.009865, audio_tagging_loss=0.008095, over 15612.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08988, pruned_loss=0.01194, audio_tagging_loss=0.008613, over 3040652.30 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:43:34,277 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3777160.0, ans=0.0 2023-11-29 02:43:45,963 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3777226.6666666665, ans=0.0 2023-11-29 02:43:49,899 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.03 vs. limit=15.0 2023-11-29 02:43:54,851 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3777293.3333333335, ans=0.125 2023-11-29 02:43:57,202 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3777293.3333333335, ans=0.125 2023-11-29 02:44:00,673 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 566600 2023-11-29 02:44:15,933 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3777360.0, ans=0.125 2023-11-29 02:44:30,031 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3777426.6666666665, ans=0.125 2023-11-29 02:44:32,086 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 1500, loss[loss=0.08261, simple_loss=0.1135, pruned_loss=0.01797, audio_tagging_loss=0.007906, over 14867.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08915, pruned_loss=0.01187, audio_tagging_loss=0.008706, over 3038912.62 frames. 
], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:44:51,117 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.796e+01 9.184e+01 9.950e+01 1.078e+02 1.281e+02, threshold=1.990e+02, percent-clipped=0.0 2023-11-29 02:44:57,382 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3777626.6666666665, ans=0.125 2023-11-29 02:45:02,428 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 566650 2023-11-29 02:45:05,003 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3777626.6666666665, ans=0.0 2023-11-29 02:45:23,020 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3777760.0, ans=0.1 2023-11-29 02:45:33,502 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3777826.6666666665, ans=0.0 2023-11-29 02:45:34,468 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 1550, loss[loss=0.0541, simple_loss=0.07624, pruned_loss=0.007069, audio_tagging_loss=0.008913, over 14930.00 frames. ], tot_loss[loss=0.06479, simple_loss=0.08856, pruned_loss=0.0118, audio_tagging_loss=0.008715, over 3042084.12 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:45:38,268 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3777826.6666666665, ans=0.0 2023-11-29 02:45:39,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3777826.6666666665, ans=0.125 2023-11-29 02:46:03,629 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 566700 2023-11-29 02:46:07,616 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3777960.0, ans=0.0 2023-11-29 02:46:17,289 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3778026.6666666665, ans=0.0 2023-11-29 02:46:18,543 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3778026.6666666665, ans=10.0 2023-11-29 02:46:31,489 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3778093.3333333335, ans=0.125 2023-11-29 02:46:36,497 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 1600, loss[loss=0.05747, simple_loss=0.08162, pruned_loss=0.00696, audio_tagging_loss=0.009695, over 16245.00 frames. ], tot_loss[loss=0.06486, simple_loss=0.08862, pruned_loss=0.01166, audio_tagging_loss=0.008891, over 3047344.47 frames. 
], batch size: 61, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:46:36,865 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=3778160.0, ans=0.1 2023-11-29 02:46:49,952 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3778226.6666666665, ans=0.0 2023-11-29 02:46:50,043 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3778226.6666666665, ans=0.125 2023-11-29 02:46:54,288 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.477e+01 9.129e+01 9.735e+01 1.042e+02 2.046e+02, threshold=1.947e+02, percent-clipped=1.0 2023-11-29 02:46:57,521 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3778226.6666666665, ans=0.0 2023-11-29 02:47:01,590 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3778293.3333333335, ans=0.0 2023-11-29 02:47:06,816 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 566750 2023-11-29 02:47:29,849 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3778426.6666666665, ans=0.2 2023-11-29 02:47:36,028 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.36 vs. limit=15.0 2023-11-29 02:47:37,580 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 1650, loss[loss=0.0569, simple_loss=0.07395, pruned_loss=0.009557, audio_tagging_loss=0.01036, over 15840.00 frames. ], tot_loss[loss=0.06479, simple_loss=0.08839, pruned_loss=0.01168, audio_tagging_loss=0.008907, over 3056408.45 frames. ], batch size: 60, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:47:53,964 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3778560.0, ans=0.0 2023-11-29 02:48:02,446 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.99 vs. limit=15.0 2023-11-29 02:48:07,741 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 566800 2023-11-29 02:48:40,036 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 1700, loss[loss=0.06612, simple_loss=0.09458, pruned_loss=0.01035, audio_tagging_loss=0.008473, over 15497.00 frames. ], tot_loss[loss=0.06481, simple_loss=0.088, pruned_loss=0.01183, audio_tagging_loss=0.008981, over 3056550.27 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:48:46,016 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.70 vs. 
limit=6.0 2023-11-29 02:48:58,817 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.604e+01 9.116e+01 9.697e+01 1.043e+02 1.617e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-29 02:49:01,624 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3778893.3333333335, ans=0.2 2023-11-29 02:49:05,014 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3778960.0, ans=0.125 2023-11-29 02:49:06,356 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3778960.0, ans=0.0 2023-11-29 02:49:09,449 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 566850 2023-11-29 02:49:09,555 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3778960.0, ans=0.125 2023-11-29 02:49:15,781 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.56 vs. limit=15.0 2023-11-29 02:49:19,444 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3779026.6666666665, ans=0.07 2023-11-29 02:49:24,807 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3779026.6666666665, ans=0.0 2023-11-29 02:49:25,765 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3779026.6666666665, ans=0.125 2023-11-29 02:49:41,236 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 1750, loss[loss=0.094, simple_loss=0.1326, pruned_loss=0.02086, audio_tagging_loss=0.006819, over 15154.00 frames. ], tot_loss[loss=0.06434, simple_loss=0.08761, pruned_loss=0.01167, audio_tagging_loss=0.00886, over 3051819.73 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:49:47,972 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3779160.0, ans=0.125 2023-11-29 02:49:57,514 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3779226.6666666665, ans=0.0 2023-11-29 02:49:57,988 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.14 vs. limit=12.0 2023-11-29 02:50:01,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3779226.6666666665, ans=0.125 2023-11-29 02:50:11,355 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 566900 2023-11-29 02:50:12,573 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3779293.3333333335, ans=0.1 2023-11-29 02:50:17,160 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 02:50:43,331 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 1800, loss[loss=0.05751, simple_loss=0.08495, pruned_loss=0.008279, audio_tagging_loss=0.006759, over 14529.00 frames. ], tot_loss[loss=0.06486, simple_loss=0.0887, pruned_loss=0.01176, audio_tagging_loss=0.008757, over 3049352.11 frames. 
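Note on the model.py:807 records: they print the freeze state and batch index every 50 batches, and with Freeze_encoder: False the encoder trains normally. A natural implementation of the recipe's freeze_encoder / freeze_encoder_steps knobs is sketched below; the attribute names and the exact semantics of freeze_encoder_steps are assumptions.

```python
# Hedged sketch of an encoder-freeze check like the one behind the
# model.py:807 log lines. model.encoder, the flag semantics, and the
# 50-batch logging cadence are assumptions.
import torch

def maybe_freeze_encoder(model: torch.nn.Module, batch_idx: int,
                         freeze_encoder: bool,
                         freeze_encoder_steps: int) -> None:
    freeze = freeze_encoder and (
        freeze_encoder_steps < 0 or batch_idx < freeze_encoder_steps)
    for p in model.encoder.parameters():
        p.requires_grad = not freeze
    if batch_idx % 50 == 0:
        print(f"Freeze_encoder: {freeze}; Current batch idx: {batch_idx}")
```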
], batch size: 54, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:50:43,696 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3779493.3333333335, ans=0.125 2023-11-29 02:50:44,797 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3779493.3333333335, ans=0.125 2023-11-29 02:50:55,423 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3779560.0, ans=0.0 2023-11-29 02:50:56,595 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3779560.0, ans=0.0 2023-11-29 02:51:02,047 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.164e+01 9.130e+01 9.771e+01 1.053e+02 1.389e+02, threshold=1.954e+02, percent-clipped=0.0 2023-11-29 02:51:13,250 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 566950 2023-11-29 02:51:24,751 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3779693.3333333335, ans=0.125 2023-11-29 02:51:27,079 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3779693.3333333335, ans=0.125 2023-11-29 02:51:45,201 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 1850, loss[loss=0.07281, simple_loss=0.09892, pruned_loss=0.01368, audio_tagging_loss=0.009674, over 14927.00 frames. ], tot_loss[loss=0.06509, simple_loss=0.08902, pruned_loss=0.01193, audio_tagging_loss=0.008647, over 3053331.52 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:51:51,372 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3779826.6666666665, ans=0.125 2023-11-29 02:52:10,507 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3779960.0, ans=0.0 2023-11-29 02:52:15,127 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 567000 2023-11-29 02:52:35,678 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3780093.3333333335, ans=0.125 2023-11-29 02:52:47,561 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 1900, loss[loss=0.05322, simple_loss=0.07266, pruned_loss=0.009973, audio_tagging_loss=0.006912, over 14726.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08914, pruned_loss=0.01202, audio_tagging_loss=0.00859, over 3050903.55 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:53:06,764 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.041e+01 9.033e+01 9.601e+01 1.005e+02 1.271e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-29 02:53:12,827 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 02:53:16,690 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.85 vs. 
limit=22.5 2023-11-29 02:53:17,327 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 567050 2023-11-29 02:53:43,623 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3780426.6666666665, ans=0.2 2023-11-29 02:53:48,070 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3780493.3333333335, ans=0.0 2023-11-29 02:53:49,021 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 1950, loss[loss=0.03342, simple_loss=0.04614, pruned_loss=0.004609, audio_tagging_loss=0.005741, over 15411.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.08919, pruned_loss=0.01193, audio_tagging_loss=0.008511, over 3052428.38 frames. ], batch size: 61, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:53:57,522 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3780493.3333333335, ans=0.0 2023-11-29 02:54:14,996 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=3780626.6666666665, ans=0.2 2023-11-29 02:54:18,119 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 567100 2023-11-29 02:54:51,065 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 2000, loss[loss=0.07519, simple_loss=0.1113, pruned_loss=0.01281, audio_tagging_loss=0.006757, over 16460.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.0889, pruned_loss=0.012, audio_tagging_loss=0.008501, over 3044748.60 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:55:07,669 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3780893.3333333335, ans=0.0 2023-11-29 02:55:08,576 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3780893.3333333335, ans=0.125 2023-11-29 02:55:08,626 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3780893.3333333335, ans=0.0 2023-11-29 02:55:10,618 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.467e+01 8.857e+01 9.495e+01 1.044e+02 1.385e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-29 02:55:20,854 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 567150 2023-11-29 02:55:22,286 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3780960.0, ans=0.0 2023-11-29 02:55:24,782 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.89 vs. limit=22.5 2023-11-29 02:55:31,849 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.17 vs. 
limit=15.0 2023-11-29 02:55:37,155 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3781026.6666666665, ans=0.0 2023-11-29 02:55:45,204 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3781093.3333333335, ans=0.125 2023-11-29 02:55:45,357 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3781093.3333333335, ans=0.0 2023-11-29 02:55:52,043 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 2050, loss[loss=0.0685, simple_loss=0.09384, pruned_loss=0.01364, audio_tagging_loss=0.007944, over 15146.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08926, pruned_loss=0.01204, audio_tagging_loss=0.008484, over 3042334.64 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:55:53,563 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3781160.0, ans=0.125 2023-11-29 02:56:08,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3781226.6666666665, ans=0.1 2023-11-29 02:56:14,157 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3781226.6666666665, ans=0.125 2023-11-29 02:56:21,490 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 567200 2023-11-29 02:56:26,898 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3781293.3333333335, ans=0.125 2023-11-29 02:56:41,343 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.77 vs. limit=15.0 2023-11-29 02:56:53,266 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.99 vs. limit=15.0 2023-11-29 02:56:53,800 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 2100, loss[loss=0.07878, simple_loss=0.1067, pruned_loss=0.01701, audio_tagging_loss=0.008402, over 15523.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08947, pruned_loss=0.01214, audio_tagging_loss=0.008428, over 3037958.36 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:56:56,586 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3781493.3333333335, ans=0.5 2023-11-29 02:56:58,100 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.04 vs. limit=15.0 2023-11-29 02:56:59,975 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3781493.3333333335, ans=0.125 2023-11-29 02:57:01,292 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3781493.3333333335, ans=0.125 2023-11-29 02:57:10,529 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.93 vs. 
limit=15.0 2023-11-29 02:57:13,808 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.990e+01 9.043e+01 9.646e+01 1.030e+02 1.494e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-29 02:57:21,116 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3781626.6666666665, ans=0.0 2023-11-29 02:57:22,076 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3781626.6666666665, ans=0.125 2023-11-29 02:57:23,129 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 567250 2023-11-29 02:57:49,499 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3781760.0, ans=0.0 2023-11-29 02:57:49,543 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3781760.0, ans=0.125 2023-11-29 02:57:53,296 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.70 vs. limit=15.0 2023-11-29 02:57:55,477 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 2150, loss[loss=0.09477, simple_loss=0.1341, pruned_loss=0.01999, audio_tagging_loss=0.007708, over 15659.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08944, pruned_loss=0.01218, audio_tagging_loss=0.008447, over 3037198.83 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:58:01,135 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 02:58:04,854 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.43 vs. limit=15.0 2023-11-29 02:58:25,480 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 567300 2023-11-29 02:58:33,052 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.21 vs. limit=22.5 2023-11-29 02:58:34,681 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 02:58:48,840 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3782093.3333333335, ans=0.0 2023-11-29 02:58:51,480 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.whiten.whitening_limit, batch_count=3782093.3333333335, ans=12.0 2023-11-29 02:58:56,771 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 2200, loss[loss=0.06575, simple_loss=0.09408, pruned_loss=0.01103, audio_tagging_loss=0.007687, over 15511.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08974, pruned_loss=0.0123, audio_tagging_loss=0.008425, over 3041730.43 frames. 
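[Annotation] The ScheduledFloat records above (scaling.py:213) report module hyper-parameters such as skip rates, balancer probabilities and dropout that are not constants but piecewise-linear functions of the global batch count; "ans" is the value in effect at the logged "batch_count". A minimal sketch of that mechanism, with illustrative class name and schedule points (the real implementation lives in icefall's scaling.py):

    class ScheduledFloatSketch:
        """A float whose value is piecewise-linear in the global batch count."""
        def __init__(self, *points):
            # points: (batch_count, value) pairs defining the schedule.
            self.points = sorted(points)
            self.batch_count = 0

        def __float__(self):
            p, x = self.points, self.batch_count
            if x <= p[0][0]:
                return float(p[0][1])
            if x >= p[-1][0]:
                return float(p[-1][1])
            for (x0, y0), (x1, y1) in zip(p, p[1:]):
                if x0 <= x <= x1:  # linear interpolation between schedule points
                    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

    # Illustrative schedule: a skip rate annealed from 0.2 down to 0.0.
    skip_rate = ScheduledFloatSketch((0.0, 0.2), (4000.0, 0.05), (16000.0, 0.0))
    skip_rate.batch_count = 3780493          # far past the last schedule point
    assert float(skip_rate) == 0.0           # analogous to the "ans=0.0" fields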
], batch size: 58, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:58:57,121 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3782160.0, ans=0.1 2023-11-29 02:59:01,751 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3782160.0, ans=0.125 2023-11-29 02:59:02,779 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3782160.0, ans=0.125 2023-11-29 02:59:16,876 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.554e+01 9.088e+01 9.723e+01 1.043e+02 1.467e+02, threshold=1.945e+02, percent-clipped=0.0 2023-11-29 02:59:26,433 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 567350 2023-11-29 02:59:32,691 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3782293.3333333335, ans=0.2 2023-11-29 02:59:42,497 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.51 vs. limit=22.5 2023-11-29 02:59:43,772 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.93 vs. limit=6.0 2023-11-29 02:59:58,767 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 2250, loss[loss=0.06608, simple_loss=0.08988, pruned_loss=0.01174, audio_tagging_loss=0.009396, over 14514.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.09013, pruned_loss=0.01237, audio_tagging_loss=0.008412, over 3045442.64 frames. ], batch size: 60, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:00:04,532 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3782493.3333333335, ans=0.2 2023-11-29 03:00:24,612 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3782626.6666666665, ans=0.1 2023-11-29 03:00:24,655 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3782626.6666666665, ans=0.0 2023-11-29 03:00:29,171 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 567400 2023-11-29 03:00:31,867 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3782626.6666666665, ans=0.0 2023-11-29 03:00:42,152 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3782693.3333333335, ans=0.5 2023-11-29 03:00:52,567 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.29 vs. limit=10.0 2023-11-29 03:00:55,450 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3782760.0, ans=0.1 2023-11-29 03:01:01,006 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 2300, loss[loss=0.06061, simple_loss=0.08439, pruned_loss=0.008358, audio_tagging_loss=0.01006, over 14818.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.09064, pruned_loss=0.01248, audio_tagging_loss=0.008337, over 3042331.32 frames. 
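[Annotation] The optim.py:476 records print five order statistics (min, 25%, median, 75%, max) of recent gradient norms together with a clipping threshold; in this log the threshold tracks Clipping_scale times the median, e.g. 2.0 x 9.669e+01 ~= the logged 1.934e+02. A sketch of that bookkeeping under that assumption; this is not the actual icefall optimizer code, and the logged quartiles are running estimates, so the check is approximate:

    import torch

    def clipping_report(grad_norms, clipping_scale=2.0):
        # Five order statistics of the recent per-batch gradient norms.
        q = torch.quantile(torch.tensor(grad_norms),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * float(q[2])   # 2.0 x median, per the log
        clipped = sum(1 for g in grad_norms if g > threshold)
        return q.tolist(), threshold, 100.0 * clipped / len(grad_norms)

    # Quartiles from the record above give threshold 193.38 ~= 1.934e+02,
    # and no norm exceeds it, matching percent-clipped=0.0.
    _, threshold, pct = clipping_report([75.55, 90.62, 96.69, 104.8, 131.7])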
], batch size: 56, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:01:01,294 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3782826.6666666665, ans=0.0 2023-11-29 03:01:20,737 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.555e+01 9.062e+01 9.669e+01 1.048e+02 1.317e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-29 03:01:30,807 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 567450 2023-11-29 03:01:32,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3782960.0, ans=0.125 2023-11-29 03:01:32,781 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.14 vs. limit=15.0 2023-11-29 03:01:34,492 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3782960.0, ans=0.2 2023-11-29 03:01:44,448 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3783026.6666666665, ans=0.2 2023-11-29 03:01:44,505 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3783026.6666666665, ans=0.1 2023-11-29 03:01:58,234 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 03:02:00,740 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3783093.3333333335, ans=0.1 2023-11-29 03:02:02,752 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 2350, loss[loss=0.08271, simple_loss=0.1186, pruned_loss=0.01599, audio_tagging_loss=0.007434, over 15412.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08995, pruned_loss=0.01236, audio_tagging_loss=0.008432, over 3048738.28 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:02:23,623 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3783226.6666666665, ans=0.0 2023-11-29 03:02:29,040 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3783293.3333333335, ans=0.125 2023-11-29 03:02:32,375 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 567500 2023-11-29 03:03:04,484 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 2400, loss[loss=0.05481, simple_loss=0.069, pruned_loss=0.008284, audio_tagging_loss=0.01202, over 15916.00 frames. ], tot_loss[loss=0.0652, simple_loss=0.08915, pruned_loss=0.01209, audio_tagging_loss=0.008534, over 3048453.06 frames. 
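[Annotation] Each train_asr.py:1235 record reports per-frame loss components for the current batch (loss[...]) and a frame-weighted running average over recent batches (tot_loss[...]). With this run's configured simple_loss_scale=0.5 and audio_tagging_loss_scale=1.0, the headline loss is reproducible from the components: for batch 2350 above, 0.5 x 0.1186 + 0.01599 + 1.0 x 0.007434 ~= 0.08271. A sketch of that arithmetic with illustrative names:

    def combined_loss(simple_loss, pruned_loss, audio_tagging_loss,
                      simple_loss_scale=0.5, audio_tagging_loss_scale=1.0):
        # Assumed combination; the scales are this run's configured values.
        return (simple_loss_scale * simple_loss + pruned_loss
                + audio_tagging_loss_scale * audio_tagging_loss)

    class RunningLoss:
        """Frame-weighted running average behind the tot_loss[...] fields."""
        def __init__(self):
            self.sum, self.frames = 0.0, 0.0

        def update(self, loss_per_frame, num_frames):
            self.sum += loss_per_frame * num_frames
            self.frames += num_frames

        @property
        def value(self):
            return self.sum / self.frames

    print(combined_loss(0.1186, 0.01599, 0.007434))   # ~0.08272, vs. 0.08271 logged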
], batch size: 62, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:03:27,144 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.409e+01 9.160e+01 9.857e+01 1.036e+02 1.512e+02, threshold=1.971e+02, percent-clipped=0.0 2023-11-29 03:03:34,358 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 567550 2023-11-29 03:03:57,417 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.64 vs. limit=15.0 2023-11-29 03:03:59,789 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.33 vs. limit=6.0 2023-11-29 03:04:05,684 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 2450, loss[loss=0.07922, simple_loss=0.1014, pruned_loss=0.015, audio_tagging_loss=0.01351, over 16897.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08925, pruned_loss=0.01199, audio_tagging_loss=0.008645, over 3057270.05 frames. ], batch size: 63, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:04:35,699 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 567600 2023-11-29 03:05:08,350 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 2500, loss[loss=0.06247, simple_loss=0.0789, pruned_loss=0.0114, audio_tagging_loss=0.01162, over 15577.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08924, pruned_loss=0.01193, audio_tagging_loss=0.008723, over 3057939.58 frames. ], batch size: 61, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:05:28,011 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3784226.6666666665, ans=0.125 2023-11-29 03:05:30,131 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.483e+01 8.821e+01 9.659e+01 1.073e+02 1.403e+02, threshold=1.932e+02, percent-clipped=0.0 2023-11-29 03:05:37,186 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 567650 2023-11-29 03:06:09,251 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 2550, loss[loss=0.0772, simple_loss=0.1033, pruned_loss=0.01621, audio_tagging_loss=0.009349, over 16197.00 frames. ], tot_loss[loss=0.065, simple_loss=0.0886, pruned_loss=0.01194, audio_tagging_loss=0.008755, over 3050881.65 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:06:15,599 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 03:06:29,521 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3784560.0, ans=0.125 2023-11-29 03:06:40,478 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 567700 2023-11-29 03:06:52,466 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3784693.3333333335, ans=0.1 2023-11-29 03:07:06,553 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3784760.0, ans=0.1 2023-11-29 03:07:12,041 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 2600, loss[loss=0.06621, simple_loss=0.08958, pruned_loss=0.0151, audio_tagging_loss=0.006322, over 13930.00 frames. ], tot_loss[loss=0.0649, simple_loss=0.08883, pruned_loss=0.01194, audio_tagging_loss=0.008546, over 3048338.25 frames. 
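[Annotation] The grad_scale field in these records moves between 8.0, 16.0 and 32.0. With fp16 training that pattern is characteristic of a dynamic loss scaler, which halves the scale after a step with inf/nan gradients and doubles it again after a long run of stable steps; the sketch below uses torch.cuda.amp.GradScaler semantics as an assumption, and the parameter values are illustrative:

    import torch

    # Assumed configuration; only the halve/double behaviour is the point.
    scaler = torch.cuda.amp.GradScaler(init_scale=16.0, growth_factor=2.0,
                                       backoff_factor=0.5, growth_interval=2000)

    # Typical fp16 step: an overflowing step is skipped and the scale is
    # halved (32.0 -> 16.0 -> 8.0); growth_interval clean steps double it.
    #
    #   scaler.scale(loss).backward()
    #   scaler.step(optimizer)
    #   scaler.update()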
], batch size: 53, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:07:34,997 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.646e+01 8.651e+01 9.416e+01 1.044e+02 1.400e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-29 03:07:35,280 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3784893.3333333335, ans=0.1 2023-11-29 03:07:42,270 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 567750 2023-11-29 03:07:53,916 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3785026.6666666665, ans=0.0 2023-11-29 03:08:12,322 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.35 vs. limit=15.0 2023-11-29 03:08:14,966 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 2650, loss[loss=0.08835, simple_loss=0.1169, pruned_loss=0.01842, audio_tagging_loss=0.01147, over 15997.00 frames. ], tot_loss[loss=0.06489, simple_loss=0.08891, pruned_loss=0.01187, audio_tagging_loss=0.008564, over 3046213.29 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:08:17,530 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3785160.0, ans=0.125 2023-11-29 03:08:23,677 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.04 vs. limit=15.0 2023-11-29 03:08:25,768 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3785226.6666666665, ans=0.2 2023-11-29 03:08:38,204 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.35 vs. limit=6.0 2023-11-29 03:08:39,939 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3785293.3333333335, ans=0.125 2023-11-29 03:08:42,742 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.66 vs. limit=12.0 2023-11-29 03:08:43,359 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 567800 2023-11-29 03:09:07,781 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3785426.6666666665, ans=0.0 2023-11-29 03:09:08,055 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.65 vs. limit=15.0 2023-11-29 03:09:15,739 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 2700, loss[loss=0.05288, simple_loss=0.06827, pruned_loss=0.0101, audio_tagging_loss=0.008648, over 13788.00 frames. ], tot_loss[loss=0.06448, simple_loss=0.08862, pruned_loss=0.0117, audio_tagging_loss=0.008472, over 3054701.59 frames. ], batch size: 53, lr: 1.41e-03, grad_scale: 8.0 2023-11-29 03:09:24,920 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 03:09:32,031 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3785560.0, ans=0.125 2023-11-29 03:09:37,227 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.88 vs. 
limit=15.0 2023-11-29 03:09:38,842 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.286e+01 9.139e+01 9.804e+01 1.056e+02 1.449e+02, threshold=1.961e+02, percent-clipped=0.0 2023-11-29 03:09:46,615 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 567850 2023-11-29 03:09:46,769 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3785626.6666666665, ans=0.125 2023-11-29 03:09:46,867 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3785626.6666666665, ans=0.2 2023-11-29 03:10:04,829 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=3785760.0, ans=15.0 2023-11-29 03:10:15,540 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 03:10:17,684 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 2750, loss[loss=0.07511, simple_loss=0.1004, pruned_loss=0.01439, audio_tagging_loss=0.01051, over 14398.00 frames. ], tot_loss[loss=0.06405, simple_loss=0.08801, pruned_loss=0.01159, audio_tagging_loss=0.008448, over 3049800.51 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 8.0 2023-11-29 03:10:32,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3785893.3333333335, ans=0.125 2023-11-29 03:10:37,572 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.46 vs. limit=15.0 2023-11-29 03:10:47,654 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 567900 2023-11-29 03:10:50,274 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3785960.0, ans=0.0 2023-11-29 03:11:08,250 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3786093.3333333335, ans=0.0 2023-11-29 03:11:12,733 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 03:11:19,947 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 2800, loss[loss=0.05678, simple_loss=0.07693, pruned_loss=0.008892, audio_tagging_loss=0.009428, over 15026.00 frames. ], tot_loss[loss=0.06466, simple_loss=0.08897, pruned_loss=0.01177, audio_tagging_loss=0.008408, over 3049607.48 frames. 
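[Annotation] The WARNING records ("Exclude cut with ID unbalanced/...") drop AudioSet cuts whose placeholder transcripts are longer than the acoustic sequence: 100 input frames shrink to 23 after the roughly 4x convolutional subsampling, fewer than the 24 BPE tokens, and the transducer losses used here assume at least one frame per emitted token. A sketch of that filter; the exact subsampling formula is an assumption chosen to match the logged 100 -> 23:

    def frames_after_subsampling(T: int) -> int:
        # Assumed Conv2dSubsampling-style arithmetic; reproduces 100 -> 23.
        return ((T - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # Drop cuts too short to align, as in the WARNING records above.
        return frames_after_subsampling(num_frames) >= num_tokens

    assert frames_after_subsampling(100) == 23
    assert keep_cut(100, 24) is False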
], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:11:37,207 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3786226.6666666665, ans=0.125 2023-11-29 03:11:43,662 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.220e+01 9.033e+01 9.912e+01 1.050e+02 3.585e+02, threshold=1.982e+02, percent-clipped=1.0 2023-11-29 03:11:49,595 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 567950 2023-11-29 03:12:05,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3786360.0, ans=0.125 2023-11-29 03:12:21,962 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 2850, loss[loss=0.05902, simple_loss=0.07455, pruned_loss=0.01343, audio_tagging_loss=0.008326, over 14027.00 frames. ], tot_loss[loss=0.06458, simple_loss=0.08881, pruned_loss=0.01174, audio_tagging_loss=0.008431, over 3045610.82 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:12:25,627 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3786493.3333333335, ans=0.0 2023-11-29 03:12:33,338 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3786560.0, ans=0.125 2023-11-29 03:12:51,747 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 568000 2023-11-29 03:12:53,215 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-568000.pt 2023-11-29 03:13:25,917 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 2900, loss[loss=0.0893, simple_loss=0.116, pruned_loss=0.02353, audio_tagging_loss=0.007768, over 14472.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.0904, pruned_loss=0.01223, audio_tagging_loss=0.00843, over 3042696.46 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 8.0 2023-11-29 03:13:41,496 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3786893.3333333335, ans=0.1 2023-11-29 03:13:51,311 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.068e+01 9.075e+01 9.599e+01 1.049e+02 1.799e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-29 03:13:56,120 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 568050 2023-11-29 03:14:07,624 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3787026.6666666665, ans=0.2 2023-11-29 03:14:09,129 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.41 vs. limit=15.0 2023-11-29 03:14:25,577 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3787093.3333333335, ans=0.07 2023-11-29 03:14:28,237 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 2950, loss[loss=0.06052, simple_loss=0.08169, pruned_loss=0.01258, audio_tagging_loss=0.007091, over 15361.00 frames. ], tot_loss[loss=0.0652, simple_loss=0.0892, pruned_loss=0.01211, audio_tagging_loss=0.008488, over 3047501.29 frames. 
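[Annotation] The checkpoint.py:75 record fires exactly when the global batch index reaches a multiple of this run's save_every_n=4000 (568000 = 142 x 4000). A minimal sketch of that trigger; the directory name and save payload are illustrative:

    from pathlib import Path

    def maybe_save_checkpoint(batch_idx_train: int, exp_dir: Path,
                              save_every_n: int = 4000):
        # Save model/optimizer/scheduler/scaler state every save_every_n batches.
        if batch_idx_train > 0 and batch_idx_train % save_every_n == 0:
            filename = exp_dir / f"checkpoint-{batch_idx_train}.pt"
            # torch.save(checkpoint_dict, filename)  # assumed payload
            return filename
        return None

    assert maybe_save_checkpoint(568000, Path("multi_KD/exp")) is not None
    assert maybe_save_checkpoint(568050, Path("multi_KD/exp")) is None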
], batch size: 58, lr: 1.41e-03, grad_scale: 8.0 2023-11-29 03:14:54,388 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3787293.3333333335, ans=0.0 2023-11-29 03:14:57,829 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 568100 2023-11-29 03:15:18,029 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3787426.6666666665, ans=0.1 2023-11-29 03:15:19,128 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3787426.6666666665, ans=0.0 2023-11-29 03:15:19,257 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3787426.6666666665, ans=0.125 2023-11-29 03:15:30,045 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 3000, loss[loss=0.07162, simple_loss=0.1014, pruned_loss=0.014, audio_tagging_loss=0.006928, over 16015.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.08977, pruned_loss=0.01214, audio_tagging_loss=0.008459, over 3048422.92 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 8.0 2023-11-29 03:15:30,047 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-29 03:15:57,512 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.7838, 4.9262, 5.0834, 4.9249], device='cuda:0') 2023-11-29 03:16:11,316 INFO [train_asr.py:1267] (0/4) Epoch 48, validation: loss=0.05793, simple_loss=0.05039, pruned_loss=0.005256, audio_tagging_loss=0.02748, over 4681554.00 frames. 2023-11-29 03:16:11,317 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-29 03:16:29,162 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3787560.0, ans=0.0 2023-11-29 03:16:35,797 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.417e+01 9.350e+01 9.749e+01 1.060e+02 2.355e+02, threshold=1.950e+02, percent-clipped=1.0 2023-11-29 03:16:41,310 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 568150 2023-11-29 03:16:50,670 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.07 vs. limit=15.0 2023-11-29 03:16:58,842 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.03 vs. limit=15.0 2023-11-29 03:17:13,322 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 3050, loss[loss=0.06489, simple_loss=0.09031, pruned_loss=0.01276, audio_tagging_loss=0.006974, over 14352.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.09054, pruned_loss=0.01229, audio_tagging_loss=0.008555, over 3045854.89 frames. 
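[Annotation] The "Computing validation loss" block lands at batch 3000, consistent with this run's valid_interval of 3000 batches: training pauses, the dev set (4681554 frames here) is scored with the same frame-weighted averaging, and peak GPU memory is reported. A sketch of that loop under assumed helper names; compute_loss stands in for whatever criterion the script actually uses:

    import torch

    def validate(model, valid_loader, compute_loss):
        # compute_loss: assumed helper returning (per-frame loss, num frames).
        model.eval()
        tot, frames = 0.0, 0.0
        with torch.no_grad():
            for batch in valid_loader:
                loss, num_frames = compute_loss(model, batch)
                tot += float(loss) * num_frames
                frames += num_frames
        model.train()
        peak_mb = (torch.cuda.max_memory_allocated() // 2**20
                   if torch.cuda.is_available() else 0)
        return tot / frames, peak_mb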
], batch size: 54, lr: 1.41e-03, grad_scale: 8.0 2023-11-29 03:17:25,020 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3787893.3333333335, ans=0.2 2023-11-29 03:17:27,173 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3787893.3333333335, ans=0.04949747468305833 2023-11-29 03:17:27,280 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3787893.3333333335, ans=0.125 2023-11-29 03:17:28,310 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3787893.3333333335, ans=0.1 2023-11-29 03:17:38,281 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3787960.0, ans=0.125 2023-11-29 03:17:42,950 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 568200 2023-11-29 03:17:47,026 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3787960.0, ans=0.09899494936611666 2023-11-29 03:17:51,366 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 03:17:56,769 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3788026.6666666665, ans=0.125 2023-11-29 03:18:06,022 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3788093.3333333335, ans=10.0 2023-11-29 03:18:09,514 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3788093.3333333335, ans=0.125 2023-11-29 03:18:15,752 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 3100, loss[loss=0.06636, simple_loss=0.07905, pruned_loss=0.01323, audio_tagging_loss=0.01361, over 15421.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.0906, pruned_loss=0.01214, audio_tagging_loss=0.008577, over 3043874.87 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 8.0 2023-11-29 03:18:39,845 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.502e+01 8.957e+01 9.617e+01 1.028e+02 1.274e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-29 03:18:41,217 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3788293.3333333335, ans=0.125 2023-11-29 03:18:45,213 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 568250 2023-11-29 03:18:47,054 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3788293.3333333335, ans=0.0 2023-11-29 03:18:52,178 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.65 vs. 
limit=15.0 2023-11-29 03:19:05,167 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3788426.6666666665, ans=0.125 2023-11-29 03:19:10,480 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3788426.6666666665, ans=0.2 2023-11-29 03:19:14,110 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 03:19:16,497 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3788493.3333333335, ans=0.125 2023-11-29 03:19:17,461 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 3150, loss[loss=0.06872, simple_loss=0.1035, pruned_loss=0.01048, audio_tagging_loss=0.006482, over 17226.00 frames. ], tot_loss[loss=0.06687, simple_loss=0.09163, pruned_loss=0.01246, audio_tagging_loss=0.008594, over 3036227.69 frames. ], batch size: 63, lr: 1.41e-03, grad_scale: 8.0 2023-11-29 03:19:26,051 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3788493.3333333335, ans=0.1 2023-11-29 03:19:47,292 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 568300 2023-11-29 03:19:50,272 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.96 vs. limit=15.0 2023-11-29 03:20:19,183 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 3200, loss[loss=0.07451, simple_loss=0.1087, pruned_loss=0.01451, audio_tagging_loss=0.005676, over 16784.00 frames. ], tot_loss[loss=0.06649, simple_loss=0.09112, pruned_loss=0.01228, audio_tagging_loss=0.008649, over 3043585.62 frames. ], batch size: 61, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:20:21,365 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.08 vs. limit=15.0 2023-11-29 03:20:37,765 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.89 vs. limit=15.0 2023-11-29 03:20:44,393 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.807e+01 8.999e+01 9.702e+01 1.062e+02 1.415e+02, threshold=1.940e+02, percent-clipped=0.0 2023-11-29 03:20:49,401 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 568350 2023-11-29 03:20:49,507 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3788960.0, ans=0.125 2023-11-29 03:20:58,826 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3789026.6666666665, ans=0.125 2023-11-29 03:21:21,258 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 3250, loss[loss=0.06241, simple_loss=0.08959, pruned_loss=0.009807, audio_tagging_loss=0.00781, over 15782.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.09103, pruned_loss=0.01233, audio_tagging_loss=0.008685, over 3042663.34 frames. 
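[Annotation] The scaling.py:1022 records compare a per-module "whitening" statistic against a scheduled limit (e.g. metric=8.65 vs. limit=15.0): the metric measures how far the channel covariance of a module's output is from a multiple of the identity, and the module only intervenes in the backward pass when the limit is exceeded. A hedged sketch of one such metric, equal to 1 for perfectly white activations; this mirrors the idea, not icefall's exact formula:

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        # x: (num_frames, num_channels) activations for one group.
        x = x - x.mean(dim=0)
        cov = (x.T @ x) / x.shape[0]
        lam = torch.linalg.eigvalsh(cov)          # covariance eigenvalues
        # mean(lam^2) / mean(lam)^2 == 1 iff all eigenvalues are equal,
        # i.e. the covariance is a multiple of the identity ("white").
        return float((lam ** 2).mean() / lam.mean() ** 2)

    print(whitening_metric(torch.randn(2000, 256)))   # close to 1.0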
], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:21:37,235 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3789226.6666666665, ans=0.0 2023-11-29 03:21:38,316 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3789226.6666666665, ans=0.125 2023-11-29 03:21:39,417 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3789226.6666666665, ans=0.125 2023-11-29 03:21:50,719 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 568400 2023-11-29 03:21:57,219 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3789293.3333333335, ans=0.125 2023-11-29 03:22:07,950 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3789360.0, ans=0.0 2023-11-29 03:22:13,643 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3789426.6666666665, ans=0.125 2023-11-29 03:22:24,189 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 3300, loss[loss=0.076, simple_loss=0.1058, pruned_loss=0.0137, audio_tagging_loss=0.009415, over 15765.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.09105, pruned_loss=0.01236, audio_tagging_loss=0.008765, over 3038040.48 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:22:33,051 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.75 vs. limit=15.0 2023-11-29 03:22:39,032 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.91 vs. limit=15.0 2023-11-29 03:22:48,846 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.473e+01 9.011e+01 9.553e+01 1.025e+02 1.344e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-29 03:22:53,568 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 568450 2023-11-29 03:22:54,767 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3789626.6666666665, ans=0.125 2023-11-29 03:23:08,876 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3789693.3333333335, ans=0.125 2023-11-29 03:23:10,168 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3789693.3333333335, ans=0.125 2023-11-29 03:23:13,666 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3789760.0, ans=0.125 2023-11-29 03:23:18,149 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3789760.0, ans=0.125 2023-11-29 03:23:19,764 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.32 vs. limit=15.0 2023-11-29 03:23:25,036 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 3350, loss[loss=0.06561, simple_loss=0.09919, pruned_loss=0.01082, audio_tagging_loss=0.005195, over 14728.00 frames. 
], tot_loss[loss=0.06646, simple_loss=0.09099, pruned_loss=0.01223, audio_tagging_loss=0.008738, over 3043534.68 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:23:31,253 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3789826.6666666665, ans=0.125 2023-11-29 03:23:55,264 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 568500 2023-11-29 03:24:03,599 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3790026.6666666665, ans=0.125 2023-11-29 03:24:09,714 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3790026.6666666665, ans=0.2 2023-11-29 03:24:26,773 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 3400, loss[loss=0.06513, simple_loss=0.07559, pruned_loss=0.01882, audio_tagging_loss=0.008514, over 14807.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.09116, pruned_loss=0.0122, audio_tagging_loss=0.008639, over 3045588.32 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:24:51,488 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.195e+01 9.061e+01 9.646e+01 1.021e+02 1.209e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-29 03:24:56,216 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 568550 2023-11-29 03:24:56,486 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3790293.3333333335, ans=0.2 2023-11-29 03:25:16,680 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.29 vs. limit=15.0 2023-11-29 03:25:19,727 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3790426.6666666665, ans=0.2 2023-11-29 03:25:28,288 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 3450, loss[loss=0.09094, simple_loss=0.1266, pruned_loss=0.02136, audio_tagging_loss=0.00627, over 16326.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.09176, pruned_loss=0.01226, audio_tagging_loss=0.008478, over 3046849.77 frames. ], batch size: 60, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:25:32,703 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 03:25:50,380 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3790560.0, ans=0.125 2023-11-29 03:25:55,083 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3790626.6666666665, ans=0.0 2023-11-29 03:25:57,934 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.40 vs. limit=22.5 2023-11-29 03:25:58,454 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 568600 2023-11-29 03:25:58,989 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.47 vs. 
limit=6.0 2023-11-29 03:26:07,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3790693.3333333335, ans=0.125 2023-11-29 03:26:20,143 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3790760.0, ans=0.1 2023-11-29 03:26:25,893 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3790760.0, ans=0.0 2023-11-29 03:26:30,422 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 3500, loss[loss=0.06942, simple_loss=0.08634, pruned_loss=0.01508, audio_tagging_loss=0.01118, over 15839.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.09049, pruned_loss=0.01203, audio_tagging_loss=0.008453, over 3053067.22 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:26:54,791 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.130e+01 8.797e+01 9.462e+01 1.015e+02 1.238e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-29 03:27:00,111 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 568650 2023-11-29 03:27:04,681 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 03:27:14,016 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3791026.6666666665, ans=0.2 2023-11-29 03:27:18,654 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3791093.3333333335, ans=0.0 2023-11-29 03:27:29,258 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3791093.3333333335, ans=0.125 2023-11-29 03:27:32,447 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 3550, loss[loss=0.07673, simple_loss=0.1137, pruned_loss=0.01352, audio_tagging_loss=0.006372, over 14407.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08956, pruned_loss=0.01199, audio_tagging_loss=0.008468, over 3051700.07 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:27:49,989 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.01 vs. limit=22.5 2023-11-29 03:28:01,906 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 568700 2023-11-29 03:28:02,028 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 03:28:31,350 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3791426.6666666665, ans=0.125 2023-11-29 03:28:34,002 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 3600, loss[loss=0.07176, simple_loss=0.1104, pruned_loss=0.01011, audio_tagging_loss=0.006455, over 14695.00 frames. ], tot_loss[loss=0.06489, simple_loss=0.08916, pruned_loss=0.01187, audio_tagging_loss=0.008441, over 3046763.05 frames. 
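[Annotation] The model.py:807 "Freeze_encoder: False; Current batch idx: ..." records appear at batch indices 567050, 567100, 567150 and so on, i.e. every 50 batches, matching this run's log_interval of 50. A one-line sketch of that cadence:

    def should_log(batch_idx_train: int, log_interval: int = 50) -> bool:
        # Print the freeze state once per log_interval global batches.
        return batch_idx_train % log_interval == 0

    assert should_log(567050) and not should_log(567049)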
], batch size: 55, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:28:59,348 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.220e+01 8.871e+01 9.571e+01 1.037e+02 1.255e+02, threshold=1.914e+02, percent-clipped=0.0 2023-11-29 03:29:04,032 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 568750 2023-11-29 03:29:07,683 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 03:29:20,743 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3791693.3333333335, ans=0.2 2023-11-29 03:29:25,988 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 03:29:34,921 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3791826.6666666665, ans=0.125 2023-11-29 03:29:35,717 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 3650, loss[loss=0.05852, simple_loss=0.07613, pruned_loss=0.01371, audio_tagging_loss=0.006752, over 13259.00 frames. ], tot_loss[loss=0.06428, simple_loss=0.0882, pruned_loss=0.0117, audio_tagging_loss=0.008478, over 3047125.58 frames. ], batch size: 51, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:29:54,339 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.41 vs. limit=12.0 2023-11-29 03:29:55,061 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3791893.3333333335, ans=0.0 2023-11-29 03:29:59,080 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.48 vs. limit=22.5 2023-11-29 03:30:03,352 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3791960.0, ans=0.125 2023-11-29 03:30:05,593 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 568800 2023-11-29 03:30:05,875 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3791960.0, ans=0.2 2023-11-29 03:30:19,294 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3792026.6666666665, ans=0.125 2023-11-29 03:30:36,151 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.30 vs. limit=15.0 2023-11-29 03:30:36,822 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3792160.0, ans=0.0 2023-11-29 03:30:37,640 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 3700, loss[loss=0.05565, simple_loss=0.07871, pruned_loss=0.008588, audio_tagging_loss=0.007706, over 15631.00 frames. ], tot_loss[loss=0.06448, simple_loss=0.08823, pruned_loss=0.01179, audio_tagging_loss=0.008569, over 3047532.67 frames. 
], batch size: 58, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:30:37,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3792160.0, ans=0.2 2023-11-29 03:30:40,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3792160.0, ans=0.1 2023-11-29 03:30:47,543 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 03:30:57,711 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3792226.6666666665, ans=0.025 2023-11-29 03:31:03,485 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.315e+01 9.169e+01 9.957e+01 1.078e+02 1.355e+02, threshold=1.991e+02, percent-clipped=0.0 2023-11-29 03:31:07,295 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 568850 2023-11-29 03:31:29,605 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3792426.6666666665, ans=0.125 2023-11-29 03:31:40,466 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 3750, loss[loss=0.07519, simple_loss=0.1012, pruned_loss=0.01541, audio_tagging_loss=0.009181, over 14651.00 frames. ], tot_loss[loss=0.06479, simple_loss=0.08836, pruned_loss=0.01191, audio_tagging_loss=0.008699, over 3042163.09 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:31:55,550 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3792560.0, ans=0.0 2023-11-29 03:31:59,107 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 03:32:11,224 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 568900 2023-11-29 03:32:24,211 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3792693.3333333335, ans=0.1 2023-11-29 03:32:25,272 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3792693.3333333335, ans=0.125 2023-11-29 03:32:25,640 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.15 vs. limit=22.5 2023-11-29 03:32:26,251 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 03:32:31,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3792760.0, ans=0.0 2023-11-29 03:32:35,999 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.17 vs. 
limit=22.5 2023-11-29 03:32:36,599 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3792760.0, ans=0.025 2023-11-29 03:32:42,224 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 3800, loss[loss=0.07033, simple_loss=0.09481, pruned_loss=0.0126, audio_tagging_loss=0.01032, over 15723.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08897, pruned_loss=0.01202, audio_tagging_loss=0.008742, over 3042479.73 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:32:54,638 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3792893.3333333335, ans=0.2 2023-11-29 03:33:08,153 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.793e+01 9.049e+01 9.763e+01 1.085e+02 1.488e+02, threshold=1.953e+02, percent-clipped=0.0 2023-11-29 03:33:12,035 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 568950 2023-11-29 03:33:15,578 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 03:33:16,973 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3792960.0, ans=0.125 2023-11-29 03:33:18,840 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.65 vs. limit=15.0 2023-11-29 03:33:21,295 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.81 vs. limit=22.5 2023-11-29 03:33:26,635 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3793026.6666666665, ans=0.1 2023-11-29 03:33:35,918 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.50 vs. limit=15.0 2023-11-29 03:33:42,628 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3793093.3333333335, ans=0.0 2023-11-29 03:33:44,644 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 3850, loss[loss=0.06197, simple_loss=0.09267, pruned_loss=0.008253, audio_tagging_loss=0.007386, over 14266.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.08895, pruned_loss=0.01204, audio_tagging_loss=0.008709, over 3042438.79 frames. 
], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:33:50,684 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3793160.0, ans=0.0 2023-11-29 03:34:03,411 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3793226.6666666665, ans=0.5 2023-11-29 03:34:08,709 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 03:34:13,342 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 569000 2023-11-29 03:34:13,483 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3793293.3333333335, ans=0.125 2023-11-29 03:34:25,746 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3793360.0, ans=0.2 2023-11-29 03:34:42,301 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0 2023-11-29 03:34:45,154 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 3900, loss[loss=0.06903, simple_loss=0.09933, pruned_loss=0.01071, audio_tagging_loss=0.008655, over 15833.00 frames. ], tot_loss[loss=0.06501, simple_loss=0.08855, pruned_loss=0.01194, audio_tagging_loss=0.008793, over 3035849.50 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:35:10,782 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.633e+01 9.044e+01 9.626e+01 1.053e+02 1.477e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-29 03:35:15,675 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 569050 2023-11-29 03:35:22,754 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3793693.3333333335, ans=0.125 2023-11-29 03:35:36,144 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.67 vs. limit=12.0 2023-11-29 03:35:46,729 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 3950, loss[loss=0.05649, simple_loss=0.07113, pruned_loss=0.007908, audio_tagging_loss=0.01302, over 14557.00 frames. ], tot_loss[loss=0.0647, simple_loss=0.08814, pruned_loss=0.01181, audio_tagging_loss=0.008821, over 3031898.71 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:36:16,290 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 569100 2023-11-29 03:36:26,524 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3794026.6666666665, ans=0.0 2023-11-29 03:36:48,507 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 4000, loss[loss=0.07336, simple_loss=0.102, pruned_loss=0.01353, audio_tagging_loss=0.008816, over 15125.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08876, pruned_loss=0.01194, audio_tagging_loss=0.008883, over 3035028.86 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:36:50,817 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.78 vs. 
limit=15.0 2023-11-29 03:36:51,720 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3794160.0, ans=0.0 2023-11-29 03:37:14,528 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.940e+01 9.124e+01 9.854e+01 1.064e+02 1.398e+02, threshold=1.971e+02, percent-clipped=0.0 2023-11-29 03:37:18,376 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 569150 2023-11-29 03:37:18,650 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 03:37:23,194 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3794293.3333333335, ans=0.125 2023-11-29 03:37:40,808 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3794426.6666666665, ans=0.125 2023-11-29 03:37:49,706 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 4050, loss[loss=0.08512, simple_loss=0.1243, pruned_loss=0.01581, audio_tagging_loss=0.007145, over 15429.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.08928, pruned_loss=0.01184, audio_tagging_loss=0.00885, over 3036026.73 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:37:55,516 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 03:38:00,049 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3794493.3333333335, ans=0.1 2023-11-29 03:38:07,106 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3794560.0, ans=0.125 2023-11-29 03:38:11,871 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.69 vs. limit=10.0 2023-11-29 03:38:18,057 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3794626.6666666665, ans=0.07 2023-11-29 03:38:19,713 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 569200 2023-11-29 03:38:22,690 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3794626.6666666665, ans=0.125 2023-11-29 03:38:24,297 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.73 vs. limit=22.5 2023-11-29 03:38:40,006 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.50 vs. limit=15.0 2023-11-29 03:38:46,590 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.38 vs. limit=6.0 2023-11-29 03:38:51,652 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 4100, loss[loss=0.06052, simple_loss=0.08255, pruned_loss=0.008618, audio_tagging_loss=0.01063, over 13906.00 frames. 
], tot_loss[loss=0.06549, simple_loss=0.08944, pruned_loss=0.01192, audio_tagging_loss=0.008846, over 3037462.71 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:39:15,158 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3794893.3333333335, ans=0.025 2023-11-29 03:39:19,472 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.511e+01 9.014e+01 9.699e+01 1.029e+02 1.254e+02, threshold=1.940e+02, percent-clipped=0.0 2023-11-29 03:39:21,214 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.30 vs. limit=10.0 2023-11-29 03:39:21,930 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 569250 2023-11-29 03:39:26,809 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3794960.0, ans=0.125 2023-11-29 03:39:34,502 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3795026.6666666665, ans=0.0 2023-11-29 03:39:40,451 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3795093.3333333335, ans=0.125 2023-11-29 03:39:51,291 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.61 vs. limit=15.0 2023-11-29 03:39:53,508 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 4150, loss[loss=0.04823, simple_loss=0.07222, pruned_loss=0.006856, audio_tagging_loss=0.005265, over 14690.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08897, pruned_loss=0.01178, audio_tagging_loss=0.008696, over 3038565.92 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:40:00,835 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3795160.0, ans=0.0 2023-11-29 03:40:10,952 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.20 vs. limit=22.5 2023-11-29 03:40:22,932 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 569300 2023-11-29 03:40:24,074 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3795293.3333333335, ans=0.125 2023-11-29 03:40:26,502 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3795293.3333333335, ans=0.2 2023-11-29 03:40:36,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3795360.0, ans=0.0 2023-11-29 03:40:41,518 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-29 03:40:46,400 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3795426.6666666665, ans=0.125 2023-11-29 03:40:48,049 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.22 vs. limit=6.0 2023-11-29 03:40:54,938 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 4200, loss[loss=0.0703, simple_loss=0.1055, pruned_loss=0.01017, audio_tagging_loss=0.007364, over 15573.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08992, pruned_loss=0.01196, audio_tagging_loss=0.008603, over 3047218.79 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:40:55,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3795493.3333333335, ans=0.0 2023-11-29 03:40:58,809 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3795493.3333333335, ans=0.05 2023-11-29 03:41:00,105 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3795493.3333333335, ans=0.0 2023-11-29 03:41:04,943 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.85 vs. limit=22.5 2023-11-29 03:41:11,434 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3795560.0, ans=0.125 2023-11-29 03:41:21,624 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.462e+01 9.132e+01 9.847e+01 1.051e+02 1.276e+02, threshold=1.969e+02, percent-clipped=0.0 2023-11-29 03:41:21,856 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3795626.6666666665, ans=0.1 2023-11-29 03:41:23,947 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 569350 2023-11-29 03:41:26,183 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.47 vs. limit=15.0 2023-11-29 03:41:29,346 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3795626.6666666665, ans=0.125 2023-11-29 03:41:38,291 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3795693.3333333335, ans=0.125 2023-11-29 03:41:44,104 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3795760.0, ans=0.0 2023-11-29 03:41:50,128 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.19 vs. limit=15.0 2023-11-29 03:41:53,931 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3795760.0, ans=0.035 2023-11-29 03:41:56,097 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 4250, loss[loss=0.05639, simple_loss=0.07424, pruned_loss=0.009329, audio_tagging_loss=0.009944, over 15416.00 frames. ], tot_loss[loss=0.06506, simple_loss=0.08928, pruned_loss=0.01191, audio_tagging_loss=0.008517, over 3049735.77 frames. 
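The WARNING entries above drop AudioSet placeholder cuts because, after the encoder front-end's subsampling, the utterance is left with fewer frames (23) than BPE tokens (24), which a transducer loss cannot align. A sketch of such a length filter, assuming the convolutional arithmetic that maps the logged 100 input frames to 23 output frames; the exact reduction is recipe-specific.

```python
# Sketch of the length filter implied by the "Exclude cut" warnings:
# drop cuts whose token count reaches the frame count after subsampling.
# The frame arithmetic below reproduces the logged 100 -> 23 reduction;
# the precise formula depends on the conv front-end of the recipe.
def frames_after_subsampling(num_frames: int) -> int:
    return ((num_frames - 7) // 2 + 1) // 2  # 100 -> 23

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    return frames_after_subsampling(num_frames) > num_tokens

print(frames_after_subsampling(100))  # 23
print(keep_cut(100, 24))              # False -> excluded, as in the log
```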
], batch size: 58, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:42:10,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3795893.3333333335, ans=0.125 2023-11-29 03:42:14,952 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.95 vs. limit=22.5 2023-11-29 03:42:25,048 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 569400 2023-11-29 03:42:36,670 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3796026.6666666665, ans=0.5 2023-11-29 03:42:38,460 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.61 vs. limit=15.0 2023-11-29 03:42:45,545 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3796093.3333333335, ans=0.125 2023-11-29 03:42:46,703 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=3796093.3333333335, ans=0.025 2023-11-29 03:42:49,201 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3796093.3333333335, ans=0.04949747468305833 2023-11-29 03:42:57,169 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 4300, loss[loss=0.06544, simple_loss=0.0963, pruned_loss=0.01052, audio_tagging_loss=0.006768, over 15506.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.0906, pruned_loss=0.01227, audio_tagging_loss=0.008377, over 3050655.27 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:43:07,194 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3796160.0, ans=0.1 2023-11-29 03:43:09,582 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3796226.6666666665, ans=0.125 2023-11-29 03:43:20,891 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 03:43:23,212 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3796293.3333333335, ans=0.1 2023-11-29 03:43:24,067 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.490e+01 9.187e+01 9.912e+01 1.060e+02 1.366e+02, threshold=1.982e+02, percent-clipped=0.0 2023-11-29 03:43:27,289 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 569450 2023-11-29 03:43:27,722 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.57 vs. limit=15.0 2023-11-29 03:43:40,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3796360.0, ans=0.025 2023-11-29 03:43:41,276 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3796360.0, ans=0.1 2023-11-29 03:43:58,134 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 4350, loss[loss=0.0673, simple_loss=0.08849, pruned_loss=0.01195, audio_tagging_loss=0.01111, over 14472.00 frames. 
], tot_loss[loss=0.06595, simple_loss=0.09071, pruned_loss=0.01219, audio_tagging_loss=0.008405, over 3046572.42 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:44:23,267 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3796626.6666666665, ans=0.0 2023-11-29 03:44:24,495 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3796626.6666666665, ans=0.0 2023-11-29 03:44:27,883 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 569500 2023-11-29 03:44:32,252 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3796626.6666666665, ans=0.125 2023-11-29 03:44:34,918 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.31 vs. limit=15.0 2023-11-29 03:44:38,057 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3796693.3333333335, ans=0.125 2023-11-29 03:44:48,640 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=3796760.0, ans=0.5 2023-11-29 03:44:56,198 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3796760.0, ans=0.0 2023-11-29 03:45:00,074 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 4400, loss[loss=0.07864, simple_loss=0.1025, pruned_loss=0.01929, audio_tagging_loss=0.008116, over 14936.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.09063, pruned_loss=0.01223, audio_tagging_loss=0.008442, over 3045137.62 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:45:26,454 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.800e+01 8.869e+01 9.573e+01 1.013e+02 1.408e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-29 03:45:28,855 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 569550 2023-11-29 03:45:35,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3797026.6666666665, ans=0.125 2023-11-29 03:45:40,710 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.29 vs. limit=12.0 2023-11-29 03:45:57,570 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3797093.3333333335, ans=0.0 2023-11-29 03:46:00,732 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 4450, loss[loss=0.06559, simple_loss=0.08757, pruned_loss=0.01224, audio_tagging_loss=0.009574, over 15024.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.09086, pruned_loss=0.01224, audio_tagging_loss=0.008369, over 3049206.30 frames. 
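The `optim.py:476` entries report a five-number summary (min, 25%, median, 75%, max) of recent gradient norms together with a clipping threshold; with `Clipping_scale=2.0`, the logged thresholds sit at twice the median norm (e.g. 2 x 9.912e+01 = 1.982e+02 above), and `percent-clipped` is the share of batches whose norm exceeded it. A sketch of that bookkeeping, assuming a sliding window of recent norms; window size and names are illustrative.

```python
import torch

# Sketch of median-based gradient clipping consistent with the
# "grad-norm quartiles ... threshold ... percent-clipped" entries:
# keep a window of recent grad norms, clip at clipping_scale * median.
class GradNormClipper:
    def __init__(self, clipping_scale: float = 2.0, window: int = 128):
        self.clipping_scale = clipping_scale
        self.window = window
        self.norms = []
        self.num_clipped = 0

    def __call__(self, params) -> float:
        grads = [p.grad.norm() for p in params if p.grad is not None]
        norm = torch.norm(torch.stack(grads)).item()
        self.norms = (self.norms + [norm])[-self.window:]
        threshold = self.clipping_scale * float(torch.tensor(self.norms).median())
        if norm > threshold:
            self.num_clipped += 1
            for p in params:
                if p.grad is not None:
                    p.grad.mul_(threshold / norm)
        return norm

    def quartiles(self) -> torch.Tensor:
        t = torch.tensor(self.norms)
        return torch.quantile(t, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
```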
], batch size: 57, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:46:01,096 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3797160.0, ans=0.125 2023-11-29 03:46:21,722 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3797226.6666666665, ans=0.0 2023-11-29 03:46:30,105 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 569600 2023-11-29 03:46:58,948 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3797426.6666666665, ans=0.125 2023-11-29 03:47:02,127 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 4500, loss[loss=0.05805, simple_loss=0.0814, pruned_loss=0.009506, audio_tagging_loss=0.007846, over 14751.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08983, pruned_loss=0.01202, audio_tagging_loss=0.008418, over 3046792.45 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:47:03,833 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.56 vs. limit=12.0 2023-11-29 03:47:29,053 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.852e+01 8.934e+01 9.508e+01 1.012e+02 1.257e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-29 03:47:31,561 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 569650 2023-11-29 03:47:49,023 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3797693.3333333335, ans=0.125 2023-11-29 03:47:49,530 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.22 vs. limit=22.5 2023-11-29 03:48:02,619 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 4550, loss[loss=0.06826, simple_loss=0.09174, pruned_loss=0.01383, audio_tagging_loss=0.008558, over 14136.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08945, pruned_loss=0.01196, audio_tagging_loss=0.008467, over 3044650.39 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:48:08,092 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.35 vs. limit=15.0 2023-11-29 03:48:12,289 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3797826.6666666665, ans=0.125 2023-11-29 03:48:12,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3797826.6666666665, ans=0.2 2023-11-29 03:48:18,664 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.06 vs. limit=22.5 2023-11-29 03:48:32,973 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 569700 2023-11-29 03:48:33,377 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.43 vs. 
limit=15.0 2023-11-29 03:48:35,488 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3797960.0, ans=0.1 2023-11-29 03:48:45,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3798026.6666666665, ans=0.0 2023-11-29 03:48:48,415 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3798026.6666666665, ans=0.2 2023-11-29 03:48:53,617 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 03:49:03,195 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3798160.0, ans=0.0 2023-11-29 03:49:04,245 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 4600, loss[loss=0.07967, simple_loss=0.1164, pruned_loss=0.01363, audio_tagging_loss=0.007827, over 16111.00 frames. ], tot_loss[loss=0.06505, simple_loss=0.08917, pruned_loss=0.01186, audio_tagging_loss=0.008606, over 3039694.44 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:49:06,858 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3798160.0, ans=0.125 2023-11-29 03:49:28,894 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3798293.3333333335, ans=0.125 2023-11-29 03:49:30,121 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3798293.3333333335, ans=0.125 2023-11-29 03:49:30,968 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.879e+01 8.831e+01 9.354e+01 1.006e+02 1.240e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-29 03:49:33,417 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 569750 2023-11-29 03:50:05,629 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 4650, loss[loss=0.04725, simple_loss=0.06327, pruned_loss=0.004894, audio_tagging_loss=0.01072, over 15369.00 frames. ], tot_loss[loss=0.06506, simple_loss=0.08907, pruned_loss=0.01188, audio_tagging_loss=0.008646, over 3042464.67 frames. ], batch size: 61, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:50:07,575 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.35 vs. limit=10.0 2023-11-29 03:50:13,930 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3798493.3333333335, ans=0.0 2023-11-29 03:50:14,227 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.00 vs. 
limit=15.0 2023-11-29 03:50:24,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3798560.0, ans=0.125 2023-11-29 03:50:27,065 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3798560.0, ans=0.0 2023-11-29 03:50:28,458 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.17 vs. limit=10.0 2023-11-29 03:50:32,028 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.48 vs. limit=15.0 2023-11-29 03:50:32,797 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3798626.6666666665, ans=0.0 2023-11-29 03:50:34,941 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 569800 2023-11-29 03:51:06,091 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 4700, loss[loss=0.06958, simple_loss=0.0968, pruned_loss=0.0124, audio_tagging_loss=0.008782, over 14908.00 frames. ], tot_loss[loss=0.06523, simple_loss=0.08887, pruned_loss=0.01203, audio_tagging_loss=0.00876, over 3040337.15 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:51:09,678 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3798826.6666666665, ans=0.125 2023-11-29 03:51:19,426 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=3798893.3333333335, ans=0.025 2023-11-29 03:51:21,834 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3798893.3333333335, ans=0.0 2023-11-29 03:51:21,896 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3798893.3333333335, ans=10.0 2023-11-29 03:51:33,853 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.280e+01 9.105e+01 9.941e+01 1.052e+02 1.267e+02, threshold=1.988e+02, percent-clipped=0.0 2023-11-29 03:51:36,243 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 569850 2023-11-29 03:52:08,213 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 4750, loss[loss=0.05374, simple_loss=0.07048, pruned_loss=0.008933, audio_tagging_loss=0.009572, over 14085.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08879, pruned_loss=0.01214, audio_tagging_loss=0.00882, over 3042760.01 frames. 
], batch size: 56, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:52:18,142 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3799160.0, ans=0.0 2023-11-29 03:52:18,185 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3799160.0, ans=0.125 2023-11-29 03:52:20,468 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3799226.6666666665, ans=0.125 2023-11-29 03:52:28,683 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3799226.6666666665, ans=10.0 2023-11-29 03:52:36,652 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 569900 2023-11-29 03:52:39,349 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3799293.3333333335, ans=0.09899494936611666 2023-11-29 03:52:46,645 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.12 vs. limit=15.0 2023-11-29 03:52:48,631 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3799360.0, ans=0.2 2023-11-29 03:52:49,757 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3799360.0, ans=0.2 2023-11-29 03:53:03,054 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.78 vs. limit=15.0 2023-11-29 03:53:09,926 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 4800, loss[loss=0.06363, simple_loss=0.08772, pruned_loss=0.01193, audio_tagging_loss=0.007837, over 14733.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08882, pruned_loss=0.01201, audio_tagging_loss=0.008839, over 3043921.02 frames. ], batch size: 53, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:53:12,988 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.50 vs. limit=15.0 2023-11-29 03:53:22,927 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3799560.0, ans=0.125 2023-11-29 03:53:37,278 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.583e+01 8.976e+01 9.656e+01 1.035e+02 1.213e+02, threshold=1.931e+02, percent-clipped=0.0 2023-11-29 03:53:38,654 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 569950 2023-11-29 03:53:59,803 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3799760.0, ans=0.125 2023-11-29 03:54:11,307 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 4850, loss[loss=0.08214, simple_loss=0.1048, pruned_loss=0.02175, audio_tagging_loss=0.007987, over 15082.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.08929, pruned_loss=0.01196, audio_tagging_loss=0.00886, over 3043668.81 frames. 
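The `scaling.py:1022` "Whitening" entries compare a per-module metric against a limit (e.g. `metric=9.12 vs. limit=15.0`); the metric measures how far the feature covariance is from a multiple of the identity, and a penalty applies only when it exceeds the limit. A sketch of one plausible form of that metric, assuming it is the eigenvalue ratio mean(lambda^2) / mean(lambda)^2, which equals 1.0 for perfectly white features; the actual formulation in scaling.py may differ.

```python
import torch

# Sketch of a whitening metric like the one logged above: for feature
# covariance C, trace(C @ C) / d over (trace(C) / d)^2 equals 1.0 when
# C is a multiple of the identity and grows as the spectrum spreads.
def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels)
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]                    # (C, C) covariance
    mean_eig = torch.diagonal(cov).mean()           # trace(C) / d
    mean_eig_sq = (cov * cov).sum() / cov.shape[0]  # trace(C @ C) / d
    return mean_eig_sq / (mean_eig ** 2 + 1e-20)

white = torch.randn(1000, 384)                    # roughly white features
print(whitening_metric(white))                    # close to 1.0
print(whitening_metric(white * torch.rand(384)))  # larger: uneven spectrum
```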
], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:54:17,350 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3799826.6666666665, ans=0.125 2023-11-29 03:54:19,050 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3799826.6666666665, ans=0.125 2023-11-29 03:54:35,175 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3799960.0, ans=0.125 2023-11-29 03:54:42,049 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 570000 2023-11-29 03:54:44,151 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.60 vs. limit=6.0 2023-11-29 03:55:00,371 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3800093.3333333335, ans=0.125 2023-11-29 03:55:07,779 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3800093.3333333335, ans=0.0 2023-11-29 03:55:13,237 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 4900, loss[loss=0.04858, simple_loss=0.05975, pruned_loss=0.007991, audio_tagging_loss=0.01071, over 15259.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.08944, pruned_loss=0.01194, audio_tagging_loss=0.008667, over 3046937.71 frames. ], batch size: 60, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:55:38,582 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3800293.3333333335, ans=0.125 2023-11-29 03:55:43,186 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.742e+01 8.986e+01 9.622e+01 1.028e+02 1.398e+02, threshold=1.924e+02, percent-clipped=0.0 2023-11-29 03:55:44,484 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 570050 2023-11-29 03:55:59,457 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3800360.0, ans=0.0 2023-11-29 03:56:18,010 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 4950, loss[loss=0.07145, simple_loss=0.09514, pruned_loss=0.017, audio_tagging_loss=0.006872, over 14391.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08967, pruned_loss=0.012, audio_tagging_loss=0.008617, over 3050703.45 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:56:21,898 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3800493.3333333335, ans=0.125 2023-11-29 03:56:26,589 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=3800493.3333333335, ans=0.025 2023-11-29 03:56:47,124 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 570100 2023-11-29 03:56:53,113 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3800693.3333333335, ans=0.125 2023-11-29 03:56:56,453 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.20 vs. 
limit=10.0 2023-11-29 03:56:58,476 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3800693.3333333335, ans=0.125 2023-11-29 03:57:19,436 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 5000, loss[loss=0.07373, simple_loss=0.1056, pruned_loss=0.01097, audio_tagging_loss=0.009947, over 15726.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08924, pruned_loss=0.01196, audio_tagging_loss=0.008607, over 3052564.95 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:57:19,763 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3800826.6666666665, ans=0.0 2023-11-29 03:57:27,812 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3800826.6666666665, ans=0.0 2023-11-29 03:57:34,208 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3800893.3333333335, ans=0.125 2023-11-29 03:57:39,607 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.76 vs. limit=15.0 2023-11-29 03:57:43,950 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3800960.0, ans=0.2 2023-11-29 03:57:48,754 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.151e+01 8.990e+01 9.495e+01 1.030e+02 1.330e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-29 03:57:50,012 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 570150 2023-11-29 03:58:13,709 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3801093.3333333335, ans=10.0 2023-11-29 03:58:21,207 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 5050, loss[loss=0.07845, simple_loss=0.1046, pruned_loss=0.01631, audio_tagging_loss=0.009822, over 15174.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.08882, pruned_loss=0.01198, audio_tagging_loss=0.008564, over 3054342.91 frames. 
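Each training entry reports per-batch and running components: `simple_loss` and `pruned_loss` from the pruned RNN-T objective plus an `audio_tagging_loss` from the tagging head, with `loss` their weighted sum. The logged numbers are consistent with a 0.5 weight on the simple loss and unit weights on the others (e.g. 0.5 x 0.08255 + 0.008618 + 0.01063 = 0.06052 for batch 4100); these weights are inferred from the log values, not read from the recipe config.

```python
# Sketch of how the logged loss decomposes, with weights inferred from
# the per-batch numbers above (0.5 on simple_loss, 1.0 on the rest).
def combine_losses(simple_loss: float,
                   pruned_loss: float,
                   audio_tagging_loss: float,
                   simple_scale: float = 0.5,
                   tagging_scale: float = 1.0) -> float:
    return (simple_scale * simple_loss
            + pruned_loss
            + tagging_scale * audio_tagging_loss)

print(combine_losses(0.08255, 0.008618, 0.01063))  # ~0.06052, matching batch 4100
```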
], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:58:25,988 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3801160.0, ans=0.125 2023-11-29 03:58:37,616 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3801226.6666666665, ans=0.1 2023-11-29 03:58:42,572 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3801226.6666666665, ans=0.125 2023-11-29 03:58:48,310 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3801293.3333333335, ans=0.2 2023-11-29 03:58:48,311 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3801293.3333333335, ans=0.0 2023-11-29 03:58:50,508 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 570200 2023-11-29 03:58:54,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3801293.3333333335, ans=0.0 2023-11-29 03:59:01,344 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3801360.0, ans=0.125 2023-11-29 03:59:16,060 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3801426.6666666665, ans=0.025 2023-11-29 03:59:22,892 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 5100, loss[loss=0.0714, simple_loss=0.09188, pruned_loss=0.01577, audio_tagging_loss=0.009685, over 15287.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08908, pruned_loss=0.0121, audio_tagging_loss=0.008499, over 3054024.34 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:59:32,665 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3801493.3333333335, ans=0.2 2023-11-29 03:59:43,630 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.03 vs. limit=10.0 2023-11-29 03:59:50,406 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.137e+01 9.192e+01 9.667e+01 1.067e+02 2.138e+02, threshold=1.933e+02, percent-clipped=1.0 2023-11-29 03:59:51,747 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 570250 2023-11-29 03:59:57,735 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3801693.3333333335, ans=0.2 2023-11-29 04:00:05,590 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.61 vs. limit=15.0 2023-11-29 04:00:16,487 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.57 vs. 
limit=15.0 2023-11-29 04:00:19,720 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3801760.0, ans=0.1 2023-11-29 04:00:23,358 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3801826.6666666665, ans=0.1 2023-11-29 04:00:24,154 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 5150, loss[loss=0.07743, simple_loss=0.1007, pruned_loss=0.01978, audio_tagging_loss=0.007324, over 15548.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08926, pruned_loss=0.01219, audio_tagging_loss=0.008549, over 3061285.44 frames. ], batch size: 60, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 04:00:26,715 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3801826.6666666665, ans=0.125 2023-11-29 04:00:32,540 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3801826.6666666665, ans=0.0 2023-11-29 04:00:53,257 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 570300 2023-11-29 04:01:03,047 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3802026.6666666665, ans=0.125 2023-11-29 04:01:12,794 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.82 vs. limit=15.0 2023-11-29 04:01:25,321 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 5200, loss[loss=0.0671, simple_loss=0.09278, pruned_loss=0.01393, audio_tagging_loss=0.006788, over 15109.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.09006, pruned_loss=0.01225, audio_tagging_loss=0.008533, over 3059962.13 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 04:01:41,993 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3802226.6666666665, ans=0.125 2023-11-29 04:01:52,054 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3802293.3333333335, ans=0.125 2023-11-29 04:01:54,017 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.656e+01 9.016e+01 9.699e+01 1.038e+02 1.418e+02, threshold=1.940e+02, percent-clipped=0.0 2023-11-29 04:01:55,316 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 570350 2023-11-29 04:02:26,905 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 5250, loss[loss=0.0531, simple_loss=0.07667, pruned_loss=0.004857, audio_tagging_loss=0.009906, over 16032.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08973, pruned_loss=0.0122, audio_tagging_loss=0.00857, over 3059233.60 frames. 
], batch size: 60, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 04:02:28,769 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3802493.3333333335, ans=0.125 2023-11-29 04:02:28,988 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3802493.3333333335, ans=0.1 2023-11-29 04:02:33,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3802493.3333333335, ans=0.2 2023-11-29 04:02:42,277 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3802560.0, ans=0.1 2023-11-29 04:02:44,915 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.64 vs. limit=22.5 2023-11-29 04:02:56,077 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 570400 2023-11-29 04:03:10,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3802693.3333333335, ans=0.125 2023-11-29 04:03:12,521 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3802693.3333333335, ans=0.2 2023-11-29 04:03:26,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3802760.0, ans=0.025 2023-11-29 04:03:28,965 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 5300, loss[loss=0.05346, simple_loss=0.07669, pruned_loss=0.007366, audio_tagging_loss=0.007752, over 16015.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.08992, pruned_loss=0.0122, audio_tagging_loss=0.008534, over 3059071.68 frames. ], batch size: 61, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 04:03:36,416 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3802826.6666666665, ans=0.035 2023-11-29 04:03:48,601 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3802893.3333333335, ans=0.2 2023-11-29 04:03:57,734 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.209e+01 9.143e+01 9.635e+01 1.038e+02 1.334e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-29 04:03:57,872 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 570450 2023-11-29 04:04:12,619 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3803026.6666666665, ans=0.125 2023-11-29 04:04:29,666 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 5350, loss[loss=0.04928, simple_loss=0.06738, pruned_loss=0.006743, audio_tagging_loss=0.008845, over 14620.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08971, pruned_loss=0.01212, audio_tagging_loss=0.008528, over 3052360.80 frames. 
], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 04:04:31,755 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3803160.0, ans=0.2 2023-11-29 04:04:43,596 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3803226.6666666665, ans=0.125 2023-11-29 04:05:00,588 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 570500 2023-11-29 04:05:08,467 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.11 vs. limit=10.0 2023-11-29 04:05:10,142 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3803360.0, ans=0.0 2023-11-29 04:05:23,858 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=3803426.6666666665, ans=0.05 2023-11-29 04:05:31,591 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 5400, loss[loss=0.0743, simple_loss=0.09932, pruned_loss=0.01529, audio_tagging_loss=0.00936, over 14714.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08947, pruned_loss=0.01202, audio_tagging_loss=0.008536, over 3048550.95 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 04:05:43,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3803560.0, ans=0.125 2023-11-29 04:06:01,171 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.249e+01 9.108e+01 9.705e+01 1.034e+02 1.334e+02, threshold=1.941e+02, percent-clipped=0.0 2023-11-29 04:06:01,302 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 570550 2023-11-29 04:06:01,869 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.29 vs. limit=22.5 2023-11-29 04:06:05,706 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3803626.6666666665, ans=0.125 2023-11-29 04:06:27,202 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.12 vs. limit=22.5 2023-11-29 04:06:29,693 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.28 vs. limit=15.0 2023-11-29 04:06:33,395 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 5450, loss[loss=0.06904, simple_loss=0.09278, pruned_loss=0.01435, audio_tagging_loss=0.008302, over 14669.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08983, pruned_loss=0.01211, audio_tagging_loss=0.008606, over 3043552.95 frames. 
], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 04:06:57,633 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3803960.0, ans=0.125 2023-11-29 04:07:03,394 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 570600 2023-11-29 04:07:23,718 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3804093.3333333335, ans=0.125 2023-11-29 04:07:28,865 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3804093.3333333335, ans=0.0 2023-11-29 04:07:35,644 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 5500, loss[loss=0.06611, simple_loss=0.08861, pruned_loss=0.01395, audio_tagging_loss=0.007858, over 14828.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.09, pruned_loss=0.01221, audio_tagging_loss=0.008532, over 3053460.18 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 04:07:55,867 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3804226.6666666665, ans=0.125 2023-11-29 04:08:05,678 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 570650 2023-11-29 04:08:06,700 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.860e+01 9.091e+01 9.676e+01 1.052e+02 2.081e+02, threshold=1.935e+02, percent-clipped=1.0 2023-11-29 04:08:10,501 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3804293.3333333335, ans=0.2 2023-11-29 04:08:25,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3804426.6666666665, ans=0.125 2023-11-29 04:08:29,720 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.99 vs. limit=15.0 2023-11-29 04:08:37,457 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 5550, loss[loss=0.06566, simple_loss=0.09403, pruned_loss=0.01013, audio_tagging_loss=0.00851, over 15548.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.0896, pruned_loss=0.01197, audio_tagging_loss=0.008567, over 3057781.35 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 8.0 2023-11-29 04:09:00,933 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3804626.6666666665, ans=0.125 2023-11-29 04:09:07,086 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 570700 2023-11-29 04:09:10,884 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3804626.6666666665, ans=0.125 2023-11-29 04:09:36,098 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3804760.0, ans=0.0 2023-11-29 04:09:37,291 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3804760.0, ans=0.025 2023-11-29 04:09:39,325 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 5600, loss[loss=0.07735, simple_loss=0.1022, pruned_loss=0.01727, audio_tagging_loss=0.009009, over 14759.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.08958, pruned_loss=0.01194, audio_tagging_loss=0.008688, over 3054734.04 frames. 
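The `grad_scale` field drifts between 32.0, 16.0, and 8.0 across the entries above, which is the signature of dynamic loss scaling under fp16 training: the scale is halved when an inf/nan gradient is detected and doubled again after a run of clean steps. A minimal sketch using the standard PyTorch API, assuming a CUDA device; the growth/backoff settings shown are torch defaults, not the recipe's values.

```python
import torch

# Minimal dynamic loss-scaling loop with torch.cuda.amp, matching the
# grad_scale 32 -> 16 -> 8 movements in the log: the scaler halves its
# scale on overflow and doubles it after growth_interval clean steps.
scaler = torch.cuda.amp.GradScaler(
    init_scale=32.0, growth_factor=2.0, backoff_factor=0.5, growth_interval=2000
)
model = torch.nn.Linear(10, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for _ in range(3):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = model(torch.randn(4, 10, device="cuda")).pow(2).mean()
    scaler.scale(loss).backward()
    scaler.step(optimizer)   # skipped if gradients overflowed
    scaler.update()          # halves or grows the scale
    print(scaler.get_scale())
```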
], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 04:09:45,274 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3804826.6666666665, ans=0.125 2023-11-29 04:09:45,352 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3804826.6666666665, ans=0.09899494936611666 2023-11-29 04:09:51,059 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.12 vs. limit=15.0 2023-11-29 04:10:03,288 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3804960.0, ans=0.125 2023-11-29 04:10:08,920 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 570750 2023-11-29 04:10:09,946 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.666e+01 9.028e+01 9.748e+01 1.040e+02 1.265e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-29 04:10:26,156 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 04:10:30,114 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3805093.3333333335, ans=0.125 2023-11-29 04:10:35,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3805093.3333333335, ans=0.125 2023-11-29 04:10:35,886 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3805093.3333333335, ans=0.09899494936611666 2023-11-29 04:10:37,584 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.35 vs. limit=15.0 2023-11-29 04:10:40,955 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 5650, loss[loss=0.07419, simple_loss=0.09919, pruned_loss=0.01593, audio_tagging_loss=0.008663, over 16044.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.09015, pruned_loss=0.01205, audio_tagging_loss=0.008749, over 3058892.20 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 04:11:10,758 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 570800 2023-11-29 04:11:14,807 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3805293.3333333335, ans=0.125 2023-11-29 04:11:42,467 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 5700, loss[loss=0.06586, simple_loss=0.08541, pruned_loss=0.01456, audio_tagging_loss=0.008599, over 14711.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.09001, pruned_loss=0.01196, audio_tagging_loss=0.0087, over 3054343.43 frames. 
], batch size: 54, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 04:11:42,834 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3805493.3333333335, ans=0.2 2023-11-29 04:12:00,472 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3805560.0, ans=0.125 2023-11-29 04:12:07,628 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3805626.6666666665, ans=0.125 2023-11-29 04:12:12,020 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 570850 2023-11-29 04:12:12,340 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3805626.6666666665, ans=0.125 2023-11-29 04:12:13,075 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.783e+01 9.102e+01 9.721e+01 1.096e+02 1.374e+02, threshold=1.944e+02, percent-clipped=0.0 2023-11-29 04:12:38,404 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3805760.0, ans=0.0 2023-11-29 04:12:44,451 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 5750, loss[loss=0.05139, simple_loss=0.07497, pruned_loss=0.007196, audio_tagging_loss=0.006707, over 15037.00 frames. ], tot_loss[loss=0.06492, simple_loss=0.08876, pruned_loss=0.01183, audio_tagging_loss=0.008714, over 3049160.19 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 04:12:44,771 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 04:12:50,601 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3805826.6666666665, ans=0.0 2023-11-29 04:12:50,950 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.34 vs. limit=15.0 2023-11-29 04:13:01,645 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.44 vs. limit=15.0 2023-11-29 04:13:09,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3805960.0, ans=0.1 2023-11-29 04:13:13,080 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 570900 2023-11-29 04:13:25,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3806026.6666666665, ans=0.0 2023-11-29 04:13:37,501 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3806093.3333333335, ans=0.0 2023-11-29 04:13:43,477 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3806160.0, ans=0.1 2023-11-29 04:13:44,388 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 5800, loss[loss=0.06023, simple_loss=0.08196, pruned_loss=0.01083, audio_tagging_loss=0.008422, over 14451.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.08874, pruned_loss=0.01195, audio_tagging_loss=0.008628, over 3043129.84 frames. 
], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 04:13:44,823 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3806160.0, ans=0.1 2023-11-29 04:13:48,170 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3806160.0, ans=0.125 2023-11-29 04:13:48,210 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3806160.0, ans=0.2 2023-11-29 04:14:07,258 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3806226.6666666665, ans=0.0 2023-11-29 04:14:13,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3806293.3333333335, ans=0.125 2023-11-29 04:14:14,979 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 570950 2023-11-29 04:14:15,923 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.648e+01 8.950e+01 9.520e+01 1.017e+02 1.550e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-29 04:14:20,938 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3806360.0, ans=0.0 2023-11-29 04:14:24,416 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3806360.0, ans=0.0 2023-11-29 04:14:31,844 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.93 vs. limit=15.0 2023-11-29 04:14:32,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3806426.6666666665, ans=0.0 2023-11-29 04:14:32,881 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3806426.6666666665, ans=0.05 2023-11-29 04:14:37,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3806426.6666666665, ans=0.0 2023-11-29 04:14:37,420 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3806426.6666666665, ans=0.2 2023-11-29 04:14:41,587 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3806426.6666666665, ans=0.0 2023-11-29 04:14:43,463 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3806426.6666666665, ans=0.0 2023-11-29 04:14:46,535 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 5850, loss[loss=0.0817, simple_loss=0.1164, pruned_loss=0.01759, audio_tagging_loss=0.005934, over 14562.00 frames. ], tot_loss[loss=0.0652, simple_loss=0.08938, pruned_loss=0.01194, audio_tagging_loss=0.00857, over 3038976.28 frames. ], batch size: 52, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 04:15:15,850 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 571000 2023-11-29 04:15:24,831 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 04:15:49,163 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 5900, loss[loss=0.0731, simple_loss=0.09793, pruned_loss=0.01531, audio_tagging_loss=0.008835, over 15688.00 frames. 
], tot_loss[loss=0.06511, simple_loss=0.08932, pruned_loss=0.01194, audio_tagging_loss=0.008502, over 3046914.22 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 04:16:06,986 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3806893.3333333335, ans=0.125 2023-11-29 04:16:10,490 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3806893.3333333335, ans=0.125 2023-11-29 04:16:17,727 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 571050 2023-11-29 04:16:18,799 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.746e+01 9.359e+01 9.876e+01 1.067e+02 1.252e+02, threshold=1.975e+02, percent-clipped=0.0 2023-11-29 04:16:25,571 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3807026.6666666665, ans=0.2 2023-11-29 04:16:26,772 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3807026.6666666665, ans=0.2 2023-11-29 04:16:28,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3807026.6666666665, ans=0.125 2023-11-29 04:16:30,875 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.28 vs. limit=6.0 2023-11-29 04:16:34,325 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3807026.6666666665, ans=0.125 2023-11-29 04:16:37,574 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 04:16:50,143 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 5950, loss[loss=0.07511, simple_loss=0.1043, pruned_loss=0.01516, audio_tagging_loss=0.007786, over 16351.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.08987, pruned_loss=0.01199, audio_tagging_loss=0.0084, over 3058757.51 frames. ], batch size: 61, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 04:16:56,158 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3807160.0, ans=0.0 2023-11-29 04:17:08,221 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.68 vs. limit=15.0 2023-11-29 04:17:19,993 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 571100 2023-11-29 04:17:25,787 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.10 vs. limit=15.0 2023-11-29 04:17:35,984 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.61 vs. 
limit=12.0 2023-11-29 04:17:37,967 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 04:17:37,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3807426.6666666665, ans=0.125 2023-11-29 04:17:42,838 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3807426.6666666665, ans=0.125 2023-11-29 04:17:51,336 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 6000, loss[loss=0.07772, simple_loss=0.1061, pruned_loss=0.01824, audio_tagging_loss=0.006437, over 14161.00 frames. ], tot_loss[loss=0.0651, simple_loss=0.08953, pruned_loss=0.01193, audio_tagging_loss=0.008401, over 3060535.42 frames. ], batch size: 53, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 04:17:51,339 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-29 04:18:18,793 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.4994, 2.9214, 3.2188, 3.0140, 3.6579, 3.7288, 3.2203, 3.2633], device='cuda:0') 2023-11-29 04:18:21,187 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.4943, 2.8928, 3.2887, 3.0002, 3.6294, 3.7188, 3.2511, 3.2531], device='cuda:0') 2023-11-29 04:18:21,543 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.6067, 3.7139, 3.9939, 3.4725], device='cuda:0') 2023-11-29 04:18:31,365 INFO [train_asr.py:1267] (0/4) Epoch 48, validation: loss=0.05827, simple_loss=0.05042, pruned_loss=0.005313, audio_tagging_loss=0.02774, over 4681554.00 frames. 2023-11-29 04:18:31,365 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-29 04:18:31,697 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3807493.3333333335, ans=0.125 2023-11-29 04:18:43,383 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=3807560.0, ans=10.0 2023-11-29 04:18:43,426 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3807560.0, ans=0.125 2023-11-29 04:18:46,344 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.20 vs. 
limit=6.0 2023-11-29 04:18:56,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3807626.6666666665, ans=0.125 2023-11-29 04:19:00,400 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 571150 2023-11-29 04:19:01,357 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.457e+01 8.999e+01 9.693e+01 1.031e+02 2.165e+02, threshold=1.939e+02, percent-clipped=1.0 2023-11-29 04:19:08,178 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3807693.3333333335, ans=0.125 2023-11-29 04:19:14,627 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3807693.3333333335, ans=0.0 2023-11-29 04:19:18,782 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3807693.3333333335, ans=0.2 2023-11-29 04:19:19,622 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 04:19:24,751 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3807760.0, ans=0.07 2023-11-29 04:19:32,551 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 6050, loss[loss=0.08405, simple_loss=0.1144, pruned_loss=0.01777, audio_tagging_loss=0.009052, over 15144.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08968, pruned_loss=0.01193, audio_tagging_loss=0.008439, over 3059417.35 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 04:19:42,673 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3807826.6666666665, ans=0.125 2023-11-29 04:20:02,464 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 571200 2023-11-29 04:20:08,425 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3807960.0, ans=0.125 2023-11-29 04:20:34,089 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3808160.0, ans=0.125 2023-11-29 04:20:34,998 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 6100, loss[loss=0.0555, simple_loss=0.07854, pruned_loss=0.007873, audio_tagging_loss=0.008363, over 14386.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.09062, pruned_loss=0.01212, audio_tagging_loss=0.008397, over 3056026.35 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 04:20:39,287 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.65 vs. 
limit=15.0 2023-11-29 04:20:47,195 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3808226.6666666665, ans=0.125 2023-11-29 04:20:53,039 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3808226.6666666665, ans=0.07 2023-11-29 04:21:02,643 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=3808293.3333333335, ans=15.0 2023-11-29 04:21:05,575 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 571250 2023-11-29 04:21:05,789 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3808293.3333333335, ans=0.125 2023-11-29 04:21:07,699 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.756e+01 8.969e+01 9.609e+01 1.049e+02 1.338e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-29 04:21:30,707 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3808426.6666666665, ans=0.1 2023-11-29 04:21:31,932 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3808426.6666666665, ans=0.125 2023-11-29 04:21:34,226 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3808426.6666666665, ans=0.1 2023-11-29 04:21:37,899 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 6150, loss[loss=0.06146, simple_loss=0.0775, pruned_loss=0.009988, audio_tagging_loss=0.01272, over 14128.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.09007, pruned_loss=0.01208, audio_tagging_loss=0.008423, over 3055818.74 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 04:22:03,816 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3808626.6666666665, ans=0.125 2023-11-29 04:22:07,240 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 571300 2023-11-29 04:22:38,910 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 6200, loss[loss=0.06544, simple_loss=0.08533, pruned_loss=0.01518, audio_tagging_loss=0.007588, over 16367.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.09022, pruned_loss=0.01217, audio_tagging_loss=0.00842, over 3053934.91 frames. ], batch size: 62, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:22:58,500 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3808893.3333333335, ans=0.125 2023-11-29 04:23:08,430 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 571350 2023-11-29 04:23:10,644 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.802e+01 8.947e+01 9.565e+01 1.046e+02 1.413e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-29 04:23:17,330 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3809026.6666666665, ans=0.05 2023-11-29 04:23:40,231 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 6250, loss[loss=0.0816, simple_loss=0.1121, pruned_loss=0.01807, audio_tagging_loss=0.007459, over 15923.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.08957, pruned_loss=0.01195, audio_tagging_loss=0.008592, over 3059190.30 frames. 
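Note: in each optim.py entry the reported threshold is the median of the grad-norm quartiles scaled by Clipping_scale (above, 9.609e+01 * 2.0 ~= 1.922e+02), so the clipping threshold tracks the recent distribution of gradient norms rather than being a fixed constant. A rough sketch of that bookkeeping, assuming a simple sliding window of recent norms; the window mechanics are an assumption, and only the threshold = Clipping_scale * median relationship is read off the log.

    from collections import deque

    import torch

    class GradNormClipper:
        """Track recent grad norms; clip against Clipping_scale * median."""

        def __init__(self, clipping_scale: float = 2.0, window: int = 128):
            self.clipping_scale = clipping_scale
            self.norms = deque(maxlen=window)

        def step(self, grad_norm: float):
            self.norms.append(grad_norm)
            t = torch.tensor(list(self.norms))
            # min / 25% / median / 75% / max, as printed in the log:
            quartiles = [t.quantile(q).item() for q in (0.0, 0.25, 0.5, 0.75, 1.0)]
            threshold = self.clipping_scale * quartiles[2]
            clipped = grad_norm > threshold  # feeds the percent-clipped stat
            return quartiles, threshold, clipped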
], batch size: 59, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:23:47,743 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3809160.0, ans=0.125 2023-11-29 04:23:48,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3809160.0, ans=0.125 2023-11-29 04:24:10,227 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 571400 2023-11-29 04:24:16,753 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3809360.0, ans=0.0 2023-11-29 04:24:33,601 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3809426.6666666665, ans=0.125 2023-11-29 04:24:41,946 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 6300, loss[loss=0.06353, simple_loss=0.09207, pruned_loss=0.01119, audio_tagging_loss=0.006298, over 15635.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08937, pruned_loss=0.01196, audio_tagging_loss=0.008606, over 3054157.56 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:24:42,818 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.28 vs. limit=22.5 2023-11-29 04:24:49,698 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3809493.3333333335, ans=0.125 2023-11-29 04:25:06,385 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3809626.6666666665, ans=0.125 2023-11-29 04:25:11,528 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 571450 2023-11-29 04:25:13,817 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.701e+01 9.159e+01 9.734e+01 1.043e+02 1.366e+02, threshold=1.947e+02, percent-clipped=0.0 2023-11-29 04:25:43,854 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 6350, loss[loss=0.07463, simple_loss=0.1076, pruned_loss=0.01355, audio_tagging_loss=0.007292, over 14675.00 frames. ], tot_loss[loss=0.06465, simple_loss=0.08811, pruned_loss=0.01183, audio_tagging_loss=0.00877, over 3052239.72 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:26:12,689 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 571500 2023-11-29 04:26:18,526 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.35 vs. limit=22.5 2023-11-29 04:26:32,287 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3810093.3333333335, ans=0.2 2023-11-29 04:26:34,593 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3810093.3333333335, ans=0.125 2023-11-29 04:26:36,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3810093.3333333335, ans=0.0 2023-11-29 04:26:38,173 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3810093.3333333335, ans=0.125 2023-11-29 04:26:45,501 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 6400, loss[loss=0.06805, simple_loss=0.09601, pruned_loss=0.01073, audio_tagging_loss=0.009312, over 15195.00 frames. 
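Note: the per-batch loss[...] fields are consistent with the total being a fixed weighted sum of the three components, with the simple (lattice) transducer loss down-weighted by 0.5 and the pruned and audio-tagging losses at weight 1.0. The batch 6400 entry just above checks out to within rounding; the 0.5/1.0 weights are inferred from the logged numbers, not quoted from code.

    # Components of the batch 6400 entry above:
    simple_loss = 0.09601
    pruned_loss = 0.01073
    audio_tagging_loss = 0.009312

    total = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss
    print(round(total, 5))  # 0.06805, matching the logged loss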
], tot_loss[loss=0.06509, simple_loss=0.08869, pruned_loss=0.01192, audio_tagging_loss=0.008834, over 3054860.72 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 04:26:46,963 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3810160.0, ans=0.0 2023-11-29 04:27:15,303 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 571550 2023-11-29 04:27:17,527 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.453e+01 8.796e+01 9.535e+01 1.038e+02 1.501e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-29 04:27:18,987 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3810293.3333333335, ans=0.0 2023-11-29 04:27:21,317 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3810360.0, ans=0.0 2023-11-29 04:27:23,901 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.95 vs. limit=22.5 2023-11-29 04:27:24,032 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.23 vs. limit=22.5 2023-11-29 04:27:35,415 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.99 vs. limit=15.0 2023-11-29 04:27:44,544 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3810426.6666666665, ans=0.125 2023-11-29 04:27:46,663 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 6450, loss[loss=0.06941, simple_loss=0.09736, pruned_loss=0.01387, audio_tagging_loss=0.006859, over 15484.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08834, pruned_loss=0.01178, audio_tagging_loss=0.00895, over 3051532.27 frames. ], batch size: 60, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 04:27:48,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3810493.3333333335, ans=0.5 2023-11-29 04:27:51,723 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.54 vs. limit=15.0 2023-11-29 04:27:53,870 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.23 vs. limit=10.0 2023-11-29 04:28:12,008 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3810626.6666666665, ans=0.1 2023-11-29 04:28:15,658 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 04:28:16,531 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 571600 2023-11-29 04:28:34,279 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.07 vs. limit=22.5 2023-11-29 04:28:49,424 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 6500, loss[loss=0.08517, simple_loss=0.1197, pruned_loss=0.0182, audio_tagging_loss=0.007115, over 15768.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.0889, pruned_loss=0.01206, audio_tagging_loss=0.008919, over 3058560.09 frames. 
], batch size: 56, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 04:28:49,973 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.59 vs. limit=15.0 2023-11-29 04:28:55,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3810826.6666666665, ans=0.125 2023-11-29 04:28:55,712 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3810826.6666666665, ans=0.125 2023-11-29 04:29:08,076 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3810893.3333333335, ans=0.125 2023-11-29 04:29:18,222 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 571650 2023-11-29 04:29:20,559 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.681e+01 9.207e+01 9.940e+01 1.055e+02 1.312e+02, threshold=1.988e+02, percent-clipped=0.0 2023-11-29 04:29:50,413 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 6550, loss[loss=0.06583, simple_loss=0.08717, pruned_loss=0.01382, audio_tagging_loss=0.008426, over 15813.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.0892, pruned_loss=0.01199, audio_tagging_loss=0.008814, over 3059958.64 frames. ], batch size: 60, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 04:29:58,458 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3811160.0, ans=0.125 2023-11-29 04:30:00,848 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3811160.0, ans=0.125 2023-11-29 04:30:20,588 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 571700 2023-11-29 04:30:20,689 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3811293.3333333335, ans=0.125 2023-11-29 04:30:37,495 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3811360.0, ans=0.0 2023-11-29 04:30:45,485 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3811426.6666666665, ans=0.05 2023-11-29 04:30:52,145 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 6600, loss[loss=0.07442, simple_loss=0.1076, pruned_loss=0.01503, audio_tagging_loss=0.005569, over 13721.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.0897, pruned_loss=0.012, audio_tagging_loss=0.008606, over 3056460.69 frames. ], batch size: 53, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 04:31:22,122 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 571750 2023-11-29 04:31:24,395 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.875e+01 9.047e+01 9.716e+01 1.044e+02 1.337e+02, threshold=1.943e+02, percent-clipped=0.0 2023-11-29 04:31:28,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3811693.3333333335, ans=0.07 2023-11-29 04:31:46,221 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.58 vs. 
limit=15.0 2023-11-29 04:31:47,194 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3811760.0, ans=0.125 2023-11-29 04:31:50,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3811760.0, ans=0.125 2023-11-29 04:31:54,306 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 6650, loss[loss=0.06801, simple_loss=0.09149, pruned_loss=0.01414, audio_tagging_loss=0.008124, over 15362.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08989, pruned_loss=0.01218, audio_tagging_loss=0.008555, over 3056177.53 frames. ], batch size: 59, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 04:31:55,941 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3811826.6666666665, ans=0.125 2023-11-29 04:32:00,386 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3811826.6666666665, ans=0.07 2023-11-29 04:32:20,722 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3811960.0, ans=0.0 2023-11-29 04:32:24,024 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 571800 2023-11-29 04:32:24,240 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-29 04:32:29,633 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.81 vs. limit=15.0 2023-11-29 04:32:31,951 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3812026.6666666665, ans=0.125 2023-11-29 04:32:35,446 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3812026.6666666665, ans=0.125 2023-11-29 04:32:37,204 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3812026.6666666665, ans=0.2 2023-11-29 04:32:38,444 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3812026.6666666665, ans=0.1 2023-11-29 04:32:43,190 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3812093.3333333335, ans=0.1 2023-11-29 04:32:48,832 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3812093.3333333335, ans=0.07 2023-11-29 04:32:56,005 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 6700, loss[loss=0.06713, simple_loss=0.09124, pruned_loss=0.01276, audio_tagging_loss=0.008754, over 15637.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.09028, pruned_loss=0.01209, audio_tagging_loss=0.008493, over 3050969.05 frames. 
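Note: each loss[...] block is the current batch, while tot_loss[...] averages over roughly 3.05M frames, about 200 batches at the ~15k frames per batch seen here. That is consistent with an exponentially decayed accumulator that is multiplied by 1 - 1/200 before each new batch is added, so its frame counter converges to about 200x the per-batch frames. A sketch under that assumption; the decay constant is inferred from the logged frame counts.

    class DecayedLossTracker:
        """Exponentially decayed sums behind a tot_loss-style running average."""

        def __init__(self, reset_interval: int = 200):
            self.decay = 1.0 - 1.0 / reset_interval
            self.loss_sum = 0.0
            self.frames = 0.0

        def update(self, loss_per_frame: float, batch_frames: float) -> float:
            self.loss_sum = self.loss_sum * self.decay + loss_per_frame * batch_frames
            self.frames = self.frames * self.decay + batch_frames
            return self.loss_sum / self.frames  # the reported tot_loss

    tracker = DecayedLossTracker()
    for _ in range(1000):
        tracker.update(0.065, 15000.0)
    print(f"{tracker.frames:.3g}")  # ~2.98e+06, near the ~3.05M in the log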
], batch size: 57, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 04:32:59,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3812160.0, ans=0.125 2023-11-29 04:33:13,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3812226.6666666665, ans=0.125 2023-11-29 04:33:25,707 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 571850 2023-11-29 04:33:29,129 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.986e+01 9.065e+01 9.575e+01 1.004e+02 1.192e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-29 04:33:36,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3812360.0, ans=0.125 2023-11-29 04:33:41,901 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3812360.0, ans=0.0 2023-11-29 04:33:57,353 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 6750, loss[loss=0.06071, simple_loss=0.08198, pruned_loss=0.009972, audio_tagging_loss=0.009751, over 15314.00 frames. ], tot_loss[loss=0.06463, simple_loss=0.08868, pruned_loss=0.01176, audio_tagging_loss=0.008527, over 3044512.24 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:34:03,640 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3812493.3333333335, ans=0.05 2023-11-29 04:34:14,184 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3812560.0, ans=0.125 2023-11-29 04:34:26,738 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 571900 2023-11-29 04:34:30,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3812626.6666666665, ans=0.1 2023-11-29 04:34:41,399 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3812693.3333333335, ans=0.2 2023-11-29 04:34:52,216 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.64 vs. limit=15.0 2023-11-29 04:34:59,708 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 6800, loss[loss=0.06162, simple_loss=0.08443, pruned_loss=0.01114, audio_tagging_loss=0.008266, over 17163.00 frames. ], tot_loss[loss=0.06457, simple_loss=0.08875, pruned_loss=0.01169, audio_tagging_loss=0.008512, over 3045640.01 frames. 
], batch size: 66, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 04:35:12,983 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 04:35:13,050 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3812893.3333333335, ans=0.0 2023-11-29 04:35:21,727 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3812893.3333333335, ans=0.2 2023-11-29 04:35:27,069 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3812960.0, ans=0.0 2023-11-29 04:35:29,207 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 571950 2023-11-29 04:35:29,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3812960.0, ans=0.125 2023-11-29 04:35:32,506 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.978e+01 8.971e+01 9.540e+01 1.002e+02 2.888e+02, threshold=1.908e+02, percent-clipped=1.0 2023-11-29 04:35:39,816 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.04 vs. limit=15.0 2023-11-29 04:35:46,989 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3813026.6666666665, ans=0.1 2023-11-29 04:36:00,807 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 6850, loss[loss=0.05653, simple_loss=0.07634, pruned_loss=0.009896, audio_tagging_loss=0.008466, over 15428.00 frames. ], tot_loss[loss=0.06434, simple_loss=0.08834, pruned_loss=0.01174, audio_tagging_loss=0.008429, over 3047370.51 frames. ], batch size: 59, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 04:36:13,339 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.53 vs. limit=15.0 2023-11-29 04:36:15,385 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3813226.6666666665, ans=0.0 2023-11-29 04:36:30,986 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 572000 2023-11-29 04:36:32,414 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-572000.pt 2023-11-29 04:36:41,282 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3813360.0, ans=0.0 2023-11-29 04:36:55,547 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3813426.6666666665, ans=0.125 2023-11-29 04:37:05,105 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 6900, loss[loss=0.05542, simple_loss=0.08094, pruned_loss=0.008141, audio_tagging_loss=0.006813, over 14543.00 frames. ], tot_loss[loss=0.0643, simple_loss=0.08833, pruned_loss=0.01173, audio_tagging_loss=0.008402, over 3041136.23 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 04:37:11,487 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.49 vs. 
limit=15.0 2023-11-29 04:37:31,324 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3813626.6666666665, ans=0.125 2023-11-29 04:37:33,070 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.19 vs. limit=15.0 2023-11-29 04:37:34,670 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 572050 2023-11-29 04:37:38,077 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.957e+01 9.005e+01 9.691e+01 1.035e+02 1.354e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-29 04:37:38,474 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3813626.6666666665, ans=0.2 2023-11-29 04:37:50,275 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=3813693.3333333335, ans=0.025 2023-11-29 04:37:55,741 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 04:37:57,216 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3813760.0, ans=0.2 2023-11-29 04:37:58,440 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.23 vs. limit=22.5 2023-11-29 04:38:06,656 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 6950, loss[loss=0.0629, simple_loss=0.08864, pruned_loss=0.01092, audio_tagging_loss=0.00766, over 14076.00 frames. ], tot_loss[loss=0.06442, simple_loss=0.08847, pruned_loss=0.01176, audio_tagging_loss=0.008428, over 3045202.90 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 04:38:06,983 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3813826.6666666665, ans=0.2 2023-11-29 04:38:19,742 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.75 vs. limit=22.5 2023-11-29 04:38:36,806 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 572100 2023-11-29 04:38:45,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3814026.6666666665, ans=0.125 2023-11-29 04:38:58,000 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3814093.3333333335, ans=0.125 2023-11-29 04:39:01,841 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.26 vs. 
limit=22.5 2023-11-29 04:39:02,517 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3814093.3333333335, ans=0.125 2023-11-29 04:39:07,957 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 7000, loss[loss=0.05161, simple_loss=0.06365, pruned_loss=0.006956, audio_tagging_loss=0.01283, over 15287.00 frames. ], tot_loss[loss=0.06454, simple_loss=0.08834, pruned_loss=0.01178, audio_tagging_loss=0.008597, over 3042682.80 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:39:13,089 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3814160.0, ans=0.0 2023-11-29 04:39:17,869 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3814160.0, ans=0.07 2023-11-29 04:39:34,922 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.37 vs. limit=15.0 2023-11-29 04:39:38,298 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 572150 2023-11-29 04:39:43,424 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.754e+01 8.947e+01 9.387e+01 1.017e+02 2.856e+02, threshold=1.877e+02, percent-clipped=1.0 2023-11-29 04:39:47,416 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3814360.0, ans=0.0 2023-11-29 04:40:02,351 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3814426.6666666665, ans=0.125 2023-11-29 04:40:10,547 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 7050, loss[loss=0.06766, simple_loss=0.09161, pruned_loss=0.01238, audio_tagging_loss=0.009474, over 15910.00 frames. ], tot_loss[loss=0.06481, simple_loss=0.08865, pruned_loss=0.01185, audio_tagging_loss=0.00864, over 3039671.02 frames. ], batch size: 60, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:40:13,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3814493.3333333335, ans=0.125 2023-11-29 04:40:17,961 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3814493.3333333335, ans=0.2 2023-11-29 04:40:25,260 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3814560.0, ans=0.125 2023-11-29 04:40:39,686 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 572200 2023-11-29 04:41:12,089 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 7100, loss[loss=0.07146, simple_loss=0.096, pruned_loss=0.01404, audio_tagging_loss=0.009411, over 15164.00 frames. ], tot_loss[loss=0.06457, simple_loss=0.08823, pruned_loss=0.01177, audio_tagging_loss=0.008686, over 3043117.53 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 04:41:22,489 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3814826.6666666665, ans=0.0 2023-11-29 04:41:28,533 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.49 vs. 
limit=15.0 2023-11-29 04:41:31,548 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3814893.3333333335, ans=0.0 2023-11-29 04:41:38,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3814960.0, ans=0.125 2023-11-29 04:41:40,510 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 572250 2023-11-29 04:41:42,625 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3814960.0, ans=0.125 2023-11-29 04:41:47,411 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.723e+01 8.957e+01 9.566e+01 1.017e+02 1.804e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-29 04:41:59,881 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3815093.3333333335, ans=0.2 2023-11-29 04:42:13,124 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 7150, loss[loss=0.06692, simple_loss=0.08423, pruned_loss=0.01383, audio_tagging_loss=0.01098, over 14967.00 frames. ], tot_loss[loss=0.06478, simple_loss=0.08838, pruned_loss=0.01186, audio_tagging_loss=0.008737, over 3044545.81 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 04:42:41,404 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3815293.3333333335, ans=0.09899494936611666 2023-11-29 04:42:42,985 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 572300 2023-11-29 04:42:43,166 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3815293.3333333335, ans=0.0 2023-11-29 04:42:45,613 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=3815293.3333333335, ans=0.2 2023-11-29 04:43:13,890 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 7200, loss[loss=0.05864, simple_loss=0.08127, pruned_loss=0.009543, audio_tagging_loss=0.008465, over 15184.00 frames. ], tot_loss[loss=0.06443, simple_loss=0.08772, pruned_loss=0.01179, audio_tagging_loss=0.008783, over 3043678.55 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:43:17,121 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3815493.3333333335, ans=0.2 2023-11-29 04:43:22,275 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3815493.3333333335, ans=0.125 2023-11-29 04:43:44,239 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 572350 2023-11-29 04:43:50,070 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.918e+01 9.002e+01 9.674e+01 1.041e+02 1.826e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-29 04:43:56,160 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3815693.3333333335, ans=0.125 2023-11-29 04:43:59,647 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3815693.3333333335, ans=0.125 2023-11-29 04:44:15,632 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 7250, loss[loss=0.06962, simple_loss=0.09494, pruned_loss=0.01262, audio_tagging_loss=0.009534, over 15229.00 frames. 
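Note: the grad_scale field is the dynamic loss scale of mixed-precision training, and it bounces between 8.0, 16.0 and 32.0 across the entries above: the scale is halved when scaled gradients overflow and doubled back after a run of overflow-free steps. A minimal sketch with torch's GradScaler; the model, data and growth settings are placeholders/defaults, not the recipe's actual configuration.

    import torch

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = torch.nn.Linear(80, 500).to(device)       # placeholder model
    opt = torch.optim.SGD(model.parameters(), lr=1.4e-3)
    scaler = torch.cuda.amp.GradScaler(init_scale=16.0, enabled=(device == "cuda"))

    for _ in range(3):
        x = torch.randn(4, 80, device=device)
        y = torch.randn(4, 500, device=device)
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(model(x), y)
        scaler.scale(loss).backward()  # backward through the scaled loss
        scaler.step(opt)               # skipped if gradients hit inf/nan
        scaler.update()                # halve on overflow, double when stable
        print(scaler.get_scale())      # the value logged as grad_scale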
], tot_loss[loss=0.06441, simple_loss=0.08775, pruned_loss=0.01176, audio_tagging_loss=0.008769, over 3039001.84 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:44:25,470 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3815826.6666666665, ans=0.125 2023-11-29 04:44:26,528 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3815893.3333333335, ans=0.1 2023-11-29 04:44:31,777 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.64 vs. limit=15.0 2023-11-29 04:44:32,555 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3815893.3333333335, ans=0.125 2023-11-29 04:44:41,933 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3815960.0, ans=0.0 2023-11-29 04:44:44,348 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 572400 2023-11-29 04:44:55,372 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3816026.6666666665, ans=0.1 2023-11-29 04:44:56,469 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3816026.6666666665, ans=0.125 2023-11-29 04:45:18,391 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 7300, loss[loss=0.05919, simple_loss=0.0766, pruned_loss=0.01067, audio_tagging_loss=0.01022, over 14749.00 frames. ], tot_loss[loss=0.06488, simple_loss=0.08834, pruned_loss=0.01196, audio_tagging_loss=0.008753, over 3038381.78 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:45:22,502 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3816160.0, ans=0.1 2023-11-29 04:45:45,200 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3816293.3333333335, ans=0.125 2023-11-29 04:45:48,199 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 572450 2023-11-29 04:45:49,635 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3816293.3333333335, ans=0.1 2023-11-29 04:45:54,549 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.609e+01 8.998e+01 9.655e+01 1.011e+02 1.283e+02, threshold=1.931e+02, percent-clipped=0.0 2023-11-29 04:45:58,544 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3816360.0, ans=0.125 2023-11-29 04:46:05,983 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3816360.0, ans=0.125 2023-11-29 04:46:07,129 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3816426.6666666665, ans=0.07 2023-11-29 04:46:19,744 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 7350, loss[loss=0.07896, simple_loss=0.107, pruned_loss=0.01912, audio_tagging_loss=0.00633, over 15427.00 frames. ], tot_loss[loss=0.0643, simple_loss=0.08802, pruned_loss=0.01173, audio_tagging_loss=0.008552, over 3039179.99 frames. 
], batch size: 57, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:46:24,692 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3816493.3333333335, ans=0.125 2023-11-29 04:46:33,578 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3816560.0, ans=0.125 2023-11-29 04:46:43,209 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.81 vs. limit=10.0 2023-11-29 04:46:50,092 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 572500 2023-11-29 04:47:00,858 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3816693.3333333335, ans=0.125 2023-11-29 04:47:21,189 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 7400, loss[loss=0.05272, simple_loss=0.06444, pruned_loss=0.009287, audio_tagging_loss=0.01122, over 15523.00 frames. ], tot_loss[loss=0.06456, simple_loss=0.08856, pruned_loss=0.01184, audio_tagging_loss=0.008441, over 3040594.84 frames. ], batch size: 61, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:47:51,303 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 572550 2023-11-29 04:47:54,860 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3816960.0, ans=0.1 2023-11-29 04:47:56,927 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.719e+01 8.857e+01 9.571e+01 1.032e+02 1.214e+02, threshold=1.914e+02, percent-clipped=0.0 2023-11-29 04:48:02,796 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3817026.6666666665, ans=0.125 2023-11-29 04:48:15,279 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3817093.3333333335, ans=0.125 2023-11-29 04:48:23,988 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 7450, loss[loss=0.0529, simple_loss=0.06724, pruned_loss=0.01028, audio_tagging_loss=0.008999, over 14385.00 frames. ], tot_loss[loss=0.06402, simple_loss=0.08775, pruned_loss=0.01171, audio_tagging_loss=0.008433, over 3039137.16 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:48:52,849 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 572600 2023-11-29 04:48:53,642 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.78 vs. limit=15.0 2023-11-29 04:48:56,300 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.15 vs. limit=15.0 2023-11-29 04:49:03,070 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3817360.0, ans=0.5 2023-11-29 04:49:05,774 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.61 vs. 
limit=12.0 2023-11-29 04:49:09,341 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3817360.0, ans=0.0 2023-11-29 04:49:15,210 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3817426.6666666665, ans=0.0 2023-11-29 04:49:16,464 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3817426.6666666665, ans=0.125 2023-11-29 04:49:17,604 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3817426.6666666665, ans=0.125 2023-11-29 04:49:21,154 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3817426.6666666665, ans=0.125 2023-11-29 04:49:25,557 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 7500, loss[loss=0.09893, simple_loss=0.1266, pruned_loss=0.02621, audio_tagging_loss=0.009398, over 15211.00 frames. ], tot_loss[loss=0.0645, simple_loss=0.08832, pruned_loss=0.01183, audio_tagging_loss=0.008512, over 3043867.49 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:49:41,227 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.25 vs. limit=22.5 2023-11-29 04:49:51,417 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3817626.6666666665, ans=0.0 2023-11-29 04:49:56,478 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 572650 2023-11-29 04:50:02,206 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.892e+01 9.110e+01 9.749e+01 1.048e+02 1.256e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-29 04:50:20,000 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3817760.0, ans=0.125 2023-11-29 04:50:27,271 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 7550, loss[loss=0.0633, simple_loss=0.08831, pruned_loss=0.01255, audio_tagging_loss=0.006589, over 16012.00 frames. ], tot_loss[loss=0.0641, simple_loss=0.08782, pruned_loss=0.01169, audio_tagging_loss=0.008508, over 3047804.99 frames. ], batch size: 61, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:50:27,586 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3817826.6666666665, ans=0.125 2023-11-29 04:50:50,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3817893.3333333335, ans=0.125 2023-11-29 04:50:54,089 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3817960.0, ans=0.09899494936611666 2023-11-29 04:50:57,333 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 572700 2023-11-29 04:50:57,458 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3817960.0, ans=0.0 2023-11-29 04:50:58,985 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.73 vs. 
limit=15.0 2023-11-29 04:51:03,923 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.52 vs. limit=10.0 2023-11-29 04:51:29,823 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 7600, loss[loss=0.05862, simple_loss=0.079, pruned_loss=0.009269, audio_tagging_loss=0.009847, over 14054.00 frames. ], tot_loss[loss=0.06362, simple_loss=0.08712, pruned_loss=0.01157, audio_tagging_loss=0.008489, over 3048082.38 frames. ], batch size: 53, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 04:51:36,060 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3818160.0, ans=0.2 2023-11-29 04:51:38,585 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.72 vs. limit=6.0 2023-11-29 04:51:58,889 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 572750 2023-11-29 04:52:04,702 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.575e+01 8.852e+01 9.526e+01 1.029e+02 1.380e+02, threshold=1.905e+02, percent-clipped=0.0 2023-11-29 04:52:04,985 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3818360.0, ans=0.0 2023-11-29 04:52:06,417 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.77 vs. limit=12.0 2023-11-29 04:52:30,896 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 7650, loss[loss=0.05206, simple_loss=0.07013, pruned_loss=0.006097, audio_tagging_loss=0.0109, over 14810.00 frames. ], tot_loss[loss=0.0641, simple_loss=0.08793, pruned_loss=0.0117, audio_tagging_loss=0.008427, over 3039556.71 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 04:52:39,315 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3818493.3333333335, ans=0.125 2023-11-29 04:52:43,895 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.65 vs. limit=22.5 2023-11-29 04:52:51,772 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.62 vs. limit=15.0 2023-11-29 04:52:59,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3818626.6666666665, ans=0.125 2023-11-29 04:53:00,576 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 572800 2023-11-29 04:53:11,424 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.67 vs. limit=15.0 2023-11-29 04:53:32,459 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 7700, loss[loss=0.05419, simple_loss=0.07304, pruned_loss=0.009121, audio_tagging_loss=0.00855, over 15385.00 frames. ], tot_loss[loss=0.06475, simple_loss=0.08901, pruned_loss=0.01179, audio_tagging_loss=0.008455, over 3036082.63 frames. 
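Note: the lr field creeps down very slowly (1.41e-03 earlier in the epoch, 1.40e-03 in these entries), consistent with icefall's Eden schedule, where the base learning rate is shrunk by both a batch-count factor and an epoch factor. Plugging in this run's configured base_lr=0.045, lr_batches=7500 and lr_epochs=3.5, and assuming the scheduler's epoch counter sits one behind the human-readable "Epoch 48", reproduces the logged value.

    def eden_lr(base_lr: float, batch: int, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        # Eden schedule (sketch): two smooth inverse-quartic-root decays.
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    # Batch idx 572750 (logged above) in the 48th epoch (counter at 47):
    print(f"{eden_lr(0.045, 572750, 47):.2e}")  # 1.40e-03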
], batch size: 57, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 04:54:02,670 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 572850 2023-11-29 04:54:09,485 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.101e+01 9.082e+01 9.588e+01 1.045e+02 1.280e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-29 04:54:15,287 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3819026.6666666665, ans=0.1 2023-11-29 04:54:17,586 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3819026.6666666665, ans=0.125 2023-11-29 04:54:20,131 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3819026.6666666665, ans=0.0 2023-11-29 04:54:26,369 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. limit=6.0 2023-11-29 04:54:34,874 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 7750, loss[loss=0.07095, simple_loss=0.09675, pruned_loss=0.0158, audio_tagging_loss=0.006778, over 15465.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08949, pruned_loss=0.012, audio_tagging_loss=0.008446, over 3040379.53 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:54:55,427 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3819226.6666666665, ans=0.04949747468305833 2023-11-29 04:55:00,157 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3819293.3333333335, ans=0.125 2023-11-29 04:55:04,177 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 572900 2023-11-29 04:55:11,495 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3819360.0, ans=0.0 2023-11-29 04:55:12,466 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 04:55:18,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3819360.0, ans=0.0 2023-11-29 04:55:24,422 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.32 vs. limit=10.0 2023-11-29 04:55:25,153 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3819426.6666666665, ans=0.07 2023-11-29 04:55:28,742 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3819426.6666666665, ans=0.125 2023-11-29 04:55:32,191 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.52 vs. limit=10.0 2023-11-29 04:55:34,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3819426.6666666665, ans=0.125 2023-11-29 04:55:36,109 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 7800, loss[loss=0.08324, simple_loss=0.12, pruned_loss=0.01612, audio_tagging_loss=0.00713, over 15358.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.09012, pruned_loss=0.0121, audio_tagging_loss=0.008414, over 3034785.63 frames. 
], batch size: 55, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:56:05,508 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 572950 2023-11-29 04:56:12,932 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.735e+01 9.184e+01 1.003e+02 1.060e+02 1.343e+02, threshold=2.007e+02, percent-clipped=0.0 2023-11-29 04:56:16,676 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3819693.3333333335, ans=0.2 2023-11-29 04:56:31,450 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3819760.0, ans=0.1 2023-11-29 04:56:37,817 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 7850, loss[loss=0.06825, simple_loss=0.09113, pruned_loss=0.01289, audio_tagging_loss=0.009799, over 15669.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08945, pruned_loss=0.01202, audio_tagging_loss=0.008606, over 3038829.18 frames. ], batch size: 61, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:56:38,223 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3819826.6666666665, ans=0.2 2023-11-29 04:56:41,865 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3819826.6666666665, ans=0.125 2023-11-29 04:56:42,417 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.09 vs. limit=22.5 2023-11-29 04:56:47,727 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3819826.6666666665, ans=0.0 2023-11-29 04:57:07,827 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 573000 2023-11-29 04:57:10,485 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3819960.0, ans=0.2 2023-11-29 04:57:27,944 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.99 vs. limit=22.5 2023-11-29 04:57:33,636 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.23 vs. limit=22.5 2023-11-29 04:57:39,583 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 7900, loss[loss=0.05037, simple_loss=0.06587, pruned_loss=0.007734, audio_tagging_loss=0.009704, over 16532.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08954, pruned_loss=0.01189, audio_tagging_loss=0.008584, over 3040222.04 frames. ], batch size: 65, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:57:47,800 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.44 vs. 
2023-11-29 04:58:09,670 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 573050
2023-11-29 04:58:16,472 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.705e+01 9.085e+01 9.812e+01 1.049e+02 1.531e+02, threshold=1.962e+02, percent-clipped=0.0
2023-11-29 04:58:16,831 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3820360.0, ans=0.0
2023-11-29 04:58:26,482 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3820360.0, ans=0.95
2023-11-29 04:58:28,889 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3820426.6666666665, ans=0.0
2023-11-29 04:58:35,585 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3820426.6666666665, ans=0.0
2023-11-29 04:58:37,880 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3820426.6666666665, ans=0.125
2023-11-29 04:58:41,053 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 7950, loss[loss=0.06797, simple_loss=0.09656, pruned_loss=0.01348, audio_tagging_loss=0.006211, over 15432.00 frames. ], tot_loss[loss=0.06503, simple_loss=0.08897, pruned_loss=0.01182, audio_tagging_loss=0.008721, over 3040107.42 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 16.0
2023-11-29 04:58:45,728 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3820493.3333333335, ans=0.05
2023-11-29 04:58:53,910 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3820560.0, ans=0.5
2023-11-29 04:59:00,159 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-29 04:59:11,319 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 573100
2023-11-29 04:59:15,055 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3820626.6666666665, ans=0.025
2023-11-29 04:59:21,589 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3820693.3333333335, ans=0.125
2023-11-29 04:59:22,728 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3820693.3333333335, ans=0.1
2023-11-29 04:59:28,769 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.43 vs. limit=10.0
2023-11-29 04:59:40,635 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3820760.0, ans=0.2
2023-11-29 04:59:43,488 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 8000, loss[loss=0.06759, simple_loss=0.0953, pruned_loss=0.01373, audio_tagging_loss=0.006208, over 15392.00 frames. ], tot_loss[loss=0.06458, simple_loss=0.08783, pruned_loss=0.01167, audio_tagging_loss=0.008985, over 3034805.87 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 32.0
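The WARNING from train_asr.py:1481 documents why some AudioSet cuts are dropped: they carry a placeholder transcript ("Dummy text added as a place holder..."), and a 1-second cut of 100 feature frames shrinks to 23 encoder frames after subsampling, fewer than the 24 BPE tokens of that transcript, so no transducer alignment exists. A sketch of the check follows; the subsampling arithmetic of the real encoder_embed is approximated here and is an assumption.

```python
def keep_cut_for_transducer(num_frames: int, num_tokens: int,
                            subsampling_factor: int = 4) -> bool:
    # Approximation of the frontend's frame arithmetic (an assumption):
    # the convolutional embed consumes ~7 frames of context, then
    # downsamples by subsampling_factor, so 100 frames -> 23 encoder frames.
    frames_after = (num_frames - 7) // subsampling_factor
    # A transducer needs at least one encoder frame per emitted token,
    # so a cut shorter than its token sequence cannot be aligned.
    return frames_after >= num_tokens

# The excluded cut above: 23 encoder frames vs. 24 placeholder tokens.
print(keep_cut_for_transducer(100, 24))  # False
```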
2023-11-29 05:00:09,357 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3820960.0, ans=0.1
2023-11-29 05:00:12,801 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 573150
2023-11-29 05:00:20,799 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.456e+01 9.160e+01 9.620e+01 1.029e+02 4.171e+02, threshold=1.924e+02, percent-clipped=1.0
2023-11-29 05:00:28,837 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3821026.6666666665, ans=0.125
2023-11-29 05:00:40,084 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3821093.3333333335, ans=0.0
2023-11-29 05:00:40,489 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.08 vs. limit=15.0
2023-11-29 05:00:45,158 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 8050, loss[loss=0.07315, simple_loss=0.09553, pruned_loss=0.01681, audio_tagging_loss=0.008581, over 14236.00 frames. ], tot_loss[loss=0.06498, simple_loss=0.08859, pruned_loss=0.01174, audio_tagging_loss=0.008938, over 3039498.40 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 16.0
2023-11-29 05:00:52,729 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3821160.0, ans=0.125
2023-11-29 05:01:01,633 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.92 vs. limit=15.0
2023-11-29 05:01:14,607 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 573200
2023-11-29 05:01:19,581 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.95 vs. limit=15.0
2023-11-29 05:01:47,029 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 8100, loss[loss=0.06955, simple_loss=0.1072, pruned_loss=0.009015, audio_tagging_loss=0.006938, over 15085.00 frames. ], tot_loss[loss=0.06492, simple_loss=0.08878, pruned_loss=0.01173, audio_tagging_loss=0.008799, over 3035394.19 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 16.0
2023-11-29 05:02:04,226 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3821560.0, ans=0.0
2023-11-29 05:02:11,378 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.85 vs. limit=15.0
2023-11-29 05:02:16,372 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 573250
2023-11-29 05:02:20,696 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3821626.6666666665, ans=0.125
2023-11-29 05:02:25,671 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.207e+01 9.031e+01 9.567e+01 1.056e+02 1.290e+02, threshold=1.913e+02, percent-clipped=0.0
2023-11-29 05:02:39,248 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.95 vs. limit=15.0
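The scaling.py:1022 Whitening records are diagnostics on activation covariance: a metric near 1 means a module's features are nearly white (no dominant direction), and the module only intervenes in the backward pass once the metric exceeds the logged limit, so entries such as metric=7.95 vs. limit=15.0 are benign. One plausible way to compute such a metric, in the spirit of icefall's scaling.py rather than its exact formula:

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    # Split channels into groups, estimate each group's covariance, and
    # measure E[eig^2] / E[eig]^2 via traces (no eigendecomposition):
    # exactly 1.0 for perfectly white features, larger when a few
    # directions dominate. Illustrative, not the logged code itself.
    x = x.reshape(-1, x.shape[-1])
    num_frames, num_channels = x.shape
    cpg = num_channels // num_groups                    # channels per group
    x = x.reshape(num_frames, num_groups, cpg).permute(1, 0, 2)
    x = x - x.mean(dim=1, keepdim=True)
    cov = x.transpose(1, 2) @ x / num_frames            # (groups, cpg, cpg)
    num = (cov * cov).sum(dim=(1, 2)) * cpg             # cpg * sum(eig^2)
    den = cov.diagonal(dim1=1, dim2=2).sum(dim=1) ** 2  # (sum(eig))^2
    return (num / den).mean().item()

# White noise sits near 1.0, far below limits like the 15.0 above.
print(whitening_metric(torch.randn(2000, 128), num_groups=4))
```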
2023-11-29 05:02:47,231 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3821826.6666666665, ans=0.0
2023-11-29 05:02:48,024 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 8150, loss[loss=0.07295, simple_loss=0.09838, pruned_loss=0.01386, audio_tagging_loss=0.009897, over 15282.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.08981, pruned_loss=0.01199, audio_tagging_loss=0.008676, over 3038544.95 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0
2023-11-29 05:02:54,250 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3821826.6666666665, ans=0.125
2023-11-29 05:03:01,401 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3821893.3333333335, ans=0.1
2023-11-29 05:03:18,683 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 573300
2023-11-29 05:03:33,940 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3822026.6666666665, ans=0.1
2023-11-29 05:03:35,081 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3822026.6666666665, ans=0.125
2023-11-29 05:03:38,156 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3822093.3333333335, ans=0.125
2023-11-29 05:03:41,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3822093.3333333335, ans=0.0
2023-11-29 05:03:50,160 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 8200, loss[loss=0.06847, simple_loss=0.09855, pruned_loss=0.01165, audio_tagging_loss=0.007554, over 15420.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.08909, pruned_loss=0.01185, audio_tagging_loss=0.008624, over 3041867.49 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0
2023-11-29 05:03:54,240 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-29 05:04:04,852 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.29 vs. limit=15.0
2023-11-29 05:04:11,270 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-29 05:04:19,310 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 573350
2023-11-29 05:04:26,015 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3822360.0, ans=0.125
2023-11-29 05:04:27,917 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.713e+01 9.111e+01 9.648e+01 1.058e+02 1.357e+02, threshold=1.930e+02, percent-clipped=0.0
2023-11-29 05:04:51,519 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 8250, loss[loss=0.06909, simple_loss=0.09562, pruned_loss=0.0117, audio_tagging_loss=0.009577, over 14831.00 frames. ], tot_loss[loss=0.06466, simple_loss=0.08866, pruned_loss=0.01181, audio_tagging_loss=0.008522, over 3043384.62 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 16.0
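The optim.py:476 records come from adaptive gradient clipping: the optimizer tracks recent gradient norms, prints their quartiles (min, 25%, median, 75%, max), and clips at Clipping_scale times the median; the logged threshold=1.930e+02 is indeed 2.0 x 9.648e+01 from the quartile line above, and percent-clipped reports how often recent batches hit it. A stand-alone sketch of that scheme under these assumptions (the real ScaledAdam fuses it into its update step):

```python
from collections import deque
import torch

class MedianGradClipper:
    """Sketch of quartile-based clipping; illustrative, not ScaledAdam itself."""

    def __init__(self, clipping_scale: float = 2.0, window: int = 1000):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)  # rolling history of grad norms

    def clip_(self, params) -> float:
        params = list(params)
        grads = [p.grad.reshape(-1) for p in params if p.grad is not None]
        norm = torch.cat(grads).norm().item()
        self.norms.append(norm)
        median = torch.tensor(list(self.norms)).median().item()
        threshold = self.clipping_scale * median  # e.g. 2.0 * 9.648e+01
        if norm > threshold:
            for p in params:
                if p.grad is not None:
                    p.grad.mul_(threshold / norm)
        return norm

    def quartiles(self):
        # The five numbers printed by the log: min, 25%, median, 75%, max.
        t = torch.tensor(list(self.norms))
        return [torch.quantile(t, q).item() for q in (0.0, 0.25, 0.5, 0.75, 1.0)]
```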
2023-11-29 05:04:57,664 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3822493.3333333335, ans=0.04949747468305833
2023-11-29 05:05:07,266 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3822560.0, ans=0.0
2023-11-29 05:05:10,671 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3822560.0, ans=0.1
2023-11-29 05:05:14,504 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.99 vs. limit=22.5
2023-11-29 05:05:21,008 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 573400
2023-11-29 05:05:24,456 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.47 vs. limit=12.0
2023-11-29 05:05:26,829 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3822626.6666666665, ans=0.125
2023-11-29 05:05:38,898 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3822693.3333333335, ans=0.0
2023-11-29 05:05:39,963 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3822760.0, ans=0.125
2023-11-29 05:05:52,749 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 8300, loss[loss=0.07872, simple_loss=0.1179, pruned_loss=0.01278, audio_tagging_loss=0.00697, over 15974.00 frames. ], tot_loss[loss=0.06446, simple_loss=0.08838, pruned_loss=0.01172, audio_tagging_loss=0.008547, over 3038771.28 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 16.0
2023-11-29 05:05:58,802 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3822826.6666666665, ans=0.125
2023-11-29 05:05:59,105 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.28 vs. limit=22.5
2023-11-29 05:06:14,635 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3822893.3333333335, ans=0.0
2023-11-29 05:06:23,367 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 573450
2023-11-29 05:06:25,102 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.35 vs. limit=6.0
2023-11-29 05:06:31,406 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.078e+01 8.946e+01 9.758e+01 1.060e+02 1.383e+02, threshold=1.952e+02, percent-clipped=0.0
2023-11-29 05:06:55,006 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 8350, loss[loss=0.07506, simple_loss=0.1034, pruned_loss=0.0145, audio_tagging_loss=0.00887, over 15147.00 frames. ], tot_loss[loss=0.06447, simple_loss=0.08818, pruned_loss=0.01182, audio_tagging_loss=0.008556, over 3046427.80 frames.
], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:06:59,446 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3823160.0, ans=0.125 2023-11-29 05:07:00,550 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3823160.0, ans=0.125 2023-11-29 05:07:01,725 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3823160.0, ans=0.0 2023-11-29 05:07:07,310 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.91 vs. limit=10.0 2023-11-29 05:07:23,673 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.43 vs. limit=15.0 2023-11-29 05:07:24,413 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 573500 2023-11-29 05:07:32,678 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3823360.0, ans=0.0 2023-11-29 05:07:40,720 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.30 vs. limit=12.0 2023-11-29 05:07:43,668 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3823426.6666666665, ans=0.07 2023-11-29 05:07:54,214 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3823426.6666666665, ans=0.2 2023-11-29 05:07:57,404 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 8400, loss[loss=0.08132, simple_loss=0.1036, pruned_loss=0.02133, audio_tagging_loss=0.008169, over 15230.00 frames. ], tot_loss[loss=0.06485, simple_loss=0.08865, pruned_loss=0.01197, audio_tagging_loss=0.008551, over 3035229.65 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 05:08:06,978 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3823493.3333333335, ans=0.125 2023-11-29 05:08:23,750 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3823626.6666666665, ans=0.0 2023-11-29 05:08:25,935 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 573550 2023-11-29 05:08:28,205 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.94 vs. limit=15.0 2023-11-29 05:08:36,231 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.903e+01 9.025e+01 9.772e+01 1.057e+02 1.487e+02, threshold=1.954e+02, percent-clipped=0.0 2023-11-29 05:08:56,909 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 8450, loss[loss=0.07625, simple_loss=0.1065, pruned_loss=0.01833, audio_tagging_loss=0.004661, over 14598.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08916, pruned_loss=0.01213, audio_tagging_loss=0.008529, over 3048021.69 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:08:57,497 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=7.78 vs. 
limit=12.0 2023-11-29 05:08:58,367 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3823826.6666666665, ans=0.025 2023-11-29 05:09:11,367 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.01 vs. limit=6.0 2023-11-29 05:09:14,726 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3823893.3333333335, ans=0.0 2023-11-29 05:09:28,184 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 573600 2023-11-29 05:09:35,109 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.44 vs. limit=22.5 2023-11-29 05:09:38,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3824026.6666666665, ans=0.125 2023-11-29 05:09:48,560 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3824093.3333333335, ans=0.125 2023-11-29 05:09:55,917 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.71 vs. limit=6.0 2023-11-29 05:09:59,997 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 8500, loss[loss=0.08611, simple_loss=0.1115, pruned_loss=0.02396, audio_tagging_loss=0.006418, over 16613.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08922, pruned_loss=0.01215, audio_tagging_loss=0.008509, over 3048304.44 frames. ], batch size: 60, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:10:18,459 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3824226.6666666665, ans=0.1 2023-11-29 05:10:25,235 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3824293.3333333335, ans=0.125 2023-11-29 05:10:29,815 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 573650 2023-11-29 05:10:39,027 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.356e+01 9.190e+01 9.692e+01 1.077e+02 1.317e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-29 05:10:39,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3824360.0, ans=0.125 2023-11-29 05:10:48,483 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3824426.6666666665, ans=0.2 2023-11-29 05:10:49,920 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.49 vs. limit=22.5 2023-11-29 05:11:02,956 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 8550, loss[loss=0.06882, simple_loss=0.1043, pruned_loss=0.009879, audio_tagging_loss=0.00678, over 15384.00 frames. ], tot_loss[loss=0.0649, simple_loss=0.08858, pruned_loss=0.01197, audio_tagging_loss=0.008641, over 3051946.00 frames. 
], batch size: 56, lr: 1.40e-03, grad_scale: 8.0
2023-11-29 05:11:10,458 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3824493.3333333335, ans=0.05
2023-11-29 05:11:12,600 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3824493.3333333335, ans=0.125
2023-11-29 05:11:31,533 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 573700
2023-11-29 05:11:36,891 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.72 vs. limit=15.0
2023-11-29 05:11:51,226 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.16 vs. limit=22.5
2023-11-29 05:12:03,574 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 8600, loss[loss=0.06311, simple_loss=0.08755, pruned_loss=0.009076, audio_tagging_loss=0.01026, over 15246.00 frames. ], tot_loss[loss=0.06509, simple_loss=0.08903, pruned_loss=0.01196, audio_tagging_loss=0.008617, over 3043818.89 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 8.0
2023-11-29 05:12:16,120 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3824893.3333333335, ans=0.07
2023-11-29 05:12:24,254 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3824893.3333333335, ans=0.0
2023-11-29 05:12:27,498 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.64 vs. limit=10.0
2023-11-29 05:12:29,644 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3824960.0, ans=0.125
2023-11-29 05:12:33,564 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 573750
2023-11-29 05:12:44,140 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.305e+01 8.900e+01 9.530e+01 1.037e+02 1.292e+02, threshold=1.906e+02, percent-clipped=0.0
2023-11-29 05:12:54,206 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=3825093.3333333335, ans=15.0
2023-11-29 05:13:04,704 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 8650, loss[loss=0.06804, simple_loss=0.1081, pruned_loss=0.008248, audio_tagging_loss=0.005756, over 15055.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08919, pruned_loss=0.01194, audio_tagging_loss=0.008654, over 3051500.95 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 8.0
2023-11-29 05:13:06,116 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3825160.0, ans=0.0
2023-11-29 05:13:09,732 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3825160.0, ans=0.1
2023-11-29 05:13:13,961 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3825160.0, ans=0.2
2023-11-29 05:13:33,921 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.72 vs. limit=15.0
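The grad_scale trailing each progress record (32.0, 16.0, and 8.0 in this stretch) is the loss scale of fp16 mixed-precision training: the scaler multiplies the loss before backward, halves the scale when an overflow is detected, and grows it back after a run of clean steps, producing exactly this sawtooth. A minimal AMP step showing where the number comes from; model, optimizer, and the forward call are placeholders, and the real trainer adds clipping and the distributed machinery:

```python
import torch

scaler = torch.cuda.amp.GradScaler(enabled=True)

def train_step(model, optimizer, batch):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(enabled=True):
        loss = model(batch)  # hypothetical forward returning a scalar loss
    scaler.scale(loss).backward()  # backward on the scaled loss
    scaler.step(optimizer)         # skipped internally if grads overflowed
    scaler.update()                # halve on overflow, grow after clean runs
    return loss.detach(), scaler.get_scale()  # get_scale() is the logged value
```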
2023-11-29 05:13:34,661 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 573800
2023-11-29 05:13:42,344 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=3825360.0, ans=15.0
2023-11-29 05:13:43,399 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3825360.0, ans=0.125
2023-11-29 05:14:06,959 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 8700, loss[loss=0.06204, simple_loss=0.08178, pruned_loss=0.01096, audio_tagging_loss=0.01019, over 14640.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08932, pruned_loss=0.01195, audio_tagging_loss=0.008648, over 3047504.86 frames. ], batch size: 59, lr: 1.40e-03, grad_scale: 8.0
2023-11-29 05:14:18,459 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3825560.0, ans=0.125
2023-11-29 05:14:28,986 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3825560.0, ans=0.1
2023-11-29 05:14:36,439 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 573850
2023-11-29 05:14:46,966 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3825693.3333333335, ans=0.2
2023-11-29 05:14:47,689 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.898e+01 9.152e+01 9.894e+01 1.070e+02 1.338e+02, threshold=1.979e+02, percent-clipped=0.0
2023-11-29 05:14:49,202 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3825693.3333333335, ans=0.0
2023-11-29 05:15:08,729 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 8750, loss[loss=0.0671, simple_loss=0.09485, pruned_loss=0.01204, audio_tagging_loss=0.007635, over 15431.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08959, pruned_loss=0.01197, audio_tagging_loss=0.008723, over 3046780.54 frames.
], batch size: 57, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 05:15:09,091 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3825826.6666666665, ans=0.0 2023-11-29 05:15:18,512 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3825826.6666666665, ans=0.2 2023-11-29 05:15:22,086 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3825893.3333333335, ans=0.1 2023-11-29 05:15:25,910 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3825893.3333333335, ans=0.0 2023-11-29 05:15:37,803 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 573900 2023-11-29 05:15:52,101 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3826026.6666666665, ans=0.0 2023-11-29 05:15:57,023 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3826093.3333333335, ans=0.2 2023-11-29 05:15:59,315 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3826093.3333333335, ans=0.125 2023-11-29 05:16:10,232 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 8800, loss[loss=0.05495, simple_loss=0.07573, pruned_loss=0.007653, audio_tagging_loss=0.009432, over 15699.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.09075, pruned_loss=0.01209, audio_tagging_loss=0.008732, over 3052303.38 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:16:23,950 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3826226.6666666665, ans=0.0 2023-11-29 05:16:28,171 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.85 vs. limit=10.0 2023-11-29 05:16:35,559 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3826293.3333333335, ans=0.125 2023-11-29 05:16:39,841 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 573950 2023-11-29 05:16:50,295 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.805e+01 9.120e+01 9.746e+01 1.050e+02 1.300e+02, threshold=1.949e+02, percent-clipped=0.0 2023-11-29 05:17:01,881 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.18 vs. limit=15.0 2023-11-29 05:17:11,273 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 8850, loss[loss=0.04719, simple_loss=0.05947, pruned_loss=0.008121, audio_tagging_loss=0.009335, over 14893.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.09045, pruned_loss=0.01201, audio_tagging_loss=0.008801, over 3058866.14 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:17:16,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3826493.3333333335, ans=0.1 2023-11-29 05:17:26,567 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 05:17:38,403 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3826626.6666666665, ans=10.0 2023-11-29 05:17:40,684 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 574000 2023-11-29 05:17:52,383 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3826693.3333333335, ans=0.125 2023-11-29 05:17:56,806 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.66 vs. limit=6.0 2023-11-29 05:18:13,132 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3826826.6666666665, ans=0.0 2023-11-29 05:18:13,174 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3826826.6666666665, ans=0.125 2023-11-29 05:18:14,028 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 8900, loss[loss=0.08193, simple_loss=0.1215, pruned_loss=0.01649, audio_tagging_loss=0.004676, over 14403.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08989, pruned_loss=0.01197, audio_tagging_loss=0.00874, over 3063176.03 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:18:17,006 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.88 vs. limit=15.0 2023-11-29 05:18:24,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3826893.3333333335, ans=0.2 2023-11-29 05:18:43,695 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 574050 2023-11-29 05:18:45,326 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.41 vs. limit=15.0 2023-11-29 05:18:54,679 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.607e+01 9.202e+01 9.774e+01 1.025e+02 3.343e+02, threshold=1.955e+02, percent-clipped=1.0 2023-11-29 05:19:08,580 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3827093.3333333335, ans=0.125 2023-11-29 05:19:15,227 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 8950, loss[loss=0.06292, simple_loss=0.08559, pruned_loss=0.01228, audio_tagging_loss=0.007852, over 14885.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08946, pruned_loss=0.01192, audio_tagging_loss=0.008613, over 3058244.43 frames. 
], batch size: 56, lr: 1.40e-03, grad_scale: 16.0
2023-11-29 05:19:35,505 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-29 05:19:45,574 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 574100
2023-11-29 05:20:00,256 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3827360.0, ans=0.125
2023-11-29 05:20:01,362 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3827360.0, ans=0.125
2023-11-29 05:20:02,587 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3827360.0, ans=0.1
2023-11-29 05:20:17,593 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 9000, loss[loss=0.06662, simple_loss=0.08972, pruned_loss=0.01102, audio_tagging_loss=0.01074, over 14682.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08993, pruned_loss=0.01197, audio_tagging_loss=0.008541, over 3061620.81 frames. ], batch size: 53, lr: 1.40e-03, grad_scale: 16.0
2023-11-29 05:20:17,595 INFO [train_asr.py:1258] (0/4) Computing validation loss
2023-11-29 05:20:37,578 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.5144, 4.4306, 4.1614, 4.4179], device='cuda:0')
2023-11-29 05:20:56,941 INFO [train_asr.py:1267] (0/4) Epoch 48, validation: loss=0.05922, simple_loss=0.05036, pruned_loss=0.00529, audio_tagging_loss=0.02875, over 4681554.00 frames.
2023-11-29 05:20:56,942 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB
2023-11-29 05:21:13,101 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3827560.0, ans=0.125
2023-11-29 05:21:17,572 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3827560.0, ans=0.1
2023-11-29 05:21:26,400 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 574150
2023-11-29 05:21:37,484 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.198e+01 9.287e+01 9.829e+01 1.058e+02 1.335e+02, threshold=1.966e+02, percent-clipped=0.0
2023-11-29 05:21:38,063 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.16 vs. limit=12.0
2023-11-29 05:21:58,595 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 9050, loss[loss=0.07764, simple_loss=0.1142, pruned_loss=0.01328, audio_tagging_loss=0.007256, over 15469.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.09031, pruned_loss=0.01216, audio_tagging_loss=0.008461, over 3058341.48 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 16.0
2023-11-29 05:22:02,269 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3827826.6666666665, ans=0.05
2023-11-29 05:22:20,345 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.99 vs. limit=15.0
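At batch 9000 the trainer pauses for its periodic validation pass, logged above via train_asr.py:1258-1268: the dev loader is swept without gradients, each loss component is averaged over all 4681554 frames, and peak CUDA memory is reported. The validation total again matches 0.5 * simple_loss + pruned_loss + audio_tagging_loss, with the tagging term (0.02875) dominating. A hedged sketch of such a pass; compute_loss and the loader stand in for the script's internals:

```python
import torch

def run_validation(model, dev_dl, compute_loss, device):
    # compute_loss is a placeholder for the trainer's loss function; it is
    # assumed to return (dict of per-frame loss components, num_frames).
    model.eval()
    totals: dict = {}
    frames = 0
    with torch.no_grad():
        for batch in dev_dl:
            loss_info, num_frames = compute_loss(model, batch)
            frames += num_frames
            for name, value in loss_info.items():
                totals[name] = totals.get(name, 0.0) + value * num_frames
    model.train()
    averaged = {name: value / frames for name, value in totals.items()}
    mem_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    return averaged, mem_mb  # e.g. ({'loss': 0.05922, ...}, 25978)
```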
2023-11-29 05:22:27,909 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 574200
2023-11-29 05:22:28,063 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3827960.0, ans=0.125
2023-11-29 05:22:53,592 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.90 vs. limit=22.5
2023-11-29 05:23:00,516 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 9100, loss[loss=0.0629, simple_loss=0.09169, pruned_loss=0.01182, audio_tagging_loss=0.005232, over 14993.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.09054, pruned_loss=0.01214, audio_tagging_loss=0.008382, over 3062407.16 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 16.0
2023-11-29 05:23:15,253 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3828226.6666666665, ans=0.125
2023-11-29 05:23:29,692 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 574250
2023-11-29 05:23:40,963 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.452e+01 9.007e+01 9.515e+01 1.034e+02 1.309e+02, threshold=1.903e+02, percent-clipped=0.0
2023-11-29 05:23:57,586 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3828426.6666666665, ans=0.2
2023-11-29 05:23:59,317 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.48 vs. limit=15.0
2023-11-29 05:24:02,102 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 9150, loss[loss=0.07291, simple_loss=0.1102, pruned_loss=0.008336, audio_tagging_loss=0.009477, over 15428.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.09011, pruned_loss=0.01195, audio_tagging_loss=0.008391, over 3057820.72 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 16.0
2023-11-29 05:24:31,106 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-29 05:24:32,031 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 574300
2023-11-29 05:24:38,998 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.55 vs. limit=15.0
2023-11-29 05:24:43,653 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.52 vs. limit=10.0
2023-11-29 05:24:50,521 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3828760.0, ans=0.125
2023-11-29 05:25:04,024 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 9200, loss[loss=0.06198, simple_loss=0.09123, pruned_loss=0.01081, audio_tagging_loss=0.005552, over 14616.00 frames. ], tot_loss[loss=0.06511, simple_loss=0.08978, pruned_loss=0.01195, audio_tagging_loss=0.00827, over 3058201.07 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 32.0
2023-11-29 05:25:14,373 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.58 vs.
limit=22.5 2023-11-29 05:25:18,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3828893.3333333335, ans=0.04949747468305833 2023-11-29 05:25:24,735 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3828893.3333333335, ans=0.125 2023-11-29 05:25:30,737 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3828960.0, ans=0.125 2023-11-29 05:25:31,882 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3828960.0, ans=10.0 2023-11-29 05:25:33,858 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 574350 2023-11-29 05:25:44,296 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.055e+01 8.927e+01 9.501e+01 1.029e+02 1.392e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-29 05:25:48,998 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3829026.6666666665, ans=0.125 2023-11-29 05:26:06,035 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 9250, loss[loss=0.04692, simple_loss=0.06539, pruned_loss=0.005095, audio_tagging_loss=0.009135, over 16760.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08992, pruned_loss=0.01196, audio_tagging_loss=0.008343, over 3061046.93 frames. ], batch size: 65, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 05:26:07,572 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3829160.0, ans=0.025 2023-11-29 05:26:09,804 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3829160.0, ans=0.0 2023-11-29 05:26:35,634 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 574400 2023-11-29 05:26:46,185 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3829360.0, ans=0.1 2023-11-29 05:26:48,641 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3829360.0, ans=0.0 2023-11-29 05:26:52,716 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3829360.0, ans=0.0 2023-11-29 05:26:52,886 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.19 vs. limit=15.0 2023-11-29 05:27:08,238 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 9300, loss[loss=0.0587, simple_loss=0.08034, pruned_loss=0.008594, audio_tagging_loss=0.009938, over 16178.00 frames. ], tot_loss[loss=0.06437, simple_loss=0.08866, pruned_loss=0.01162, audio_tagging_loss=0.008415, over 3052622.54 frames. 
], batch size: 61, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 05:27:10,903 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3829493.3333333335, ans=0.0 2023-11-29 05:27:30,464 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3829560.0, ans=0.0 2023-11-29 05:27:37,449 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 574450 2023-11-29 05:27:51,512 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.398e+01 8.967e+01 9.597e+01 1.017e+02 1.229e+02, threshold=1.919e+02, percent-clipped=0.0 2023-11-29 05:28:04,913 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3829760.0, ans=0.1 2023-11-29 05:28:09,011 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 9350, loss[loss=0.06226, simple_loss=0.08519, pruned_loss=0.009646, audio_tagging_loss=0.01002, over 15984.00 frames. ], tot_loss[loss=0.0646, simple_loss=0.08866, pruned_loss=0.01179, audio_tagging_loss=0.008482, over 3054089.00 frames. ], batch size: 59, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 05:28:10,027 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3829826.6666666665, ans=0.125 2023-11-29 05:28:10,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3829826.6666666665, ans=0.1 2023-11-29 05:28:16,367 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 05:28:18,542 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3829826.6666666665, ans=0.125 2023-11-29 05:28:22,196 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3829893.3333333335, ans=0.125 2023-11-29 05:28:35,985 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3829960.0, ans=0.0 2023-11-29 05:28:39,341 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 574500 2023-11-29 05:29:00,996 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3830093.3333333335, ans=0.125 2023-11-29 05:29:02,183 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3830093.3333333335, ans=0.1 2023-11-29 05:29:10,059 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 9400, loss[loss=0.05654, simple_loss=0.07208, pruned_loss=0.009353, audio_tagging_loss=0.01115, over 15085.00 frames. ], tot_loss[loss=0.06469, simple_loss=0.08873, pruned_loss=0.01173, audio_tagging_loss=0.008597, over 3052320.08 frames. 
], batch size: 58, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 05:29:15,742 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3830160.0, ans=0.0 2023-11-29 05:29:16,896 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3830160.0, ans=0.125 2023-11-29 05:29:24,498 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3830226.6666666665, ans=0.1 2023-11-29 05:29:25,712 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3830226.6666666665, ans=0.09899494936611666 2023-11-29 05:29:39,468 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 574550 2023-11-29 05:29:52,564 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3830360.0, ans=0.05 2023-11-29 05:29:53,201 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.58 vs. limit=22.5 2023-11-29 05:29:53,482 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.610e+01 9.033e+01 9.709e+01 1.034e+02 1.178e+02, threshold=1.942e+02, percent-clipped=0.0 2023-11-29 05:29:56,168 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3830360.0, ans=0.04949747468305833 2023-11-29 05:30:08,107 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.41 vs. limit=15.0 2023-11-29 05:30:11,519 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.66 vs. limit=22.5 2023-11-29 05:30:12,147 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 9450, loss[loss=0.07703, simple_loss=0.1159, pruned_loss=0.01303, audio_tagging_loss=0.006067, over 15429.00 frames. ], tot_loss[loss=0.06482, simple_loss=0.08864, pruned_loss=0.0118, audio_tagging_loss=0.008711, over 3052637.59 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 05:30:13,351 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-29 05:30:33,552 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3830560.0, ans=10.0 2023-11-29 05:30:37,095 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3830626.6666666665, ans=0.125 2023-11-29 05:30:40,548 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3830626.6666666665, ans=0.125 2023-11-29 05:30:41,522 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 574600 2023-11-29 05:30:51,005 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3830693.3333333335, ans=0.125 2023-11-29 05:30:55,225 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.63 vs. limit=10.0 2023-11-29 05:31:05,524 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3830760.0, ans=0.04949747468305833 2023-11-29 05:31:13,401 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 9500, loss[loss=0.07513, simple_loss=0.1119, pruned_loss=0.01173, audio_tagging_loss=0.007465, over 15848.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08872, pruned_loss=0.01183, audio_tagging_loss=0.008781, over 3050101.91 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 05:31:13,695 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3830826.6666666665, ans=0.1 2023-11-29 05:31:30,300 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3830893.3333333335, ans=0.125 2023-11-29 05:31:40,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3830960.0, ans=0.125 2023-11-29 05:31:44,323 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 574650 2023-11-29 05:31:56,868 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.520e+01 8.911e+01 9.563e+01 1.027e+02 1.260e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-29 05:32:00,830 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3831026.6666666665, ans=0.125 2023-11-29 05:32:02,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3831093.3333333335, ans=0.125 2023-11-29 05:32:02,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3831093.3333333335, ans=0.2 2023-11-29 05:32:15,838 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 9550, loss[loss=0.09629, simple_loss=0.1292, pruned_loss=0.02273, audio_tagging_loss=0.008939, over 15965.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08922, pruned_loss=0.01193, audio_tagging_loss=0.008809, over 3049738.74 frames. 
], batch size: 59, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 05:32:29,926 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3831226.6666666665, ans=0.09899494936611666 2023-11-29 05:32:39,406 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3831293.3333333335, ans=0.0 2023-11-29 05:32:44,994 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 574700 2023-11-29 05:32:57,836 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3831360.0, ans=0.125 2023-11-29 05:33:01,933 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3831360.0, ans=0.125 2023-11-29 05:33:11,879 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3831426.6666666665, ans=0.0 2023-11-29 05:33:16,372 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.80 vs. limit=6.0 2023-11-29 05:33:17,853 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 9600, loss[loss=0.05365, simple_loss=0.07421, pruned_loss=0.008296, audio_tagging_loss=0.008246, over 13968.00 frames. ], tot_loss[loss=0.06507, simple_loss=0.08872, pruned_loss=0.01186, audio_tagging_loss=0.008852, over 3053688.96 frames. ], batch size: 53, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:33:25,417 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.78 vs. limit=6.0 2023-11-29 05:33:27,234 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3831493.3333333335, ans=0.125 2023-11-29 05:33:27,493 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.53 vs. limit=15.0 2023-11-29 05:33:46,239 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 574750 2023-11-29 05:33:55,300 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3831693.3333333335, ans=0.125 2023-11-29 05:33:58,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3831693.3333333335, ans=0.1 2023-11-29 05:34:01,053 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.668e+01 9.154e+01 9.787e+01 1.038e+02 1.402e+02, threshold=1.957e+02, percent-clipped=0.0 2023-11-29 05:34:12,226 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3831760.0, ans=0.125 2023-11-29 05:34:15,496 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3831760.0, ans=0.1 2023-11-29 05:34:16,645 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3831760.0, ans=0.1 2023-11-29 05:34:18,752 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 9650, loss[loss=0.08494, simple_loss=0.1246, pruned_loss=0.01549, audio_tagging_loss=0.007157, over 16525.00 frames. 
], tot_loss[loss=0.06558, simple_loss=0.08967, pruned_loss=0.01201, audio_tagging_loss=0.008732, over 3048578.27 frames. ], batch size: 60, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:34:34,557 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3831893.3333333335, ans=0.0 2023-11-29 05:34:50,499 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 574800 2023-11-29 05:34:50,612 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3831960.0, ans=0.0 2023-11-29 05:35:08,470 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.32 vs. limit=15.0 2023-11-29 05:35:20,963 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 9700, loss[loss=0.06403, simple_loss=0.09278, pruned_loss=0.009361, audio_tagging_loss=0.008283, over 14620.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08973, pruned_loss=0.01191, audio_tagging_loss=0.008582, over 3047909.47 frames. ], batch size: 53, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:35:50,741 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 574850 2023-11-29 05:36:00,521 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3832360.0, ans=0.125 2023-11-29 05:36:03,523 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.84 vs. limit=22.5 2023-11-29 05:36:03,815 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.548e+01 9.065e+01 9.811e+01 1.054e+02 1.349e+02, threshold=1.962e+02, percent-clipped=0.0 2023-11-29 05:36:05,771 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.42 vs. limit=6.0 2023-11-29 05:36:08,967 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.34 vs. limit=22.5 2023-11-29 05:36:11,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3832426.6666666665, ans=0.125 2023-11-29 05:36:23,052 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 9750, loss[loss=0.06353, simple_loss=0.08771, pruned_loss=0.00927, audio_tagging_loss=0.01041, over 15322.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.09039, pruned_loss=0.01204, audio_tagging_loss=0.008519, over 3049499.66 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:36:24,547 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3832493.3333333335, ans=0.125 2023-11-29 05:36:51,713 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 574900 2023-11-29 05:36:56,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3832626.6666666665, ans=0.2 2023-11-29 05:36:59,163 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=14.38 vs. 
2023-11-29 05:37:17,061 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3832760.0, ans=0.125 2023-11-29 05:37:23,671 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 9800, loss[loss=0.06514, simple_loss=0.09142, pruned_loss=0.01112, audio_tagging_loss=0.00831, over 14873.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08963, pruned_loss=0.01203, audio_tagging_loss=0.008506, over 3049692.70 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:37:32,368 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.21 vs. limit=12.0 2023-11-29 05:37:52,508 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 574950 2023-11-29 05:38:02,572 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3833026.6666666665, ans=0.125 2023-11-29 05:38:05,610 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.528e+01 9.329e+01 9.816e+01 1.069e+02 1.352e+02, threshold=1.963e+02, percent-clipped=0.0 2023-11-29 05:38:12,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3833093.3333333335, ans=0.125 2023-11-29 05:38:16,190 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3833093.3333333335, ans=0.0 2023-11-29 05:38:20,059 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 05:38:23,381 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 9850, loss[loss=0.06018, simple_loss=0.08188, pruned_loss=0.0105, audio_tagging_loss=0.008737, over 16216.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.09045, pruned_loss=0.01223, audio_tagging_loss=0.008479, over 3049396.64 frames. ], batch size: 60, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:38:53,102 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 575000 2023-11-29 05:38:57,185 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3833293.3333333335, ans=0.1 2023-11-29 05:38:58,497 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=7.43 vs. limit=12.0 2023-11-29 05:39:19,273 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3833426.6666666665, ans=0.0 2023-11-29 05:39:20,555 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3833426.6666666665, ans=0.0 2023-11-29 05:39:21,741 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3833426.6666666665, ans=0.0 2023-11-29 05:39:24,248 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 9900, loss[loss=0.05479, simple_loss=0.066, pruned_loss=0.01129, audio_tagging_loss=0.0105, over 15790.00 frames.
], tot_loss[loss=0.06627, simple_loss=0.09099, pruned_loss=0.01234, audio_tagging_loss=0.008434, over 3048507.17 frames. ], batch size: 64, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:39:34,812 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3833493.3333333335, ans=0.0 2023-11-29 05:39:36,454 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.61 vs. limit=15.0 2023-11-29 05:39:38,371 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3833560.0, ans=0.125 2023-11-29 05:39:39,311 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3833560.0, ans=0.125 2023-11-29 05:39:45,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3833560.0, ans=0.125 2023-11-29 05:39:49,576 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3833626.6666666665, ans=0.1 2023-11-29 05:39:53,617 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 575050 2023-11-29 05:40:06,124 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.137e+01 9.333e+01 9.894e+01 1.049e+02 1.495e+02, threshold=1.979e+02, percent-clipped=0.0 2023-11-29 05:40:25,374 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 9950, loss[loss=0.06202, simple_loss=0.09112, pruned_loss=0.008896, audio_tagging_loss=0.007563, over 16510.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.09025, pruned_loss=0.01201, audio_tagging_loss=0.008445, over 3049749.36 frames. ], batch size: 62, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:40:35,025 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3833826.6666666665, ans=0.07 2023-11-29 05:40:51,772 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3833960.0, ans=0.07 2023-11-29 05:40:53,866 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 575100 2023-11-29 05:41:03,791 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3834026.6666666665, ans=0.0 2023-11-29 05:41:25,621 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 10000, loss[loss=0.07406, simple_loss=0.1149, pruned_loss=0.0113, audio_tagging_loss=0.005331, over 14980.00 frames. ], tot_loss[loss=0.06452, simple_loss=0.08861, pruned_loss=0.01173, audio_tagging_loss=0.008475, over 3047442.99 frames. 
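], batch size: 56, lr: 1.40e-03, grad_scale: 32.0

Note on the grad_scale field: it moves between 8.0, 16.0 and 32.0 across these batches, which is the signature of dynamic fp16 loss scaling, where the scale doubles after a run of overflow-free steps and halves whenever a step produces inf/nan gradients. A minimal sketch with PyTorch's GradScaler; the growth/backoff settings shown are library defaults, not values read from this run:

```python
# Sketch: dynamic fp16 loss scaling of the kind behind the logged grad_scale.
# The growth/backoff settings are PyTorch defaults, not values from this run.
import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=8.0,        # matches the smallest grad_scale in this stretch
    growth_factor=2.0,     # doubles after `growth_interval` overflow-free steps
    backoff_factor=0.5,    # halves when a step yields inf/nan gradients
    growth_interval=2000,
)

# Per training step, roughly:
#   with torch.cuda.amp.autocast():
#       loss = compute_loss(model, batch)   # hypothetical helper
#   scaler.scale(loss).backward()
#   scaler.step(optimizer)
#   scaler.update()                         # grad_scale can change here
print(scaler.get_scale())                   # 8.0 before the first update
```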
2023-11-29 05:41:49,149 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3834226.6666666665, ans=0.1 2023-11-29 05:41:55,786 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 575150 2023-11-29 05:41:57,219 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3834293.3333333335, ans=0.05 2023-11-29 05:42:08,202 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.189e+01 8.882e+01 9.475e+01 1.008e+02 1.351e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-29 05:42:24,210 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3834426.6666666665, ans=0.1 2023-11-29 05:42:24,741 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.15 vs. limit=22.5 2023-11-29 05:42:26,227 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 10050, loss[loss=0.05213, simple_loss=0.06927, pruned_loss=0.01064, audio_tagging_loss=0.006853, over 15691.00 frames. ], tot_loss[loss=0.06451, simple_loss=0.08872, pruned_loss=0.01172, audio_tagging_loss=0.008426, over 3046437.27 frames. ], batch size: 59, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 05:42:47,862 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3834560.0, ans=0.1 2023-11-29 05:42:55,668 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 575200 2023-11-29 05:42:57,351 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3834626.6666666665, ans=0.05 2023-11-29 05:43:28,462 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 10100, loss[loss=0.07494, simple_loss=0.1066, pruned_loss=0.01349, audio_tagging_loss=0.008149, over 15256.00 frames. ], tot_loss[loss=0.06474, simple_loss=0.08898, pruned_loss=0.01181, audio_tagging_loss=0.008442, over 3047921.36 frames.
], batch size: 56, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 05:43:34,573 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3834826.6666666665, ans=0.125 2023-11-29 05:43:36,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3834826.6666666665, ans=0.125 2023-11-29 05:43:43,769 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3834893.3333333335, ans=0.125 2023-11-29 05:43:44,935 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3834893.3333333335, ans=0.125 2023-11-29 05:43:56,920 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 575250 2023-11-29 05:43:57,000 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3834960.0, ans=0.0 2023-11-29 05:44:04,093 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3835026.6666666665, ans=0.125 2023-11-29 05:44:10,721 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.549e+01 9.243e+01 9.943e+01 1.062e+02 1.322e+02, threshold=1.989e+02, percent-clipped=0.0 2023-11-29 05:44:20,581 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 05:44:23,161 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3835093.3333333335, ans=0.0 2023-11-29 05:44:25,669 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.59 vs. limit=15.0 2023-11-29 05:44:28,583 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 10150, loss[loss=0.05106, simple_loss=0.06417, pruned_loss=0.006501, audio_tagging_loss=0.01248, over 15303.00 frames. ], tot_loss[loss=0.06462, simple_loss=0.08864, pruned_loss=0.01174, audio_tagging_loss=0.008569, over 3047936.11 frames. ], batch size: 59, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 05:44:44,671 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.01 vs. limit=15.0 2023-11-29 05:44:46,899 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.02 vs. limit=15.0 2023-11-29 05:44:57,785 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 575300 2023-11-29 05:45:00,071 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
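Number of tokens: 24

Note on the "Exclude cut" warnings: a transducer cannot align more output tokens than there are encoder frames, so cuts whose post-subsampling length falls below the token count are dropped; here a 1-second AudioSet clip (100 frames) shrinks to 23 frames, which cannot carry the 24-token dummy transcript. A sketch of that filter; the exact output-length formula is an assumption chosen to reproduce the logged 100 -> 23 pair:

```python
# Sketch of the cut filter behind the WARNING above. The output-length
# formula is an assumption chosen to reproduce the logged 100 -> 23 pair
# for the ~4x convolutional subsampling.

def frames_after_subsampling(t: int) -> int:
    return ((t - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # A transducer cannot emit more tokens than it has encoder frames.
    return frames_after_subsampling(num_frames) >= num_tokens

print(frames_after_subsampling(100))  # -> 23, as logged
print(keep_cut(100, 24))              # -> False: the cut is excluded
```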
2023-11-29 05:45:07,569 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3835360.0, ans=0.125 2023-11-29 05:45:17,336 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.98 vs. limit=15.0 2023-11-29 05:45:28,668 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 10200, loss[loss=0.05481, simple_loss=0.0752, pruned_loss=0.00735, audio_tagging_loss=0.009857, over 13453.00 frames. ], tot_loss[loss=0.06427, simple_loss=0.08797, pruned_loss=0.01165, audio_tagging_loss=0.00864, over 3041385.59 frames. ], batch size: 51, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 05:45:30,105 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3835493.3333333335, ans=0.1 2023-11-29 05:45:32,417 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3835493.3333333335, ans=0.125 2023-11-29 05:45:36,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3835493.3333333335, ans=0.125 2023-11-29 05:45:40,438 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3835560.0, ans=0.125 2023-11-29 05:45:47,462 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3835560.0, ans=0.125 2023-11-29 05:45:54,742 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 05:45:58,211 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 575350 2023-11-29 05:45:58,308 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3835626.6666666665, ans=0.2 2023-11-29 05:46:02,273 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.58 vs. limit=15.0 2023-11-29 05:46:05,766 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3835693.3333333335, ans=0.1 2023-11-29 05:46:12,473 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.929e+01 8.881e+01 9.698e+01 1.024e+02 1.374e+02, threshold=1.940e+02, percent-clipped=0.0 2023-11-29 05:46:12,840 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3835693.3333333335, ans=0.125 2023-11-29 05:46:29,772 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 10250, loss[loss=0.05594, simple_loss=0.08054, pruned_loss=0.007543, audio_tagging_loss=0.008125, over 14328.00 frames. ], tot_loss[loss=0.06415, simple_loss=0.08772, pruned_loss=0.01161, audio_tagging_loss=0.008678, over 3049809.91 frames.
], batch size: 55, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:46:30,338 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=3835826.6666666665, ans=22.5 2023-11-29 05:46:31,245 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3835826.6666666665, ans=0.125 2023-11-29 05:46:56,903 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3835960.0, ans=0.125 2023-11-29 05:46:58,965 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 575400 2023-11-29 05:47:09,821 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3836026.6666666665, ans=0.1 2023-11-29 05:47:28,815 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3836093.3333333335, ans=0.0 2023-11-29 05:47:30,028 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3836160.0, ans=0.09899494936611666 2023-11-29 05:47:30,869 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 10300, loss[loss=0.05695, simple_loss=0.06831, pruned_loss=0.008598, audio_tagging_loss=0.01419, over 15521.00 frames. ], tot_loss[loss=0.06445, simple_loss=0.08815, pruned_loss=0.01173, audio_tagging_loss=0.008646, over 3054550.55 frames. ], batch size: 59, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:47:49,469 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.83 vs. limit=22.5 2023-11-29 05:47:53,123 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3836226.6666666665, ans=0.125 2023-11-29 05:48:00,394 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 575450 2023-11-29 05:48:14,974 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.637e+01 9.198e+01 9.831e+01 1.050e+02 1.376e+02, threshold=1.966e+02, percent-clipped=0.0 2023-11-29 05:48:16,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3836360.0, ans=0.125 2023-11-29 05:48:31,811 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 10350, loss[loss=0.08008, simple_loss=0.11, pruned_loss=0.01738, audio_tagging_loss=0.007704, over 14928.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08946, pruned_loss=0.01185, audio_tagging_loss=0.008767, over 3057144.10 frames. 
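], batch size: 53, lr: 1.40e-03, grad_scale: 16.0

Note on the Clipping_scale=2.0 records: in each one the reported threshold equals twice the median of the grad-norm quartiles (e.g. 2 x 9.831e+01 ≈ 1.966e+02 in the record just above), so the clipping bound appears to be derived from a running median of recent gradient norms rather than a fixed constant. A sketch of such a rule; the window size and the exact bookkeeping are assumptions:

```python
# Sketch: a clipping threshold tied to a running median of gradient norms,
# matching threshold == Clipping_scale * median in the logged records.
# The window size is an assumption.
from collections import deque
import statistics

class MedianClipper:
    def __init__(self, clipping_scale: float = 2.0, window: int = 400):
        self.scale = clipping_scale
        self.norms: deque = deque(maxlen=window)

    def clip_factor(self, grad_norm: float) -> float:
        self.norms.append(grad_norm)
        threshold = self.scale * statistics.median(self.norms)
        return min(1.0, threshold / grad_norm)  # shrink grads past the threshold

clipper = MedianClipper()
for norm in (76.37, 91.98, 98.31, 105.0, 137.6):  # quartiles from the record
    clipper.clip_factor(norm)
print(2.0 * statistics.median(clipper.norms))     # ~196.6, i.e. 1.966e+02
```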
2023-11-29 05:48:39,663 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3836493.3333333335, ans=0.0 2023-11-29 05:48:45,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3836560.0, ans=0.0 2023-11-29 05:49:00,417 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3836626.6666666665, ans=0.125 2023-11-29 05:49:01,328 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 575500 2023-11-29 05:49:05,854 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3836626.6666666665, ans=0.125 2023-11-29 05:49:08,480 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.48 vs. limit=15.0 2023-11-29 05:49:31,951 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 10400, loss[loss=0.09425, simple_loss=0.1264, pruned_loss=0.02423, audio_tagging_loss=0.006827, over 14968.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.08993, pruned_loss=0.01216, audio_tagging_loss=0.008777, over 3054553.59 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 05:49:51,568 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.40 vs. limit=15.0 2023-11-29 05:49:56,605 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3836960.0, ans=0.125 2023-11-29 05:50:00,918 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 575550 2023-11-29 05:50:11,403 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.08 vs. limit=15.0 2023-11-29 05:50:13,785 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3837026.6666666665, ans=0.125 2023-11-29 05:50:15,796 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.305e+01 9.181e+01 9.898e+01 1.077e+02 1.252e+02, threshold=1.980e+02, percent-clipped=0.0 2023-11-29 05:50:26,440 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3837093.3333333335, ans=0.125 2023-11-29 05:50:30,262 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.88 vs. limit=15.0 2023-11-29 05:50:32,057 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 10450, loss[loss=0.08808, simple_loss=0.1229, pruned_loss=0.01985, audio_tagging_loss=0.006777, over 15237.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.08976, pruned_loss=0.01219, audio_tagging_loss=0.00874, over 3046998.51 frames.
], batch size: 56, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 05:50:37,566 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3837160.0, ans=0.125 2023-11-29 05:50:44,051 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3837226.6666666665, ans=0.125 2023-11-29 05:51:02,150 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 575600 2023-11-29 05:51:14,575 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.87 vs. limit=15.0 2023-11-29 05:51:33,386 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 10500, loss[loss=0.05233, simple_loss=0.06523, pruned_loss=0.009612, audio_tagging_loss=0.0101, over 14980.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08949, pruned_loss=0.01203, audio_tagging_loss=0.008657, over 3050945.36 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 05:51:50,755 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3837560.0, ans=0.1 2023-11-29 05:51:54,321 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 05:52:01,903 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 575650 2023-11-29 05:52:09,771 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3837693.3333333335, ans=0.0 2023-11-29 05:52:16,856 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.176e+01 9.059e+01 9.610e+01 1.014e+02 2.042e+02, threshold=1.922e+02, percent-clipped=1.0 2023-11-29 05:52:32,204 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.20 vs. limit=12.0 2023-11-29 05:52:34,010 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 10550, loss[loss=0.0577, simple_loss=0.0741, pruned_loss=0.01098, audio_tagging_loss=0.009674, over 16346.00 frames. ], tot_loss[loss=0.06488, simple_loss=0.0889, pruned_loss=0.01191, audio_tagging_loss=0.00852, over 3046375.14 frames. ], batch size: 64, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 05:52:34,221 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3837826.6666666665, ans=0.0 2023-11-29 05:52:42,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3837826.6666666665, ans=0.125 2023-11-29 05:52:52,853 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3837893.3333333335, ans=0.2 2023-11-29 05:52:53,977 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3837893.3333333335, ans=0.0 2023-11-29 05:53:03,043 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 575700 2023-11-29 05:53:34,035 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 10600, loss[loss=0.0862, simple_loss=0.1195, pruned_loss=0.01829, audio_tagging_loss=0.008159, over 14364.00 frames. ], tot_loss[loss=0.06447, simple_loss=0.08842, pruned_loss=0.01178, audio_tagging_loss=0.008474, over 3046432.01 frames. 
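], batch size: 53, lr: 1.40e-03, grad_scale: 16.0

Note on the ScheduledFloat records: values such as dropout_p, the *_skip_rate entries and the balancer probs are printed together with a batch_count because they are schedules over training progress rather than constants; by batch_count ≈ 3.84e6 they appear to have settled, which is why each name keeps logging the same ans. A sketch of a piecewise-linear schedule of that kind; the breakpoints below are illustrative, not taken from scaling.py:

```python
# Sketch of a batch-count-keyed hyperparameter schedule like the logged
# ScheduledFloat values. The breakpoints below are illustrative only.

class PiecewiseLinear:
    def __init__(self, *points: tuple[float, float]):
        self.points = sorted(points)          # (batch_count, value) pairs

    def __call__(self, batch_count: float) -> float:
        (x0, y0) = self.points[0]
        if batch_count <= x0:
            return y0
        for x1, y1 in self.points[1:]:
            if batch_count <= x1:             # interpolate inside the segment
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
            x0, y0 = x1, y1
        return y0                             # flat past the last breakpoint

skip_rate = PiecewiseLinear((0.0, 0.5), (4000.0, 0.05), (16000.0, 0.0))
print(skip_rate(3838226.67))  # -> 0.0: settled this deep into training
```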
2023-11-29 05:53:48,305 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3838226.6666666665, ans=0.1 2023-11-29 05:54:04,233 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 575750 2023-11-29 05:54:07,883 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3838293.3333333335, ans=0.2 2023-11-29 05:54:16,951 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3838360.0, ans=0.0 2023-11-29 05:54:17,214 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.47 vs. limit=15.0 2023-11-29 05:54:18,878 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.639e+01 9.196e+01 9.659e+01 1.047e+02 1.317e+02, threshold=1.932e+02, percent-clipped=0.0 2023-11-29 05:54:19,691 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.63 vs. limit=15.0 2023-11-29 05:54:34,916 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 10650, loss[loss=0.07844, simple_loss=0.117, pruned_loss=0.01314, audio_tagging_loss=0.00678, over 14751.00 frames. ], tot_loss[loss=0.06494, simple_loss=0.08911, pruned_loss=0.01195, audio_tagging_loss=0.008431, over 3050519.42 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:54:47,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3838560.0, ans=0.125 2023-11-29 05:54:58,234 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3838626.6666666665, ans=0.125 2023-11-29 05:55:03,698 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 575800 2023-11-29 05:55:36,134 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 10700, loss[loss=0.06529, simple_loss=0.09877, pruned_loss=0.007824, audio_tagging_loss=0.008078, over 15660.00 frames. ], tot_loss[loss=0.06464, simple_loss=0.08862, pruned_loss=0.01189, audio_tagging_loss=0.008439, over 3047933.73 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:55:39,075 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.75 vs.
limit=15.0 2023-11-29 05:55:40,928 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3838826.6666666665, ans=0.1 2023-11-29 05:56:04,165 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 575850 2023-11-29 05:56:04,367 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3838960.0, ans=0.0 2023-11-29 05:56:05,578 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3838960.0, ans=0.125 2023-11-29 05:56:21,884 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.276e+01 8.915e+01 9.906e+01 1.068e+02 1.666e+02, threshold=1.981e+02, percent-clipped=0.0 2023-11-29 05:56:35,689 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 10750, loss[loss=0.06146, simple_loss=0.08218, pruned_loss=0.01094, audio_tagging_loss=0.009435, over 15272.00 frames. ], tot_loss[loss=0.06473, simple_loss=0.089, pruned_loss=0.01187, audio_tagging_loss=0.008359, over 3045690.80 frames. ], batch size: 59, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 05:56:37,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3839160.0, ans=0.2 2023-11-29 05:56:43,116 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.90 vs. limit=15.0 2023-11-29 05:56:45,545 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.31 vs. limit=15.0 2023-11-29 05:57:05,264 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 575900 2023-11-29 05:57:15,304 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3839360.0, ans=0.09899494936611666 2023-11-29 05:57:35,531 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3839493.3333333335, ans=0.125 2023-11-29 05:57:36,384 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 10800, loss[loss=0.07006, simple_loss=0.09615, pruned_loss=0.01591, audio_tagging_loss=0.006073, over 16324.00 frames. ], tot_loss[loss=0.06473, simple_loss=0.08908, pruned_loss=0.01188, audio_tagging_loss=0.008319, over 3048837.29 frames. ], batch size: 60, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:57:53,999 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3839560.0, ans=0.125 2023-11-29 05:57:58,559 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3839560.0, ans=0.125 2023-11-29 05:58:04,910 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 575950 2023-11-29 05:58:07,202 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3839626.6666666665, ans=0.125 2023-11-29 05:58:07,652 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.66 vs. 
limit=6.0 2023-11-29 05:58:09,473 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3839626.6666666665, ans=0.125 2023-11-29 05:58:12,930 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3839693.3333333335, ans=0.0 2023-11-29 05:58:21,341 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.529e+01 8.991e+01 9.894e+01 1.056e+02 1.415e+02, threshold=1.979e+02, percent-clipped=0.0 2023-11-29 05:58:35,520 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-29 05:58:37,112 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 10850, loss[loss=0.07748, simple_loss=0.1114, pruned_loss=0.01351, audio_tagging_loss=0.008267, over 15321.00 frames. ], tot_loss[loss=0.06478, simple_loss=0.08885, pruned_loss=0.01193, audio_tagging_loss=0.008423, over 3042947.61 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:58:42,179 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3839826.6666666665, ans=0.2 2023-11-29 05:59:05,504 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 576000 2023-11-29 05:59:07,488 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-576000.pt 2023-11-29 05:59:39,111 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 05:59:40,181 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 10900, loss[loss=0.06824, simple_loss=0.09032, pruned_loss=0.01485, audio_tagging_loss=0.008225, over 15016.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08924, pruned_loss=0.01216, audio_tagging_loss=0.008466, over 3046420.08 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 06:00:02,327 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3840226.6666666665, ans=0.0 2023-11-29 06:00:09,757 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 576050 2023-11-29 06:00:24,248 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3840360.0, ans=0.125 2023-11-29 06:00:27,256 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.719e+01 8.980e+01 9.608e+01 1.052e+02 1.228e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-29 06:00:41,330 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 10950, loss[loss=0.06237, simple_loss=0.08622, pruned_loss=0.01113, audio_tagging_loss=0.008132, over 15096.00 frames. ], tot_loss[loss=0.06448, simple_loss=0.08814, pruned_loss=0.01189, audio_tagging_loss=0.008521, over 3051844.78 frames. 
], batch size: 57, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 06:00:48,182 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3840493.3333333335, ans=0.2 2023-11-29 06:01:00,110 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3840560.0, ans=0.2 2023-11-29 06:01:07,793 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3840626.6666666665, ans=0.125 2023-11-29 06:01:12,341 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 576100 2023-11-29 06:01:19,532 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3840693.3333333335, ans=0.125 2023-11-29 06:01:30,944 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3840760.0, ans=0.125 2023-11-29 06:01:40,163 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3840760.0, ans=0.125 2023-11-29 06:01:41,436 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3840760.0, ans=0.1 2023-11-29 06:01:43,988 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 11000, loss[loss=0.06267, simple_loss=0.08958, pruned_loss=0.009887, audio_tagging_loss=0.007989, over 14265.00 frames. ], tot_loss[loss=0.06442, simple_loss=0.08806, pruned_loss=0.01184, audio_tagging_loss=0.008555, over 3048351.01 frames. ], batch size: 52, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 06:01:56,964 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 06:02:08,838 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3840960.0, ans=0.0 2023-11-29 06:02:13,344 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 576150 2023-11-29 06:02:17,604 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3840960.0, ans=0.2 2023-11-29 06:02:17,705 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 06:02:30,131 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.421e+01 8.865e+01 9.537e+01 1.028e+02 1.366e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-29 06:02:30,564 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3841026.6666666665, ans=0.0 2023-11-29 06:02:35,028 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.72 vs. limit=6.0 2023-11-29 06:02:45,433 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 11050, loss[loss=0.08627, simple_loss=0.1203, pruned_loss=0.01804, audio_tagging_loss=0.008075, over 16943.00 frames. 
], tot_loss[loss=0.06476, simple_loss=0.08824, pruned_loss=0.01187, audio_tagging_loss=0.008765, over 3047816.91 frames. ], batch size: 61, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 06:02:47,018 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3841160.0, ans=0.0 2023-11-29 06:03:08,713 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3841293.3333333335, ans=0.125 2023-11-29 06:03:14,259 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 576200 2023-11-29 06:03:30,991 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3841360.0, ans=0.125 2023-11-29 06:03:32,192 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3841360.0, ans=0.0 2023-11-29 06:03:45,039 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3841426.6666666665, ans=0.07 2023-11-29 06:03:46,164 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3841493.3333333335, ans=0.125 2023-11-29 06:03:47,135 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 11100, loss[loss=0.06768, simple_loss=0.09494, pruned_loss=0.01174, audio_tagging_loss=0.008475, over 14235.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08891, pruned_loss=0.01209, audio_tagging_loss=0.008772, over 3043790.21 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 06:03:51,796 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3841493.3333333335, ans=0.0 2023-11-29 06:04:10,848 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.70 vs. limit=22.5 2023-11-29 06:04:17,541 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 576250 2023-11-29 06:04:20,368 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.41 vs. limit=12.0 2023-11-29 06:04:24,895 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3841693.3333333335, ans=0.0 2023-11-29 06:04:30,820 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3841693.3333333335, ans=0.125 2023-11-29 06:04:31,823 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3841693.3333333335, ans=0.125 2023-11-29 06:04:33,968 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.599e+01 9.155e+01 9.764e+01 1.030e+02 1.216e+02, threshold=1.953e+02, percent-clipped=0.0 2023-11-29 06:04:36,065 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=18.53 vs. limit=22.5 2023-11-29 06:04:48,690 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 11150, loss[loss=0.06368, simple_loss=0.09017, pruned_loss=0.01069, audio_tagging_loss=0.007906, over 14936.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08878, pruned_loss=0.01198, audio_tagging_loss=0.00884, over 3044756.15 frames. 
], batch size: 57, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 06:04:54,229 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3841826.6666666665, ans=0.2 2023-11-29 06:05:11,873 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3841893.3333333335, ans=0.05 2023-11-29 06:05:14,214 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3841960.0, ans=0.5 2023-11-29 06:05:18,594 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 576300 2023-11-29 06:05:28,621 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3842026.6666666665, ans=0.1 2023-11-29 06:05:32,181 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3842026.6666666665, ans=0.125 2023-11-29 06:05:51,042 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 11200, loss[loss=0.09132, simple_loss=0.1305, pruned_loss=0.01965, audio_tagging_loss=0.006415, over 14796.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08928, pruned_loss=0.01213, audio_tagging_loss=0.008854, over 3034557.90 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 06:05:58,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3842160.0, ans=0.95 2023-11-29 06:06:19,583 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 576350 2023-11-29 06:06:37,281 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.808e+01 8.953e+01 9.671e+01 1.033e+02 1.680e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-29 06:06:51,554 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 11250, loss[loss=0.07021, simple_loss=0.0977, pruned_loss=0.01375, audio_tagging_loss=0.007607, over 15770.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.0896, pruned_loss=0.01218, audio_tagging_loss=0.008753, over 3038482.63 frames. ], batch size: 60, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 06:07:09,278 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=3842560.0, ans=15.0 2023-11-29 06:07:19,310 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-29 06:07:21,444 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 576400 2023-11-29 06:07:47,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3842760.0, ans=0.1 2023-11-29 06:07:51,203 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.82 vs. limit=15.0 2023-11-29 06:07:53,073 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 11300, loss[loss=0.06384, simple_loss=0.08573, pruned_loss=0.01186, audio_tagging_loss=0.009115, over 14195.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08883, pruned_loss=0.01193, audio_tagging_loss=0.008568, over 3034133.04 frames. 
], batch size: 55, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 06:08:07,446 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3842893.3333333335, ans=0.125 2023-11-29 06:08:22,168 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3842960.0, ans=0.09899494936611666 2023-11-29 06:08:23,079 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 576450 2023-11-29 06:08:42,212 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.057e+01 9.108e+01 9.647e+01 1.054e+02 1.325e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-29 06:08:47,138 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.60 vs. limit=15.0 2023-11-29 06:08:54,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3843160.0, ans=0.125 2023-11-29 06:08:54,498 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3843160.0, ans=0.07 2023-11-29 06:08:55,266 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 11350, loss[loss=0.05292, simple_loss=0.07265, pruned_loss=0.008835, audio_tagging_loss=0.007761, over 15584.00 frames. ], tot_loss[loss=0.06456, simple_loss=0.08838, pruned_loss=0.01196, audio_tagging_loss=0.00841, over 3036885.93 frames. ], batch size: 61, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 06:08:55,370 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3843160.0, ans=0.1 2023-11-29 06:09:04,378 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3843160.0, ans=0.1 2023-11-29 06:09:11,448 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3843226.6666666665, ans=0.1 2023-11-29 06:09:24,809 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 576500 2023-11-29 06:09:34,098 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.34 vs. limit=15.0 2023-11-29 06:09:41,817 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.69 vs. limit=15.0 2023-11-29 06:09:49,899 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3843426.6666666665, ans=0.1 2023-11-29 06:09:54,531 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3843426.6666666665, ans=0.0 2023-11-29 06:09:56,512 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 11400, loss[loss=0.06488, simple_loss=0.09726, pruned_loss=0.01089, audio_tagging_loss=0.005365, over 15104.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08953, pruned_loss=0.01204, audio_tagging_loss=0.008314, over 3035450.82 frames. 
], batch size: 57, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 06:10:07,851 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3843560.0, ans=0.2 2023-11-29 06:10:12,840 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.05 vs. limit=15.0 2023-11-29 06:10:26,247 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 576550 2023-11-29 06:10:40,049 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 06:10:41,144 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3843693.3333333335, ans=0.0 2023-11-29 06:10:41,310 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3843693.3333333335, ans=0.0 2023-11-29 06:10:45,565 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.262e+01 9.236e+01 1.000e+02 1.071e+02 1.321e+02, threshold=2.000e+02, percent-clipped=0.0 2023-11-29 06:10:52,352 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3843760.0, ans=0.125 2023-11-29 06:10:52,481 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3843760.0, ans=0.2 2023-11-29 06:10:54,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3843760.0, ans=0.1 2023-11-29 06:10:57,960 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 11450, loss[loss=0.06582, simple_loss=0.09089, pruned_loss=0.01039, audio_tagging_loss=0.009985, over 14579.00 frames. ], tot_loss[loss=0.06489, simple_loss=0.08921, pruned_loss=0.01193, audio_tagging_loss=0.008363, over 3043968.13 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 06:10:58,204 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3843826.6666666665, ans=0.0 2023-11-29 06:11:13,371 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3843893.3333333335, ans=0.125 2023-11-29 06:11:28,125 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 576600 2023-11-29 06:11:29,729 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3843960.0, ans=0.1 2023-11-29 06:11:33,168 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3843960.0, ans=0.125 2023-11-29 06:11:33,386 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3843960.0, ans=0.125 2023-11-29 06:11:42,272 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3844026.6666666665, ans=0.0 2023-11-29 06:11:49,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3844093.3333333335, ans=0.125 2023-11-29 06:12:00,456 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 11500, loss[loss=0.05052, simple_loss=0.05939, pruned_loss=0.007235, audio_tagging_loss=0.01359, over 14771.00 frames. 
], tot_loss[loss=0.06531, simple_loss=0.0895, pruned_loss=0.01216, audio_tagging_loss=0.008403, over 3040746.79 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 06:12:00,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3844160.0, ans=0.125 2023-11-29 06:12:01,670 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3844160.0, ans=0.125 2023-11-29 06:12:17,690 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3844226.6666666665, ans=0.125 2023-11-29 06:12:28,030 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3844293.3333333335, ans=0.125 2023-11-29 06:12:30,208 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 576650 2023-11-29 06:12:47,687 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3844360.0, ans=0.125 2023-11-29 06:12:50,856 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.537e+01 8.910e+01 9.568e+01 1.052e+02 1.642e+02, threshold=1.914e+02, percent-clipped=0.0 2023-11-29 06:12:53,783 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.29 vs. limit=15.0 2023-11-29 06:12:55,988 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3844426.6666666665, ans=0.125 2023-11-29 06:13:00,125 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.62 vs. limit=5.0 2023-11-29 06:13:02,164 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 11550, loss[loss=0.06387, simple_loss=0.08762, pruned_loss=0.009778, audio_tagging_loss=0.01028, over 15518.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.09017, pruned_loss=0.01227, audio_tagging_loss=0.008328, over 3050097.66 frames. ], batch size: 59, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 06:13:07,661 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.92 vs. limit=15.0 2023-11-29 06:13:22,559 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.02 vs. limit=15.0 2023-11-29 06:13:32,402 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 576700 2023-11-29 06:13:39,462 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3844693.3333333335, ans=0.0 2023-11-29 06:13:42,686 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 06:14:03,749 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 11600, loss[loss=0.06117, simple_loss=0.08673, pruned_loss=0.008737, audio_tagging_loss=0.009067, over 15466.00 frames. 
], tot_loss[loss=0.06586, simple_loss=0.09039, pruned_loss=0.01225, audio_tagging_loss=0.008417, over 3052918.50 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 06:14:12,061 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3844826.6666666665, ans=0.0 2023-11-29 06:14:32,716 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 576750 2023-11-29 06:14:44,236 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 06:14:48,343 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3845026.6666666665, ans=0.125 2023-11-29 06:14:55,102 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.877e+01 9.031e+01 9.516e+01 1.044e+02 1.307e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-29 06:15:04,854 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3845160.0, ans=0.2 2023-11-29 06:15:05,665 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 11650, loss[loss=0.06849, simple_loss=0.1055, pruned_loss=0.009299, audio_tagging_loss=0.00645, over 15416.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.0892, pruned_loss=0.01198, audio_tagging_loss=0.008463, over 3053928.52 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 06:15:11,860 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3845160.0, ans=0.2 2023-11-29 06:15:12,115 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3845160.0, ans=0.04949747468305833 2023-11-29 06:15:21,737 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.70 vs. limit=15.0 2023-11-29 06:15:26,867 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3845226.6666666665, ans=0.125 2023-11-29 06:15:35,417 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 576800 2023-11-29 06:15:50,543 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3845360.0, ans=0.2 2023-11-29 06:16:06,840 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.32 vs. limit=15.0 2023-11-29 06:16:07,160 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 11700, loss[loss=0.07845, simple_loss=0.1037, pruned_loss=0.01718, audio_tagging_loss=0.009418, over 14753.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08873, pruned_loss=0.01204, audio_tagging_loss=0.00857, over 3045294.50 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 06:16:12,774 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=15.34 vs. 
limit=15.0 2023-11-29 06:16:17,562 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3845493.3333333335, ans=0.125 2023-11-29 06:16:29,743 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3845560.0, ans=0.125 2023-11-29 06:16:30,861 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3845626.6666666665, ans=0.2 2023-11-29 06:16:35,051 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3845626.6666666665, ans=0.0 2023-11-29 06:16:37,052 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 576850 2023-11-29 06:16:58,537 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.567e+01 8.889e+01 9.558e+01 1.009e+02 1.379e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-29 06:17:05,270 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3845760.0, ans=0.2 2023-11-29 06:17:09,137 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 11750, loss[loss=0.05259, simple_loss=0.07549, pruned_loss=0.007071, audio_tagging_loss=0.007776, over 14940.00 frames. ], tot_loss[loss=0.06444, simple_loss=0.08817, pruned_loss=0.0118, audio_tagging_loss=0.008558, over 3045969.38 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 06:17:33,641 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3845960.0, ans=0.0 2023-11-29 06:17:38,213 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 576900 2023-11-29 06:17:42,369 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.41 vs. limit=15.0 2023-11-29 06:17:45,290 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3846026.6666666665, ans=0.1 2023-11-29 06:17:54,849 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=3846026.6666666665, ans=10.0 2023-11-29 06:18:08,375 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.91 vs. limit=15.0 2023-11-29 06:18:10,124 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 11800, loss[loss=0.07092, simple_loss=0.1117, pruned_loss=0.008707, audio_tagging_loss=0.00634, over 15876.00 frames. ], tot_loss[loss=0.06461, simple_loss=0.08847, pruned_loss=0.01183, audio_tagging_loss=0.008541, over 3049193.84 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 06:18:39,484 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 576950 2023-11-29 06:18:50,347 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.88 vs. 
limit=6.0 2023-11-29 06:19:01,321 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.285e+01 9.085e+01 9.909e+01 1.081e+02 1.450e+02, threshold=1.982e+02, percent-clipped=0.0 2023-11-29 06:19:01,472 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3846426.6666666665, ans=0.1 2023-11-29 06:19:10,587 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 11850, loss[loss=0.06206, simple_loss=0.08306, pruned_loss=0.01304, audio_tagging_loss=0.007495, over 14452.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.08903, pruned_loss=0.01191, audio_tagging_loss=0.008592, over 3048268.31 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 06:19:14,623 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.33 vs. limit=15.0 2023-11-29 06:19:16,574 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3846493.3333333335, ans=0.125 2023-11-29 06:19:40,305 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 577000 2023-11-29 06:19:55,157 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3846693.3333333335, ans=0.1 2023-11-29 06:20:05,633 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3846760.0, ans=0.0 2023-11-29 06:20:05,814 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.78 vs. limit=15.0 2023-11-29 06:20:11,094 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 11900, loss[loss=0.06559, simple_loss=0.09203, pruned_loss=0.0109, audio_tagging_loss=0.008679, over 15382.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.08904, pruned_loss=0.01187, audio_tagging_loss=0.008649, over 3041213.68 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 06:20:20,843 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3846826.6666666665, ans=0.2 2023-11-29 06:20:32,472 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3846893.3333333335, ans=0.125 2023-11-29 06:20:41,462 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 577050 2023-11-29 06:20:43,375 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.86 vs. limit=15.0 2023-11-29 06:20:46,853 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.07 vs. limit=15.0 2023-11-29 06:20:51,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3847026.6666666665, ans=0.2 2023-11-29 06:21:02,403 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.75 vs. 
limit=15.0 2023-11-29 06:21:02,975 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.818e+01 9.006e+01 9.638e+01 1.018e+02 1.407e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-29 06:21:13,621 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 11950, loss[loss=0.03888, simple_loss=0.04884, pruned_loss=0.004567, audio_tagging_loss=0.009893, over 15336.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08963, pruned_loss=0.01194, audio_tagging_loss=0.008614, over 3041740.08 frames. ], batch size: 59, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 06:21:20,291 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3847160.0, ans=0.0 2023-11-29 06:21:20,298 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3847160.0, ans=0.125 2023-11-29 06:21:23,827 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3847160.0, ans=0.125 2023-11-29 06:21:42,220 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 577100 2023-11-29 06:21:56,789 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.33 vs. limit=15.0 2023-11-29 06:22:04,901 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3847426.6666666665, ans=0.025 2023-11-29 06:22:12,433 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 12000, loss[loss=0.0852, simple_loss=0.1193, pruned_loss=0.01813, audio_tagging_loss=0.007424, over 15617.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08969, pruned_loss=0.01185, audio_tagging_loss=0.008649, over 3043312.62 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 06:22:12,436 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-29 06:22:48,759 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([3.9725, 3.1497, 2.8664, 3.1541, 3.3257, 2.7838, 3.3795, 2.5585], device='cuda:0') 2023-11-29 06:22:52,520 INFO [train_asr.py:1267] (0/4) Epoch 48, validation: loss=0.05839, simple_loss=0.05056, pruned_loss=0.005496, audio_tagging_loss=0.02761, over 4681554.00 frames. 2023-11-29 06:22:52,521 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-29 06:23:17,997 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 06:23:21,058 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/epoch-48.pt 2023-11-29 06:23:44,110 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 0, loss[loss=0.07049, simple_loss=0.08281, pruned_loss=0.009811, audio_tagging_loss=0.01928, over 14436.00 frames. ], tot_loss[loss=0.07049, simple_loss=0.08281, pruned_loss=0.009811, audio_tagging_loss=0.01928, over 14436.00 frames. 
], batch size: 54, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 06:23:44,112 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-29 06:24:02,203 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([6.3671, 6.0154, 6.3171, 5.7500], device='cuda:0') 2023-11-29 06:24:11,818 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.3467, 4.3186, 4.4832, 4.4809], device='cuda:0') 2023-11-29 06:24:16,502 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.8312, 4.9659, 5.1083, 4.9330], device='cuda:0') 2023-11-29 06:24:20,381 INFO [train_asr.py:1267] (0/4) Epoch 49, validation: loss=0.05827, simple_loss=0.05045, pruned_loss=0.005376, audio_tagging_loss=0.02767, over 4681554.00 frames. 2023-11-29 06:24:20,382 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-29 06:24:20,583 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 577150 2023-11-29 06:24:20,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3847653.3333333335, ans=0.125 2023-11-29 06:24:42,845 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.794e+01 9.193e+01 9.994e+01 1.113e+02 1.489e+02, threshold=1.999e+02, percent-clipped=0.0 2023-11-29 06:24:43,719 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=10.70 vs. limit=15.0 2023-11-29 06:25:07,399 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.14 vs. limit=15.0 2023-11-29 06:25:23,006 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 50, loss[loss=0.06123, simple_loss=0.0681, pruned_loss=0.00722, audio_tagging_loss=0.01996, over 14762.00 frames. ], tot_loss[loss=0.07147, simple_loss=0.0859, pruned_loss=0.01137, audio_tagging_loss=0.01715, over 685306.90 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 06:25:23,126 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 577200 2023-11-29 06:25:29,585 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3847986.6666666665, ans=0.0 2023-11-29 06:25:30,811 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3847986.6666666665, ans=0.125 2023-11-29 06:26:08,629 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3848186.6666666665, ans=0.0 2023-11-29 06:26:12,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3848253.3333333335, ans=0.0 2023-11-29 06:26:21,896 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3848253.3333333335, ans=0.2 2023-11-29 06:26:25,063 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 100, loss[loss=0.07053, simple_loss=0.09724, pruned_loss=0.008946, audio_tagging_loss=0.01297, over 15497.00 frames. ], tot_loss[loss=0.07154, simple_loss=0.08772, pruned_loss=0.01157, audio_tagging_loss=0.0161, over 1210083.55 frames. 
], batch size: 58, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:26:25,176 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 577250 2023-11-29 06:26:45,436 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3848386.6666666665, ans=0.0 2023-11-29 06:26:49,322 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.773e+01 9.815e+01 1.050e+02 1.112e+02 1.329e+02, threshold=2.101e+02, percent-clipped=0.0 2023-11-29 06:27:25,193 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3848586.6666666665, ans=0.09899494936611666 2023-11-29 06:27:27,375 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 150, loss[loss=0.06599, simple_loss=0.08892, pruned_loss=0.01236, audio_tagging_loss=0.009174, over 14994.00 frames. ], tot_loss[loss=0.06943, simple_loss=0.08773, pruned_loss=0.01126, audio_tagging_loss=0.01431, over 1617716.22 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:27:27,477 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 577300 2023-11-29 06:27:37,118 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3848653.3333333335, ans=0.0 2023-11-29 06:28:13,967 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3848853.3333333335, ans=0.125 2023-11-29 06:28:31,055 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 200, loss[loss=0.05903, simple_loss=0.0788, pruned_loss=0.008664, audio_tagging_loss=0.01096, over 15122.00 frames. ], tot_loss[loss=0.06795, simple_loss=0.08799, pruned_loss=0.01132, audio_tagging_loss=0.01264, over 1933531.36 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:28:31,192 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 577350 2023-11-29 06:28:42,837 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3849053.3333333335, ans=0.125 2023-11-29 06:28:44,118 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3849053.3333333335, ans=0.1 2023-11-29 06:28:44,747 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.88 vs. limit=10.0 2023-11-29 06:28:53,807 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.143e+01 9.385e+01 9.861e+01 1.084e+02 1.515e+02, threshold=1.972e+02, percent-clipped=0.0 2023-11-29 06:28:55,289 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3849120.0, ans=0.0 2023-11-29 06:29:03,644 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3849120.0, ans=0.125 2023-11-29 06:29:25,832 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3849253.3333333335, ans=0.125 2023-11-29 06:29:31,537 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 250, loss[loss=0.06542, simple_loss=0.08836, pruned_loss=0.01254, audio_tagging_loss=0.008699, over 15935.00 frames. ], tot_loss[loss=0.06736, simple_loss=0.08879, pruned_loss=0.01155, audio_tagging_loss=0.01142, over 2180585.40 frames. 
], batch size: 58, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:29:31,682 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 577400 2023-11-29 06:29:57,386 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-29 06:30:06,961 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3849453.3333333335, ans=0.1 2023-11-29 06:30:17,972 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.00 vs. limit=15.0 2023-11-29 06:30:33,250 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3849653.3333333335, ans=0.0 2023-11-29 06:30:34,220 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 300, loss[loss=0.05756, simple_loss=0.07751, pruned_loss=0.0114, audio_tagging_loss=0.007408, over 15042.00 frames. ], tot_loss[loss=0.0668, simple_loss=0.08916, pruned_loss=0.01164, audio_tagging_loss=0.01058, over 2376542.91 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:30:34,311 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 577450 2023-11-29 06:30:58,193 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.896e+01 9.309e+01 1.014e+02 1.083e+02 1.326e+02, threshold=2.029e+02, percent-clipped=0.0 2023-11-29 06:31:21,711 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3849853.3333333335, ans=0.95 2023-11-29 06:31:37,006 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 350, loss[loss=0.07511, simple_loss=0.106, pruned_loss=0.01413, audio_tagging_loss=0.007964, over 15328.00 frames. ], tot_loss[loss=0.06698, simple_loss=0.0902, pruned_loss=0.0119, audio_tagging_loss=0.009973, over 2529710.15 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:31:37,121 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 577500 2023-11-29 06:31:54,624 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3850053.3333333335, ans=0.125 2023-11-29 06:32:09,314 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3850120.0, ans=0.0 2023-11-29 06:32:14,137 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3850186.6666666665, ans=0.0 2023-11-29 06:32:39,066 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 400, loss[loss=0.07008, simple_loss=0.09682, pruned_loss=0.01092, audio_tagging_loss=0.01075, over 15208.00 frames. ], tot_loss[loss=0.0663, simple_loss=0.08968, pruned_loss=0.01176, audio_tagging_loss=0.009704, over 2647662.76 frames. 
], batch size: 54, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 06:32:39,178 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 577550 2023-11-29 06:32:59,092 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3850386.6666666665, ans=0.2 2023-11-29 06:33:02,427 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.275e+01 8.755e+01 9.458e+01 1.037e+02 1.447e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-29 06:33:14,647 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3850453.3333333335, ans=0.125 2023-11-29 06:33:24,031 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.21 vs. limit=15.0 2023-11-29 06:33:34,495 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3850586.6666666665, ans=0.0 2023-11-29 06:33:41,944 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 450, loss[loss=0.06345, simple_loss=0.08445, pruned_loss=0.01267, audio_tagging_loss=0.008555, over 14838.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08879, pruned_loss=0.01173, audio_tagging_loss=0.009414, over 2745518.82 frames. ], batch size: 54, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 06:33:42,032 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 577600 2023-11-29 06:33:43,779 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3850653.3333333335, ans=10.0 2023-11-29 06:34:11,822 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3850786.6666666665, ans=0.125 2023-11-29 06:34:43,837 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3850986.6666666665, ans=0.0 2023-11-29 06:34:45,307 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 500, loss[loss=0.05507, simple_loss=0.07917, pruned_loss=0.008674, audio_tagging_loss=0.006808, over 15632.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.08902, pruned_loss=0.01168, audio_tagging_loss=0.00925, over 2811659.77 frames. ], batch size: 61, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 06:34:45,436 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 577650 2023-11-29 06:34:57,803 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3851053.3333333335, ans=0.125 2023-11-29 06:35:09,265 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.768e+01 8.909e+01 9.530e+01 1.043e+02 1.565e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-29 06:35:22,261 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3851186.6666666665, ans=0.07 2023-11-29 06:35:42,207 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3851253.3333333335, ans=0.035 2023-11-29 06:35:42,799 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.48 vs. limit=15.0 2023-11-29 06:35:47,401 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 550, loss[loss=0.06899, simple_loss=0.1038, pruned_loss=0.009988, audio_tagging_loss=0.007127, over 14923.00 frames. 
], tot_loss[loss=0.06534, simple_loss=0.0893, pruned_loss=0.01165, audio_tagging_loss=0.009036, over 2865511.91 frames. ], batch size: 54, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:35:47,502 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 577700 2023-11-29 06:35:47,717 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3851320.0, ans=0.125 2023-11-29 06:35:52,239 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3851320.0, ans=0.07 2023-11-29 06:36:49,882 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 600, loss[loss=0.05985, simple_loss=0.07107, pruned_loss=0.01239, audio_tagging_loss=0.01193, over 15300.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.08995, pruned_loss=0.01186, audio_tagging_loss=0.008882, over 2906831.80 frames. ], batch size: 59, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:36:50,044 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 577750 2023-11-29 06:37:03,267 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3851720.0, ans=0.0 2023-11-29 06:37:06,051 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3851720.0, ans=0.0 2023-11-29 06:37:06,181 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3851720.0, ans=0.125 2023-11-29 06:37:14,828 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.590e+01 8.849e+01 9.501e+01 1.048e+02 1.415e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-29 06:37:22,999 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3851786.6666666665, ans=0.1 2023-11-29 06:37:25,274 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3851786.6666666665, ans=0.125 2023-11-29 06:37:44,657 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.98 vs. limit=15.0 2023-11-29 06:37:52,658 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 650, loss[loss=0.08002, simple_loss=0.1132, pruned_loss=0.01631, audio_tagging_loss=0.007115, over 15359.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.09066, pruned_loss=0.01191, audio_tagging_loss=0.008853, over 2941043.56 frames. 
], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:37:52,759 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 577800 2023-11-29 06:37:52,852 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3851986.6666666665, ans=0.0 2023-11-29 06:37:59,622 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3851986.6666666665, ans=0.125 2023-11-29 06:38:04,426 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3852053.3333333335, ans=0.0 2023-11-29 06:38:14,525 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3852053.3333333335, ans=0.0 2023-11-29 06:38:15,561 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3852053.3333333335, ans=0.1 2023-11-29 06:38:42,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3852253.3333333335, ans=0.125 2023-11-29 06:38:48,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=3852253.3333333335, ans=15.0 2023-11-29 06:38:53,657 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3852253.3333333335, ans=0.035 2023-11-29 06:38:55,894 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 700, loss[loss=0.05932, simple_loss=0.08634, pruned_loss=0.008647, audio_tagging_loss=0.007504, over 15482.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08974, pruned_loss=0.01182, audio_tagging_loss=0.008833, over 2969200.99 frames. ], batch size: 59, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:38:56,017 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 577850 2023-11-29 06:38:58,423 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3852320.0, ans=0.125 2023-11-29 06:39:20,751 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.229e+01 9.141e+01 9.968e+01 1.043e+02 1.174e+02, threshold=1.994e+02, percent-clipped=0.0 2023-11-29 06:39:22,643 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.78 vs. limit=10.0 2023-11-29 06:39:28,915 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3852453.3333333335, ans=0.125 2023-11-29 06:39:37,998 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3852520.0, ans=0.125 2023-11-29 06:39:58,572 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 750, loss[loss=0.0696, simple_loss=0.09921, pruned_loss=0.009516, audio_tagging_loss=0.01048, over 16035.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.09064, pruned_loss=0.01205, audio_tagging_loss=0.008853, over 2987349.93 frames. 
], batch size: 58, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:39:58,691 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 577900 2023-11-29 06:40:01,881 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3852653.3333333335, ans=0.04949747468305833 2023-11-29 06:40:21,001 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3852720.0, ans=0.125 2023-11-29 06:40:44,422 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3852853.3333333335, ans=0.2 2023-11-29 06:41:01,477 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 800, loss[loss=0.07055, simple_loss=0.09756, pruned_loss=0.01333, audio_tagging_loss=0.008441, over 16064.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.08955, pruned_loss=0.01179, audio_tagging_loss=0.008941, over 3003742.42 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 06:41:01,590 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 577950 2023-11-29 06:41:04,748 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3852986.6666666665, ans=0.125 2023-11-29 06:41:20,973 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.07 vs. limit=15.0 2023-11-29 06:41:26,086 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.808e+01 9.191e+01 9.688e+01 1.032e+02 1.219e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-29 06:42:04,108 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 850, loss[loss=0.05987, simple_loss=0.08865, pruned_loss=0.008486, audio_tagging_loss=0.00706, over 15176.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.09037, pruned_loss=0.01189, audio_tagging_loss=0.008852, over 3019717.72 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:42:04,231 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 578000 2023-11-29 06:42:10,771 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3853320.0, ans=0.125 2023-11-29 06:42:15,760 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.80 vs. limit=15.0 2023-11-29 06:42:25,190 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3853386.6666666665, ans=0.125 2023-11-29 06:42:55,941 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.87 vs. limit=15.0 2023-11-29 06:43:05,878 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 900, loss[loss=0.06225, simple_loss=0.07826, pruned_loss=0.01274, audio_tagging_loss=0.01038, over 15023.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.09007, pruned_loss=0.01197, audio_tagging_loss=0.008866, over 3030500.04 frames. 
], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:43:05,997 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 578050 2023-11-29 06:43:12,495 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3853653.3333333335, ans=0.125 2023-11-29 06:43:18,396 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3853720.0, ans=0.125 2023-11-29 06:43:19,449 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3853720.0, ans=0.125 2023-11-29 06:43:31,446 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3853786.6666666665, ans=0.125 2023-11-29 06:43:31,505 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3853786.6666666665, ans=0.04949747468305833 2023-11-29 06:43:33,424 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.882e+01 9.400e+01 1.003e+02 1.065e+02 1.240e+02, threshold=2.006e+02, percent-clipped=0.0 2023-11-29 06:43:58,777 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.61 vs. limit=15.0 2023-11-29 06:44:01,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3853920.0, ans=0.0 2023-11-29 06:44:02,440 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3853920.0, ans=0.2 2023-11-29 06:44:02,588 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3853920.0, ans=0.0 2023-11-29 06:44:09,232 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 950, loss[loss=0.04807, simple_loss=0.06242, pruned_loss=0.007384, audio_tagging_loss=0.009474, over 15064.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.0898, pruned_loss=0.01188, audio_tagging_loss=0.008796, over 3035273.46 frames. ], batch size: 59, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:44:09,352 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 578100 2023-11-29 06:44:09,647 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 06:44:37,873 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3854120.0, ans=0.2 2023-11-29 06:44:45,709 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3854186.6666666665, ans=0.125 2023-11-29 06:44:51,541 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3854186.6666666665, ans=0.1 2023-11-29 06:44:59,734 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3854253.3333333335, ans=0.125 2023-11-29 06:45:05,647 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 06:45:11,245 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 1000, loss[loss=0.05504, simple_loss=0.07603, pruned_loss=0.0106, audio_tagging_loss=0.006436, over 15778.00 frames. 
], tot_loss[loss=0.06541, simple_loss=0.08969, pruned_loss=0.0119, audio_tagging_loss=0.00867, over 3041524.49 frames. ], batch size: 59, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:45:11,380 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 578150 2023-11-29 06:45:20,832 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3854320.0, ans=0.0 2023-11-29 06:45:27,829 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3854386.6666666665, ans=0.125 2023-11-29 06:45:37,176 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.501e+01 8.899e+01 9.614e+01 1.019e+02 1.244e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-29 06:45:37,529 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3854453.3333333335, ans=0.2 2023-11-29 06:45:39,606 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 06:46:09,353 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3854586.6666666665, ans=0.125 2023-11-29 06:46:12,535 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 1050, loss[loss=0.04125, simple_loss=0.05226, pruned_loss=0.005223, audio_tagging_loss=0.009899, over 15407.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08958, pruned_loss=0.01182, audio_tagging_loss=0.008554, over 3040430.93 frames. ], batch size: 60, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:46:12,652 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 578200 2023-11-29 06:46:25,070 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3854720.0, ans=0.1 2023-11-29 06:46:25,129 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3854720.0, ans=0.0 2023-11-29 06:46:37,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3854786.6666666665, ans=0.125 2023-11-29 06:47:02,807 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 06:47:09,766 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.73 vs. limit=5.0 2023-11-29 06:47:15,129 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 1100, loss[loss=0.07232, simple_loss=0.09752, pruned_loss=0.01161, audio_tagging_loss=0.01195, over 16351.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.08955, pruned_loss=0.01183, audio_tagging_loss=0.00856, over 3041904.06 frames. ], batch size: 60, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:47:15,214 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 578250 2023-11-29 06:47:19,091 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.85 vs. 
limit=12.0 2023-11-29 06:47:19,615 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 06:47:30,093 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3855053.3333333335, ans=0.125 2023-11-29 06:47:30,577 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.37 vs. limit=15.0 2023-11-29 06:47:35,055 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3855053.3333333335, ans=0.1 2023-11-29 06:47:40,642 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.690e+01 9.246e+01 9.671e+01 1.044e+02 1.404e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-29 06:47:55,110 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.55 vs. limit=15.0 2023-11-29 06:48:11,321 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3855253.3333333335, ans=0.04949747468305833 2023-11-29 06:48:18,226 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 1150, loss[loss=0.03902, simple_loss=0.0519, pruned_loss=0.002786, audio_tagging_loss=0.01029, over 15241.00 frames. ], tot_loss[loss=0.06531, simple_loss=0.09002, pruned_loss=0.01189, audio_tagging_loss=0.008404, over 3050706.33 frames. ], batch size: 59, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:48:18,351 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 578300 2023-11-29 06:48:18,922 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.19 vs. limit=12.0 2023-11-29 06:48:23,482 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3855320.0, ans=0.125 2023-11-29 06:48:51,452 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3855453.3333333335, ans=0.125 2023-11-29 06:48:59,089 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3855520.0, ans=0.2 2023-11-29 06:49:09,309 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3855586.6666666665, ans=0.125 2023-11-29 06:49:12,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3855586.6666666665, ans=0.0 2023-11-29 06:49:19,699 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 1200, loss[loss=0.07768, simple_loss=0.1075, pruned_loss=0.01512, audio_tagging_loss=0.008809, over 15756.00 frames. ], tot_loss[loss=0.06465, simple_loss=0.08905, pruned_loss=0.01166, audio_tagging_loss=0.008465, over 3051603.97 frames. 
], batch size: 59, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 06:49:19,836 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 578350 2023-11-29 06:49:27,371 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.58 vs. limit=15.0 2023-11-29 06:49:33,458 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3855720.0, ans=0.125 2023-11-29 06:49:39,874 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3855720.0, ans=0.05 2023-11-29 06:49:39,922 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3855720.0, ans=0.0 2023-11-29 06:49:47,199 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.561e+01 8.935e+01 9.457e+01 1.024e+02 1.157e+02, threshold=1.891e+02, percent-clipped=0.0 2023-11-29 06:49:52,867 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3855786.6666666665, ans=0.0 2023-11-29 06:50:03,340 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3855853.3333333335, ans=0.1 2023-11-29 06:50:12,719 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3855920.0, ans=0.2 2023-11-29 06:50:18,467 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3855920.0, ans=0.0 2023-11-29 06:50:21,570 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 1250, loss[loss=0.06696, simple_loss=0.08904, pruned_loss=0.01398, audio_tagging_loss=0.008464, over 15056.00 frames. ], tot_loss[loss=0.06428, simple_loss=0.08831, pruned_loss=0.01162, audio_tagging_loss=0.008503, over 3051750.28 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:50:21,758 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 578400 2023-11-29 06:50:47,487 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3856120.0, ans=0.0 2023-11-29 06:50:52,032 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3856120.0, ans=0.025 2023-11-29 06:51:08,037 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3856186.6666666665, ans=0.125 2023-11-29 06:51:10,308 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3856253.3333333335, ans=0.0 2023-11-29 06:51:18,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3856253.3333333335, ans=0.05 2023-11-29 06:51:24,784 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 1300, loss[loss=0.05759, simple_loss=0.06913, pruned_loss=0.01069, audio_tagging_loss=0.01234, over 15694.00 frames. ], tot_loss[loss=0.06485, simple_loss=0.08934, pruned_loss=0.01179, audio_tagging_loss=0.008389, over 3061238.61 frames. 
], batch size: 62, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:51:24,894 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 578450 2023-11-29 06:51:35,685 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3856386.6666666665, ans=0.125 2023-11-29 06:51:37,827 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3856386.6666666665, ans=0.2 2023-11-29 06:51:50,518 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.203e+01 8.934e+01 9.381e+01 1.015e+02 1.347e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-29 06:51:53,803 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3856453.3333333335, ans=0.2 2023-11-29 06:51:53,858 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3856453.3333333335, ans=0.125 2023-11-29 06:52:01,279 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.31 vs. limit=22.5 2023-11-29 06:52:13,231 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3856586.6666666665, ans=0.125 2023-11-29 06:52:17,948 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3856586.6666666665, ans=0.2 2023-11-29 06:52:20,553 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.04 vs. limit=12.0 2023-11-29 06:52:21,573 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3856586.6666666665, ans=0.125 2023-11-29 06:52:25,854 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 1350, loss[loss=0.05245, simple_loss=0.06959, pruned_loss=0.009124, audio_tagging_loss=0.008532, over 14954.00 frames. ], tot_loss[loss=0.0648, simple_loss=0.08926, pruned_loss=0.0117, audio_tagging_loss=0.008477, over 3052420.54 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:52:25,991 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 578500 2023-11-29 06:52:59,585 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3856786.6666666665, ans=0.04949747468305833 2023-11-29 06:53:02,081 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3856853.3333333335, ans=0.125 2023-11-29 06:53:03,097 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3856853.3333333335, ans=0.2 2023-11-29 06:53:10,909 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-29 06:53:18,160 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3856920.0, ans=0.125 2023-11-29 06:53:26,925 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 1400, loss[loss=0.06261, simple_loss=0.08532, pruned_loss=0.01304, audio_tagging_loss=0.006917, over 14205.00 frames. ], tot_loss[loss=0.06471, simple_loss=0.08912, pruned_loss=0.01167, audio_tagging_loss=0.008484, over 3059276.50 frames. ], batch size: 54, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:53:27,029 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 578550 2023-11-29 06:53:46,166 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.44 vs. limit=15.0 2023-11-29 06:53:46,909 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3857053.3333333335, ans=0.125 2023-11-29 06:53:54,950 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.714e+01 9.091e+01 9.742e+01 1.050e+02 1.544e+02, threshold=1.948e+02, percent-clipped=0.0 2023-11-29 06:54:10,410 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3857186.6666666665, ans=0.2 2023-11-29 06:54:17,383 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.44 vs. limit=15.0 2023-11-29 06:54:18,479 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.87 vs. limit=22.5 2023-11-29 06:54:29,544 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 1450, loss[loss=0.05535, simple_loss=0.07629, pruned_loss=0.01074, audio_tagging_loss=0.006474, over 15040.00 frames. ], tot_loss[loss=0.06486, simple_loss=0.08918, pruned_loss=0.01172, audio_tagging_loss=0.008553, over 3060883.71 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:54:29,649 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 578600 2023-11-29 06:54:54,053 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.87 vs. limit=15.0 2023-11-29 06:54:59,105 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3857453.3333333335, ans=0.0 2023-11-29 06:55:28,439 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.13 vs. limit=15.0 2023-11-29 06:55:31,280 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 1500, loss[loss=0.05904, simple_loss=0.07752, pruned_loss=0.009704, audio_tagging_loss=0.01058, over 14547.00 frames. ], tot_loss[loss=0.0645, simple_loss=0.0884, pruned_loss=0.01168, audio_tagging_loss=0.008613, over 3054951.38 frames. 
], batch size: 55, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:55:31,390 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 578650 2023-11-29 06:55:42,153 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3857720.0, ans=0.2 2023-11-29 06:55:51,013 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=3857720.0, ans=0.05 2023-11-29 06:55:57,799 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.625e+01 9.098e+01 9.715e+01 1.024e+02 1.252e+02, threshold=1.943e+02, percent-clipped=0.0 2023-11-29 06:56:12,156 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.52 vs. limit=10.0 2023-11-29 06:56:23,149 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3857920.0, ans=0.125 2023-11-29 06:56:32,902 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 1550, loss[loss=0.06557, simple_loss=0.09168, pruned_loss=0.01151, audio_tagging_loss=0.008219, over 15797.00 frames. ], tot_loss[loss=0.06481, simple_loss=0.08884, pruned_loss=0.01176, audio_tagging_loss=0.008632, over 3049869.50 frames. ], batch size: 59, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:56:33,016 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 578700 2023-11-29 06:56:42,635 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3857986.6666666665, ans=0.125 2023-11-29 06:56:42,724 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 06:56:44,330 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.79 vs. limit=10.0 2023-11-29 06:57:05,310 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3858120.0, ans=0.125 2023-11-29 06:57:06,645 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3858120.0, ans=0.0 2023-11-29 06:57:34,238 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 1600, loss[loss=0.08606, simple_loss=0.1239, pruned_loss=0.01852, audio_tagging_loss=0.005609, over 16299.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.08949, pruned_loss=0.01194, audio_tagging_loss=0.008699, over 3051563.44 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 06:57:34,368 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 578750 2023-11-29 06:57:41,781 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.73 vs. 
limit=15.0 2023-11-29 06:57:51,828 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3858386.6666666665, ans=0.0 2023-11-29 06:58:00,875 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.518e+01 9.073e+01 9.678e+01 1.045e+02 1.590e+02, threshold=1.936e+02, percent-clipped=0.0 2023-11-29 06:58:03,426 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3858453.3333333335, ans=0.1 2023-11-29 06:58:11,108 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3858520.0, ans=0.2 2023-11-29 06:58:12,415 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3858520.0, ans=0.125 2023-11-29 06:58:21,072 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3858520.0, ans=0.0 2023-11-29 06:58:36,012 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 1650, loss[loss=0.08245, simple_loss=0.1126, pruned_loss=0.01496, audio_tagging_loss=0.01118, over 15182.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.08888, pruned_loss=0.01183, audio_tagging_loss=0.008746, over 3047474.96 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 06:58:36,142 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 578800 2023-11-29 06:59:04,260 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3858786.6666666665, ans=0.125 2023-11-29 06:59:13,540 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3858853.3333333335, ans=0.1 2023-11-29 06:59:37,384 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 1700, loss[loss=0.07354, simple_loss=0.09544, pruned_loss=0.01575, audio_tagging_loss=0.01007, over 15781.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08946, pruned_loss=0.01206, audio_tagging_loss=0.008793, over 3058060.18 frames. ], batch size: 61, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:59:37,490 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 578850 2023-11-29 06:59:42,044 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3858986.6666666665, ans=0.125 2023-11-29 06:59:44,321 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3858986.6666666665, ans=0.125 2023-11-29 06:59:45,322 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 06:59:58,078 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.15 vs. limit=12.0 2023-11-29 07:00:04,956 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.03 vs. 
limit=15.0 2023-11-29 07:00:07,255 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.933e+01 9.056e+01 9.736e+01 1.037e+02 1.295e+02, threshold=1.947e+02, percent-clipped=0.0 2023-11-29 07:00:11,138 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3859120.0, ans=0.125 2023-11-29 07:00:15,798 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3859186.6666666665, ans=0.1 2023-11-29 07:00:18,200 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3859186.6666666665, ans=0.125 2023-11-29 07:00:26,967 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3859253.3333333335, ans=0.05 2023-11-29 07:00:39,577 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 1750, loss[loss=0.08055, simple_loss=0.1188, pruned_loss=0.01396, audio_tagging_loss=0.007201, over 15626.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.08929, pruned_loss=0.01193, audio_tagging_loss=0.008705, over 3055368.86 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:00:39,675 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 578900 2023-11-29 07:01:33,337 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3859586.6666666665, ans=0.125 2023-11-29 07:01:42,541 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 1800, loss[loss=0.07576, simple_loss=0.103, pruned_loss=0.0142, audio_tagging_loss=0.01007, over 15377.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08979, pruned_loss=0.012, audio_tagging_loss=0.00858, over 3057273.31 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:01:42,701 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 578950 2023-11-29 07:01:46,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3859653.3333333335, ans=0.125 2023-11-29 07:01:55,485 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3859720.0, ans=0.0 2023-11-29 07:02:11,105 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.646e+01 9.159e+01 9.748e+01 1.040e+02 1.409e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-29 07:02:17,889 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3859786.6666666665, ans=0.125 2023-11-29 07:02:39,988 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3859920.0, ans=0.125 2023-11-29 07:02:44,382 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 1850, loss[loss=0.07779, simple_loss=0.1111, pruned_loss=0.01583, audio_tagging_loss=0.006434, over 15240.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.0896, pruned_loss=0.01193, audio_tagging_loss=0.008553, over 3052161.37 frames. 
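The tot_loss[...] fields above combine consistently: in every entry of this stretch, loss equals 0.5 × simple_loss + pruned_loss + audio_tagging_loss (for batch 1850 just above: 0.5 × 0.0896 + 0.01193 + 0.008553 ≈ 0.06528). A minimal sketch of that combination, assuming those weights are fixed at this point in training; `combine_losses` is an illustrative helper, not a function from train_asr.py:

```python
# Illustrative only: the 0.5 weight on the simple (non-pruned) transducer loss
# and the 1.0 weight on the audio-tagging distillation loss are inferred from
# the logged numbers, not read from the training script.
def combine_losses(simple_loss: float,
                   pruned_loss: float,
                   audio_tagging_loss: float,
                   simple_scale: float = 0.5,
                   tagging_scale: float = 1.0) -> float:
    """Total per-frame loss as reported in the tot_loss[...] entries."""
    return simple_scale * simple_loss + pruned_loss + tagging_scale * audio_tagging_loss

# Check against the "Epoch 49, batch 1850" entry above:
assert abs(combine_losses(0.0896, 0.01193, 0.008553) - 0.06528) < 1e-4
```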
], batch size: 55, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:02:44,541 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 579000 2023-11-29 07:03:10,909 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3860120.0, ans=0.125 2023-11-29 07:03:15,533 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3860120.0, ans=0.05 2023-11-29 07:03:22,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3860186.6666666665, ans=0.0 2023-11-29 07:03:29,188 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3860186.6666666665, ans=0.125 2023-11-29 07:03:43,275 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.12 vs. limit=22.5 2023-11-29 07:03:46,132 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 1900, loss[loss=0.05338, simple_loss=0.0698, pruned_loss=0.00887, audio_tagging_loss=0.009608, over 16340.00 frames. ], tot_loss[loss=0.065, simple_loss=0.08957, pruned_loss=0.01177, audio_tagging_loss=0.008439, over 3051698.99 frames. ], batch size: 64, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:03:46,233 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 579050 2023-11-29 07:04:00,191 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3860386.6666666665, ans=0.1 2023-11-29 07:04:14,617 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.649e+01 8.930e+01 9.376e+01 1.025e+02 1.828e+02, threshold=1.875e+02, percent-clipped=0.0 2023-11-29 07:04:19,690 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3860453.3333333335, ans=0.0 2023-11-29 07:04:19,943 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.42 vs. limit=22.5 2023-11-29 07:04:30,998 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3860520.0, ans=0.1 2023-11-29 07:04:33,649 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.42 vs. limit=12.0 2023-11-29 07:04:42,009 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3860586.6666666665, ans=0.05 2023-11-29 07:04:46,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3860653.3333333335, ans=0.05 2023-11-29 07:04:47,572 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 1950, loss[loss=0.05386, simple_loss=0.07658, pruned_loss=0.008035, audio_tagging_loss=0.007539, over 16694.00 frames. ], tot_loss[loss=0.06455, simple_loss=0.08897, pruned_loss=0.01165, audio_tagging_loss=0.008406, over 3049995.62 frames. 
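Each optim.py:476 entry prints five grad-norm quartiles (min, 25%, median, 75%, max) plus a clipping threshold, and throughout this log the threshold is 2.0 × the median (the 07:02:11 entry above: 2.0 × 9.748e+01 ≈ 1.950e+02), matching Clipping_scale=2.0. A sketch of those statistics, assuming the optimizer keeps a buffer of recent gradient norms; this illustrates the logged quantities rather than reproducing icefall's ScaledAdam:

```python
import torch

def clipping_stats(recent_grad_norms: torch.Tensor, clipping_scale: float = 2.0):
    """Quartiles of recent grad norms and the derived clipping threshold.

    Assumption: threshold = clipping_scale * median, which is consistent
    with every optim.py line in this section of the log.
    """
    quartiles = torch.quantile(
        recent_grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0])
    )
    threshold = clipping_scale * quartiles[2]          # 2.0 x median
    percent_clipped = 100.0 * (recent_grad_norms > threshold).float().mean()
    return quartiles, threshold, percent_clipped
```

With the 07:02:11 quartiles (7.646e+01 through 1.409e+02) this yields threshold ≈ 1.950e+02 and percent-clipped=0.0, as logged.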
], batch size: 63, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:04:47,662 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 579100 2023-11-29 07:05:23,289 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3860853.3333333335, ans=0.125 2023-11-29 07:05:28,822 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.65 vs. limit=22.5 2023-11-29 07:05:48,980 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 2000, loss[loss=0.09719, simple_loss=0.142, pruned_loss=0.02112, audio_tagging_loss=0.005089, over 15315.00 frames. ], tot_loss[loss=0.06476, simple_loss=0.08908, pruned_loss=0.01182, audio_tagging_loss=0.008399, over 3043028.35 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 07:05:49,128 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 579150 2023-11-29 07:05:50,550 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3860986.6666666665, ans=0.125 2023-11-29 07:06:06,342 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3861053.3333333335, ans=0.125 2023-11-29 07:06:16,981 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.255e+01 9.275e+01 1.004e+02 1.066e+02 1.335e+02, threshold=2.008e+02, percent-clipped=0.0 2023-11-29 07:06:42,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3861253.3333333335, ans=0.0 2023-11-29 07:06:44,614 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3861253.3333333335, ans=0.125 2023-11-29 07:06:48,617 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.99 vs. limit=15.0 2023-11-29 07:06:50,254 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 2050, loss[loss=0.05563, simple_loss=0.07245, pruned_loss=0.008429, audio_tagging_loss=0.01098, over 15442.00 frames. ], tot_loss[loss=0.06509, simple_loss=0.08939, pruned_loss=0.01199, audio_tagging_loss=0.008407, over 3042737.05 frames. ], batch size: 59, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 07:06:50,415 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 579200 2023-11-29 07:07:14,849 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3861453.3333333335, ans=0.125 2023-11-29 07:07:30,443 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.77 vs. limit=12.0 2023-11-29 07:07:40,691 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3861586.6666666665, ans=0.125 2023-11-29 07:07:51,815 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 2100, loss[loss=0.07668, simple_loss=0.1065, pruned_loss=0.01553, audio_tagging_loss=0.007915, over 15830.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.09002, pruned_loss=0.0121, audio_tagging_loss=0.008363, over 3050164.76 frames. 
], batch size: 58, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 07:07:51,956 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 579250 2023-11-29 07:08:20,411 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.579e+01 9.068e+01 9.532e+01 1.017e+02 1.251e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-29 07:08:21,835 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3861786.6666666665, ans=0.07 2023-11-29 07:08:51,761 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3861986.6666666665, ans=0.125 2023-11-29 07:08:52,594 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 2150, loss[loss=0.07605, simple_loss=0.1008, pruned_loss=0.01546, audio_tagging_loss=0.01019, over 14175.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.0905, pruned_loss=0.01216, audio_tagging_loss=0.008334, over 3049728.54 frames. ], batch size: 54, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 07:08:52,712 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 579300 2023-11-29 07:09:01,386 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3861986.6666666665, ans=0.1 2023-11-29 07:09:10,792 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 07:09:10,977 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3862053.3333333335, ans=0.125 2023-11-29 07:09:14,898 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=3862053.3333333335, ans=15.0 2023-11-29 07:09:31,198 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 07:09:32,615 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3862186.6666666665, ans=0.05 2023-11-29 07:09:39,895 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3862186.6666666665, ans=0.1 2023-11-29 07:09:41,119 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3862253.3333333335, ans=0.125 2023-11-29 07:09:55,032 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 2200, loss[loss=0.06839, simple_loss=0.09924, pruned_loss=0.01149, audio_tagging_loss=0.007281, over 16453.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08975, pruned_loss=0.01189, audio_tagging_loss=0.008402, over 3046984.86 frames. 
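The WARNING above is the usual transducer feasibility filter: the 1-second AudioSet placeholder cut has 100 feature frames, which two stride-2 subsampling steps reduce to ((100 - 7) // 2 + 1) // 2 = 23 encoder frames, too few to align the 24 BPE tokens of the dummy transcript, so the cut is excluded. A sketch of such a filter, assuming that frame-count formula; the predicate is illustrative, not the exact check in train_asr.py:

```python
def frames_after_subsampling(num_frames: int) -> int:
    # Assumed Conv2dSubsampling arithmetic: 7 frames of context, two stride-2 steps.
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    """Keep a cut only if the encoder output is long enough to align its tokens."""
    return frames_after_subsampling(num_frames) > num_tokens

# The excluded cut above: 100 input frames -> 23 encoder frames < 24 tokens.
assert frames_after_subsampling(100) == 23
assert not keep_cut(100, 24)
```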
], batch size: 61, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 07:09:55,139 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 579350 2023-11-29 07:10:00,152 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3862320.0, ans=0.1 2023-11-29 07:10:22,820 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.872e+01 9.191e+01 9.631e+01 1.029e+02 1.249e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-29 07:10:27,693 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3862453.3333333335, ans=0.07 2023-11-29 07:10:35,457 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3862520.0, ans=0.2 2023-11-29 07:10:41,347 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3862520.0, ans=0.2 2023-11-29 07:10:45,831 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3862586.6666666665, ans=0.0 2023-11-29 07:10:46,014 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3862586.6666666665, ans=0.125 2023-11-29 07:10:52,211 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3862586.6666666665, ans=0.125 2023-11-29 07:10:55,453 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 2250, loss[loss=0.06325, simple_loss=0.08996, pruned_loss=0.009413, audio_tagging_loss=0.008858, over 16169.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.09007, pruned_loss=0.01195, audio_tagging_loss=0.008384, over 3046668.93 frames. ], batch size: 62, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:10:55,599 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 579400 2023-11-29 07:11:03,116 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3862653.3333333335, ans=0.1 2023-11-29 07:11:06,128 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.30 vs. limit=6.0 2023-11-29 07:11:09,266 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3862720.0, ans=0.1 2023-11-29 07:11:17,542 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3862720.0, ans=0.125 2023-11-29 07:11:19,738 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3862786.6666666665, ans=0.1 2023-11-29 07:11:27,975 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3862786.6666666665, ans=0.0 2023-11-29 07:11:56,116 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 2300, loss[loss=0.04881, simple_loss=0.0528, pruned_loss=0.006744, audio_tagging_loss=0.01567, over 14337.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08997, pruned_loss=0.01202, audio_tagging_loss=0.008446, over 3045503.67 frames. 
], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:11:56,210 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 579450 2023-11-29 07:11:59,622 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3862986.6666666665, ans=0.04949747468305833 2023-11-29 07:12:04,794 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.11 vs. limit=15.0 2023-11-29 07:12:05,985 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.29 vs. limit=15.0 2023-11-29 07:12:24,936 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3863120.0, ans=0.125 2023-11-29 07:12:26,966 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.966e+01 9.023e+01 9.872e+01 1.066e+02 2.413e+02, threshold=1.974e+02, percent-clipped=1.0 2023-11-29 07:12:32,385 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.88 vs. limit=22.5 2023-11-29 07:12:33,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3863186.6666666665, ans=0.1 2023-11-29 07:12:46,737 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3863253.3333333335, ans=0.0 2023-11-29 07:12:52,579 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 07:12:58,304 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3863320.0, ans=0.125 2023-11-29 07:12:59,113 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 2350, loss[loss=0.07137, simple_loss=0.09091, pruned_loss=0.01375, audio_tagging_loss=0.01217, over 16277.00 frames. ], tot_loss[loss=0.06505, simple_loss=0.08912, pruned_loss=0.01193, audio_tagging_loss=0.008555, over 3034691.06 frames. 
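grad_scale flipping between 16.0 and 32.0 across these batches is standard fp16 dynamic loss scaling: the scale doubles after a run of overflow-free steps and halves when an inf/nan gradient appears (the 07:12:26 entry above, with its 2.413e+02 maximum norm and percent-clipped=1.0, shows the occasional outlier step). A minimal sketch using torch.cuda.amp.GradScaler; the model and optimizer are placeholders and the growth settings are PyTorch defaults, not values taken from this run:

```python
import torch

model = torch.nn.Linear(80, 500).cuda()              # placeholder model
optimizer = torch.optim.Adam(model.parameters())     # placeholder optimizer
scaler = torch.cuda.amp.GradScaler(init_scale=16.0)  # init_scale chosen to mimic the log

def training_step(features: torch.Tensor, targets: torch.Tensor) -> None:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = torch.nn.functional.cross_entropy(model(features), targets)
    scaler.scale(loss).backward()  # backward pass on the scaled loss
    scaler.step(optimizer)         # skips the update if gradients overflowed
    scaler.update()                # halves the scale on overflow, grows it otherwise
    # scaler.get_scale() is the value the log prints as grad_scale (16.0 / 32.0 here).
```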
], batch size: 59, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:12:59,911 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 579500 2023-11-29 07:13:18,118 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3863386.6666666665, ans=0.0 2023-11-29 07:13:21,629 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3863386.6666666665, ans=0.2 2023-11-29 07:13:21,643 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3863386.6666666665, ans=0.1 2023-11-29 07:13:30,969 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3863453.3333333335, ans=0.09899494936611666 2023-11-29 07:13:44,895 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.35 vs. limit=10.0 2023-11-29 07:13:57,499 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.34 vs. limit=15.0 2023-11-29 07:14:00,825 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 2400, loss[loss=0.07937, simple_loss=0.107, pruned_loss=0.01827, audio_tagging_loss=0.007611, over 16217.00 frames. ], tot_loss[loss=0.06506, simple_loss=0.08892, pruned_loss=0.01195, audio_tagging_loss=0.008655, over 3038245.49 frames. ], batch size: 59, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 07:14:00,933 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 579550 2023-11-29 07:14:27,692 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3863786.6666666665, ans=0.2 2023-11-29 07:14:29,229 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.703e+01 9.216e+01 9.806e+01 1.047e+02 1.244e+02, threshold=1.961e+02, percent-clipped=0.0 2023-11-29 07:14:42,636 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.46 vs. limit=15.0 2023-11-29 07:15:00,883 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 2450, loss[loss=0.08437, simple_loss=0.111, pruned_loss=0.02076, audio_tagging_loss=0.008129, over 16442.00 frames. ], tot_loss[loss=0.06487, simple_loss=0.08872, pruned_loss=0.01183, audio_tagging_loss=0.008674, over 3044533.67 frames. ], batch size: 59, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 07:15:00,981 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 579600 2023-11-29 07:15:17,384 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3864053.3333333335, ans=0.2 2023-11-29 07:15:27,600 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.84 vs. limit=22.5 2023-11-29 07:16:02,334 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 2500, loss[loss=0.06255, simple_loss=0.08309, pruned_loss=0.01077, audio_tagging_loss=0.01023, over 15103.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.08897, pruned_loss=0.01193, audio_tagging_loss=0.008754, over 3048565.41 frames. 
], batch size: 56, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 07:16:02,446 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 579650 2023-11-29 07:16:05,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3864320.0, ans=0.125 2023-11-29 07:16:17,853 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3864386.6666666665, ans=0.1 2023-11-29 07:16:31,838 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.736e+01 9.100e+01 9.554e+01 1.019e+02 1.302e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-29 07:16:33,262 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3864453.3333333335, ans=0.1 2023-11-29 07:16:44,662 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.73 vs. limit=6.0 2023-11-29 07:16:46,602 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3864520.0, ans=0.125 2023-11-29 07:16:49,822 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.78 vs. limit=22.5 2023-11-29 07:16:57,666 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3864586.6666666665, ans=0.1 2023-11-29 07:17:01,944 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3864586.6666666665, ans=0.0 2023-11-29 07:17:04,447 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 2550, loss[loss=0.07283, simple_loss=0.09671, pruned_loss=0.01346, audio_tagging_loss=0.011, over 15288.00 frames. ], tot_loss[loss=0.06484, simple_loss=0.0888, pruned_loss=0.01176, audio_tagging_loss=0.008682, over 3052805.64 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:17:04,554 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 579700 2023-11-29 07:17:15,853 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3864720.0, ans=0.125 2023-11-29 07:17:36,639 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3864786.6666666665, ans=0.125 2023-11-29 07:17:49,166 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3864853.3333333335, ans=0.125 2023-11-29 07:17:49,225 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3864853.3333333335, ans=0.125 2023-11-29 07:17:53,581 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.08 vs. limit=15.0 2023-11-29 07:18:02,344 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3864920.0, ans=0.0 2023-11-29 07:18:05,612 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 2600, loss[loss=0.04521, simple_loss=0.06426, pruned_loss=0.00533, audio_tagging_loss=0.007752, over 14721.00 frames. ], tot_loss[loss=0.064, simple_loss=0.08774, pruned_loss=0.01149, audio_tagging_loss=0.008639, over 3043337.54 frames. 
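Every scaling.py:213 line reports the current value of a ScheduledFloat: a scalar hyperparameter (dropout_p, skip_rate, balancer prob, bypass scale_min, and so on) interpolated piecewise-linearly in the global batch count. By batch_count ≈ 3.86M all of them have long since reached their final values (skip rates at 0.0, dropout at 0.1, probs at 0.125). A minimal re-implementation of the idea, assuming linear interpolation between (batch_count, value) breakpoints; the real class in icefall's scaling.py carries more machinery than this sketch:

```python
from bisect import bisect_right

class ScheduledFloatSketch:
    """Piecewise-linear schedule over the global batch count (illustrative)."""

    def __init__(self, *points: tuple[float, float]):
        self.xs = [p[0] for p in points]
        self.ys = [p[1] for p in points]
        self.batch_count = 0.0

    def value(self) -> float:
        if self.batch_count <= self.xs[0]:
            return self.ys[0]
        if self.batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect_right(self.xs, self.batch_count)
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (self.batch_count - x0) / (x1 - x0)

# E.g. a skip rate decaying from 0.5 to 0.0 over the first 50k batches
# (hypothetical breakpoints, chosen only to illustrate the mechanism):
skip_rate = ScheduledFloatSketch((0.0, 0.5), (20000.0, 0.05), (50000.0, 0.0))
skip_rate.batch_count = 3857986.0
assert skip_rate.value() == 0.0  # long past the last breakpoint, as in this log
```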
], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:18:05,726 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 579750 2023-11-29 07:18:36,166 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.635e+01 8.765e+01 9.414e+01 9.856e+01 2.856e+02, threshold=1.883e+02, percent-clipped=1.0 2023-11-29 07:18:44,182 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3865186.6666666665, ans=0.125 2023-11-29 07:18:55,707 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3865253.3333333335, ans=0.125 2023-11-29 07:18:56,689 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3865253.3333333335, ans=0.0 2023-11-29 07:18:56,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3865253.3333333335, ans=0.0 2023-11-29 07:19:01,420 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 07:19:02,621 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3865253.3333333335, ans=0.125 2023-11-29 07:19:05,830 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 2650, loss[loss=0.06921, simple_loss=0.09259, pruned_loss=0.01196, audio_tagging_loss=0.01096, over 15428.00 frames. ], tot_loss[loss=0.06418, simple_loss=0.08783, pruned_loss=0.01166, audio_tagging_loss=0.008607, over 3045178.26 frames. ], batch size: 61, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:19:05,979 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 579800 2023-11-29 07:19:06,073 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3865320.0, ans=0.0 2023-11-29 07:19:11,482 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3865320.0, ans=0.0 2023-11-29 07:19:12,514 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3865320.0, ans=0.0 2023-11-29 07:19:31,879 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3865453.3333333335, ans=0.125 2023-11-29 07:19:34,149 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3865453.3333333335, ans=0.1 2023-11-29 07:20:01,524 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3865586.6666666665, ans=0.125 2023-11-29 07:20:06,917 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 2700, loss[loss=0.07648, simple_loss=0.1076, pruned_loss=0.01625, audio_tagging_loss=0.006415, over 15930.00 frames. ], tot_loss[loss=0.06433, simple_loss=0.08824, pruned_loss=0.01167, audio_tagging_loss=0.00854, over 3045956.20 frames. 
], batch size: 60, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:20:07,044 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 579850 2023-11-29 07:20:08,525 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3865653.3333333335, ans=0.125 2023-11-29 07:20:10,469 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3865653.3333333335, ans=0.0 2023-11-29 07:20:13,242 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.29 vs. limit=12.0 2023-11-29 07:20:13,261 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.11 vs. limit=6.0 2023-11-29 07:20:14,002 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3865653.3333333335, ans=0.0 2023-11-29 07:20:36,847 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.953e+01 9.168e+01 9.728e+01 1.035e+02 1.379e+02, threshold=1.946e+02, percent-clipped=0.0 2023-11-29 07:21:07,825 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 2750, loss[loss=0.05442, simple_loss=0.07202, pruned_loss=0.008366, audio_tagging_loss=0.01004, over 14687.00 frames. ], tot_loss[loss=0.06456, simple_loss=0.08874, pruned_loss=0.0117, audio_tagging_loss=0.008492, over 3047376.62 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:21:07,950 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 579900 2023-11-29 07:21:09,633 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.36 vs. limit=15.0 2023-11-29 07:21:10,547 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3865986.6666666665, ans=0.0 2023-11-29 07:21:12,685 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3865986.6666666665, ans=0.125 2023-11-29 07:21:34,493 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3866120.0, ans=0.125 2023-11-29 07:21:59,946 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 07:22:02,760 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.20 vs. limit=15.0 2023-11-29 07:22:05,994 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3866253.3333333335, ans=0.125 2023-11-29 07:22:08,104 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 2800, loss[loss=0.04959, simple_loss=0.06538, pruned_loss=0.008017, audio_tagging_loss=0.008889, over 14815.00 frames. ], tot_loss[loss=0.06424, simple_loss=0.08814, pruned_loss=0.01168, audio_tagging_loss=0.008493, over 3040728.67 frames. 
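The scaling.py:1022 lines come from Whiten modules, which occasionally measure how far a layer's output covariance is from isotropic and compare the result against a limit (a penalty applies only past the limit; most metrics above stay under theirs, e.g. 3.11 vs. limit=15.0). A natural form for such a metric, assumed here, is the ratio of the mean squared eigenvalue of the channel covariance to the squared mean eigenvalue, which equals 1.0 exactly when the covariance is a multiple of the identity; the exact estimator in icefall's scaling.py differs in details such as grouping:

```python
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    """How non-isotropic the channel covariance of x is (1.0 = perfectly white).

    x: (num_frames, num_channels). Assumed metric: E[eig^2] / (E[eig])^2 of the
    covariance eigenvalues, computed via traces so no eigendecomposition is needed.
    """
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]                     # (C, C) channel covariance
    num_channels = cov.shape[0]
    mean_eig = torch.diagonal(cov).mean()            # trace(cov) / C   = E[eig]
    mean_eig_sq = (cov * cov).sum() / num_channels   # trace(cov@cov)/C = E[eig^2]
    return mean_eig_sq / (mean_eig * mean_eig + 1e-20)

# An isotropic input scores close to 1.0, far below logged limits like 10.0 or 22.5:
print(whitening_metric(torch.randn(10000, 288)))
```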
], batch size: 56, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 07:22:08,235 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 579950 2023-11-29 07:22:08,438 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3866320.0, ans=0.125 2023-11-29 07:22:29,919 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=3866386.6666666665, ans=15.0 2023-11-29 07:22:39,166 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.596e+01 8.979e+01 9.442e+01 1.009e+02 1.188e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-29 07:22:41,928 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3866453.3333333335, ans=0.125 2023-11-29 07:22:47,783 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3866520.0, ans=0.125 2023-11-29 07:22:53,506 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3866520.0, ans=0.0 2023-11-29 07:23:09,380 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 2850, loss[loss=0.05027, simple_loss=0.06481, pruned_loss=0.008712, audio_tagging_loss=0.00915, over 14239.00 frames. ], tot_loss[loss=0.06458, simple_loss=0.08858, pruned_loss=0.0119, audio_tagging_loss=0.008399, over 3040480.11 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 07:23:09,496 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 580000 2023-11-29 07:23:11,022 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-580000.pt 2023-11-29 07:23:45,599 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.33 vs. limit=22.5 2023-11-29 07:23:50,266 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3866853.3333333335, ans=0.1 2023-11-29 07:23:52,593 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3866853.3333333335, ans=0.09899494936611666 2023-11-29 07:24:13,546 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 2900, loss[loss=0.08106, simple_loss=0.122, pruned_loss=0.01198, audio_tagging_loss=0.008102, over 15422.00 frames. ], tot_loss[loss=0.06469, simple_loss=0.08894, pruned_loss=0.01177, audio_tagging_loss=0.008445, over 3036769.21 frames. ], batch size: 52, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:24:13,651 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 580050 2023-11-29 07:24:27,050 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.56 vs. 
limit=15.0 2023-11-29 07:24:33,608 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3867053.3333333335, ans=0.125 2023-11-29 07:24:37,762 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3867120.0, ans=0.125 2023-11-29 07:24:41,306 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3867120.0, ans=0.07 2023-11-29 07:24:41,308 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3867120.0, ans=0.0 2023-11-29 07:24:44,584 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.265e+01 8.980e+01 9.788e+01 1.062e+02 1.550e+02, threshold=1.958e+02, percent-clipped=0.0 2023-11-29 07:24:57,108 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.26 vs. limit=22.5 2023-11-29 07:25:12,257 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.99 vs. limit=22.5 2023-11-29 07:25:14,060 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 2950, loss[loss=0.06842, simple_loss=0.09437, pruned_loss=0.01288, audio_tagging_loss=0.008362, over 15124.00 frames. ], tot_loss[loss=0.06498, simple_loss=0.08943, pruned_loss=0.01177, audio_tagging_loss=0.008495, over 3039991.59 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:25:14,153 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 580100 2023-11-29 07:25:31,363 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3867386.6666666665, ans=0.2 2023-11-29 07:25:51,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3867520.0, ans=0.2 2023-11-29 07:26:04,643 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3867586.6666666665, ans=0.125 2023-11-29 07:26:15,571 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 3000, loss[loss=0.06918, simple_loss=0.08953, pruned_loss=0.0119, audio_tagging_loss=0.01252, over 14709.00 frames. ], tot_loss[loss=0.06478, simple_loss=0.0888, pruned_loss=0.01169, audio_tagging_loss=0.008686, over 3041191.84 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:26:15,574 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-29 07:26:37,779 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.6184, 3.1097, 3.2490, 2.7309], device='cuda:0') 2023-11-29 07:26:50,880 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.5257, 3.5782, 3.7271, 3.7488], device='cuda:0') 2023-11-29 07:26:54,602 INFO [train_asr.py:1267] (0/4) Epoch 49, validation: loss=0.05747, simple_loss=0.05054, pruned_loss=0.005474, audio_tagging_loss=0.02673, over 4681554.00 frames. 
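The validation block just above (run here at batch 3000 of the epoch) averages each loss component over the full AudioSet eval set, and the same 0.5/1.0/1.0 combination holds: 0.5 × 0.05054 + 0.005474 + 0.02673 ≈ 0.05747. Notably, the ASR terms drop sharply relative to training while audio_tagging_loss rises to 0.02673, roughly 3× its running training value of ~0.0086. A generic sketch of such a frame-weighted validation pass, assuming a compute_loss(model, batch) helper that returns the same loss fields the log prints; all names are placeholders for the corresponding pieces of train_asr.py:

```python
import torch

def compute_validation_loss(model, valid_dl, compute_loss) -> dict:
    """Frame-weighted average of each loss component over the dev set.

    Assumes compute_loss(model, batch) returns ({'loss': ..., 'simple_loss': ...,
    'pruned_loss': ..., 'audio_tagging_loss': ...}, num_frames) for one batch.
    """
    model.eval()
    totals: dict = {}
    total_frames = 0
    with torch.no_grad():
        for batch in valid_dl:
            losses, num_frames = compute_loss(model, batch)
            for name, value in losses.items():
                totals[name] = totals.get(name, 0.0) + float(value) * num_frames
            total_frames += num_frames
    model.train()
    return {name: value / total_frames for name, value in totals.items()}
```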
2023-11-29 07:26:54,602 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-29 07:26:54,692 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 580150 2023-11-29 07:26:59,533 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3867653.3333333335, ans=0.0 2023-11-29 07:27:08,003 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.31 vs. limit=15.0 2023-11-29 07:27:09,241 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.53 vs. limit=22.5 2023-11-29 07:27:14,206 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3867720.0, ans=0.125 2023-11-29 07:27:20,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3867786.6666666665, ans=0.125 2023-11-29 07:27:26,157 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.667e+01 9.023e+01 9.601e+01 1.027e+02 1.356e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-29 07:27:46,460 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3867920.0, ans=0.125 2023-11-29 07:27:55,402 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 3050, loss[loss=0.05782, simple_loss=0.07261, pruned_loss=0.01316, audio_tagging_loss=0.008358, over 14561.00 frames. ], tot_loss[loss=0.06442, simple_loss=0.08858, pruned_loss=0.01154, audio_tagging_loss=0.008583, over 3041229.15 frames. ], batch size: 54, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:27:55,507 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 580200 2023-11-29 07:28:04,334 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.81 vs. limit=15.0 2023-11-29 07:28:07,182 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3868053.3333333335, ans=0.125 2023-11-29 07:28:09,945 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.15 vs. limit=12.0 2023-11-29 07:28:14,895 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3868053.3333333335, ans=0.0 2023-11-29 07:28:21,540 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3868120.0, ans=0.5 2023-11-29 07:28:28,295 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.90 vs. limit=15.0 2023-11-29 07:28:32,418 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-29 07:28:43,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3868186.6666666665, ans=0.1 2023-11-29 07:28:52,751 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.27 vs. limit=5.0 2023-11-29 07:28:57,702 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 3100, loss[loss=0.07962, simple_loss=0.122, pruned_loss=0.0131, audio_tagging_loss=0.005502, over 16606.00 frames. ], tot_loss[loss=0.0649, simple_loss=0.08923, pruned_loss=0.01176, audio_tagging_loss=0.008524, over 3049121.24 frames. ], batch size: 59, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:28:57,842 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 580250 2023-11-29 07:29:03,150 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3868320.0, ans=0.125 2023-11-29 07:29:09,997 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.45 vs. limit=15.0 2023-11-29 07:29:15,923 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3868386.6666666665, ans=0.125 2023-11-29 07:29:29,753 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.864e+01 8.858e+01 9.570e+01 1.021e+02 1.337e+02, threshold=1.914e+02, percent-clipped=0.0 2023-11-29 07:29:36,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3868520.0, ans=0.125 2023-11-29 07:29:58,658 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 07:29:59,551 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 3150, loss[loss=0.0803, simple_loss=0.1201, pruned_loss=0.01413, audio_tagging_loss=0.006119, over 16111.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08962, pruned_loss=0.01191, audio_tagging_loss=0.008586, over 3043333.72 frames. ], batch size: 59, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:29:59,660 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 580300 2023-11-29 07:30:11,483 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.16 vs. limit=10.0 2023-11-29 07:30:16,064 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.89 vs. limit=22.5 2023-11-29 07:30:21,027 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3868720.0, ans=0.125 2023-11-29 07:30:21,263 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.17 vs. limit=22.5 2023-11-29 07:30:36,821 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3868853.3333333335, ans=0.125 2023-11-29 07:30:44,740 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.61 vs. 
limit=15.0 2023-11-29 07:30:57,800 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3868920.0, ans=0.125 2023-11-29 07:30:58,979 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3868920.0, ans=0.0 2023-11-29 07:31:01,048 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 3200, loss[loss=0.05404, simple_loss=0.06912, pruned_loss=0.007683, audio_tagging_loss=0.01179, over 14652.00 frames. ], tot_loss[loss=0.06523, simple_loss=0.0893, pruned_loss=0.01189, audio_tagging_loss=0.008687, over 3039459.95 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 07:31:01,151 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 580350 2023-11-29 07:31:17,188 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.28 vs. limit=15.0 2023-11-29 07:31:19,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3869053.3333333335, ans=0.0 2023-11-29 07:31:33,096 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.044e+01 8.935e+01 9.459e+01 1.020e+02 1.289e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-29 07:31:51,336 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3869253.3333333335, ans=0.0 2023-11-29 07:32:02,188 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 3250, loss[loss=0.0698, simple_loss=0.09431, pruned_loss=0.01244, audio_tagging_loss=0.01021, over 14627.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.08929, pruned_loss=0.01191, audio_tagging_loss=0.008721, over 3047571.46 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:32:02,280 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 580400 2023-11-29 07:32:15,878 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.92 vs. limit=22.5 2023-11-29 07:32:51,340 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.43 vs. limit=15.0 2023-11-29 07:33:01,251 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3869586.6666666665, ans=0.2 2023-11-29 07:33:02,420 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3869586.6666666665, ans=0.125 2023-11-29 07:33:04,501 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 3300, loss[loss=0.06474, simple_loss=0.07824, pruned_loss=0.01509, audio_tagging_loss=0.01053, over 14889.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08921, pruned_loss=0.01208, audio_tagging_loss=0.008769, over 3045030.76 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:33:04,594 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 580450 2023-11-29 07:33:05,971 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3869653.3333333335, ans=0.1 2023-11-29 07:33:09,802 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.37 vs. 
limit=15.0 2023-11-29 07:33:17,224 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3869720.0, ans=0.0 2023-11-29 07:33:37,735 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.887e+01 8.902e+01 9.466e+01 1.005e+02 1.164e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-29 07:33:53,452 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3869920.0, ans=0.125 2023-11-29 07:34:06,870 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 3350, loss[loss=0.05496, simple_loss=0.06852, pruned_loss=0.01172, audio_tagging_loss=0.008981, over 14651.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08926, pruned_loss=0.01198, audio_tagging_loss=0.008657, over 3035205.99 frames. ], batch size: 54, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:34:06,963 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 580500 2023-11-29 07:34:29,777 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3870053.3333333335, ans=0.025 2023-11-29 07:34:30,952 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3870120.0, ans=0.0 2023-11-29 07:35:08,987 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 3400, loss[loss=0.06853, simple_loss=0.09427, pruned_loss=0.014, audio_tagging_loss=0.007399, over 15801.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08934, pruned_loss=0.012, audio_tagging_loss=0.008519, over 3037698.54 frames. ], batch size: 60, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:35:09,092 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 580550 2023-11-29 07:35:15,336 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.84 vs. limit=15.0 2023-11-29 07:35:17,574 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.96 vs. 
limit=15.0 2023-11-29 07:35:25,727 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3870386.6666666665, ans=0.0 2023-11-29 07:35:25,916 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3870386.6666666665, ans=0.0 2023-11-29 07:35:41,958 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.530e+01 9.012e+01 9.460e+01 1.056e+02 1.309e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-29 07:35:44,035 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3870453.3333333335, ans=0.0 2023-11-29 07:35:44,207 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3870453.3333333335, ans=0.125 2023-11-29 07:35:45,423 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3870520.0, ans=0.125 2023-11-29 07:35:47,654 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3870520.0, ans=0.0 2023-11-29 07:35:53,329 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3870520.0, ans=0.0 2023-11-29 07:36:11,817 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 3450, loss[loss=0.0691, simple_loss=0.09257, pruned_loss=0.01219, audio_tagging_loss=0.01063, over 15903.00 frames. ], tot_loss[loss=0.06485, simple_loss=0.08889, pruned_loss=0.01188, audio_tagging_loss=0.008525, over 3038788.07 frames. ], batch size: 59, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:36:11,912 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 580600 2023-11-29 07:36:23,608 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3870720.0, ans=0.1 2023-11-29 07:36:26,929 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3870720.0, ans=0.0 2023-11-29 07:36:42,660 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3870786.6666666665, ans=0.0 2023-11-29 07:36:42,714 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3870786.6666666665, ans=0.125 2023-11-29 07:36:54,372 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3870853.3333333335, ans=0.0 2023-11-29 07:37:04,830 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.10 vs. limit=10.0 2023-11-29 07:37:13,492 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 3500, loss[loss=0.06367, simple_loss=0.09115, pruned_loss=0.01211, audio_tagging_loss=0.005986, over 15869.00 frames. ], tot_loss[loss=0.06492, simple_loss=0.08901, pruned_loss=0.01197, audio_tagging_loss=0.008448, over 3038629.97 frames. 
], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:37:13,591 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 580650 2023-11-29 07:37:41,199 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3871120.0, ans=0.2 2023-11-29 07:37:47,394 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 07:37:48,508 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.442e+01 8.986e+01 9.811e+01 1.065e+02 1.473e+02, threshold=1.962e+02, percent-clipped=0.0 2023-11-29 07:37:57,259 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3871186.6666666665, ans=0.125 2023-11-29 07:38:00,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3871186.6666666665, ans=0.125 2023-11-29 07:38:08,917 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3871253.3333333335, ans=0.2 2023-11-29 07:38:16,979 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 3550, loss[loss=0.0561, simple_loss=0.07649, pruned_loss=0.01149, audio_tagging_loss=0.00636, over 15232.00 frames. ], tot_loss[loss=0.06479, simple_loss=0.08889, pruned_loss=0.01194, audio_tagging_loss=0.008408, over 3034670.73 frames. ], batch size: 62, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:38:17,081 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 580700 2023-11-29 07:38:41,985 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3871453.3333333335, ans=0.125 2023-11-29 07:38:52,133 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3871520.0, ans=0.125 2023-11-29 07:38:54,614 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3871520.0, ans=0.2 2023-11-29 07:38:59,221 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.82 vs. limit=22.5 2023-11-29 07:39:18,696 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 3600, loss[loss=0.07512, simple_loss=0.09977, pruned_loss=0.01558, audio_tagging_loss=0.009652, over 15187.00 frames. ], tot_loss[loss=0.06456, simple_loss=0.08843, pruned_loss=0.01189, audio_tagging_loss=0.008456, over 3035310.85 frames. 
2023-11-29 07:39:18,803 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 580750
2023-11-29 07:39:45,107 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3871786.6666666665, ans=0.09899494936611666
2023-11-29 07:39:51,830 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.229e+01 8.727e+01 9.343e+01 1.017e+02 1.458e+02, threshold=1.869e+02, percent-clipped=0.0
2023-11-29 07:40:02,193 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3871853.3333333335, ans=0.2
2023-11-29 07:40:19,984 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 3650, loss[loss=0.05683, simple_loss=0.07367, pruned_loss=0.009615, audio_tagging_loss=0.01038, over 13781.00 frames. ], tot_loss[loss=0.06438, simple_loss=0.08814, pruned_loss=0.01186, audio_tagging_loss=0.00845, over 3032522.31 frames. ], batch size: 54, lr: 1.38e-03, grad_scale: 32.0
2023-11-29 07:40:20,122 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 580800
2023-11-29 07:40:48,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3872120.0, ans=0.0
2023-11-29 07:40:53,846 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.10 vs. limit=22.5
2023-11-29 07:41:03,098 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.59 vs. limit=22.5
2023-11-29 07:41:19,031 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3872253.3333333335, ans=0.2
2023-11-29 07:41:21,619 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 3700, loss[loss=0.06235, simple_loss=0.09572, pruned_loss=0.009408, audio_tagging_loss=0.005087, over 15254.00 frames. ], tot_loss[loss=0.06448, simple_loss=0.08849, pruned_loss=0.01186, audio_tagging_loss=0.008373, over 3036860.83 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0
2023-11-29 07:41:21,722 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 580850
2023-11-29 07:41:51,889 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3872453.3333333335, ans=0.1
2023-11-29 07:41:56,228 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.678e+01 9.236e+01 9.960e+01 1.067e+02 1.392e+02, threshold=1.992e+02, percent-clipped=0.0
2023-11-29 07:42:05,936 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3872520.0, ans=0.1
2023-11-29 07:42:24,380 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 3750, loss[loss=0.0708, simple_loss=0.1039, pruned_loss=0.009949, audio_tagging_loss=0.008923, over 15162.00 frames. ], tot_loss[loss=0.06464, simple_loss=0.08889, pruned_loss=0.01177, audio_tagging_loss=0.008423, over 3043791.88 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0
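Each optim.py line prints the quartiles of recently seen gradient norms next to the active clipping threshold; with Clipping_scale=2.0 the threshold tracks roughly twice the running median (median 9.343e+01 vs. threshold 1.869e+02 in the entry above), which is why percent-clipped stays at 0.0 in steady state. A rough sketch of that bookkeeping, assuming a simple window of norms; in icefall this lives inside the optimizer and differs in detail.

import collections
import statistics
import torch

# Hypothetical median-based gradient clipping with quartile logging,
# assuming threshold = clipping_scale * median of recent gradient norms.
class MedianGradClipper:
    def __init__(self, clipping_scale=2.0, window=128):
        self.scale = clipping_scale
        self.norms = collections.deque(maxlen=window)

    def clip_(self, parameters):
        grads = [p.grad for p in parameters if p.grad is not None]
        norm = torch.norm(torch.stack([g.norm() for g in grads])).item()
        self.norms.append(norm)
        threshold = self.scale * statistics.median(self.norms)
        clipped = norm > threshold
        if clipped:  # this step counts toward percent-clipped
            for g in grads:
                g.mul_(threshold / norm)
        return norm, threshold, clipped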
2023-11-29 07:42:24,529 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 580900
2023-11-29 07:42:40,612 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3872720.0, ans=0.125
2023-11-29 07:42:46,358 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3872720.0, ans=0.0
2023-11-29 07:43:03,390 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3872853.3333333335, ans=0.125
2023-11-29 07:43:09,411 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-29 07:43:12,033 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3872853.3333333335, ans=0.2
2023-11-29 07:43:26,298 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 3800, loss[loss=0.05816, simple_loss=0.07198, pruned_loss=0.01141, audio_tagging_loss=0.01076, over 16209.00 frames. ], tot_loss[loss=0.06458, simple_loss=0.08875, pruned_loss=0.01167, audio_tagging_loss=0.008533, over 3046764.97 frames. ], batch size: 61, lr: 1.38e-03, grad_scale: 16.0
2023-11-29 07:43:26,378 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 580950
2023-11-29 07:43:38,483 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3873053.3333333335, ans=0.0
2023-11-29 07:43:41,462 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.37 vs. limit=15.0
2023-11-29 07:43:59,220 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.61 vs. limit=10.0
2023-11-29 07:44:01,538 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.516e+01 8.972e+01 9.513e+01 1.036e+02 1.364e+02, threshold=1.903e+02, percent-clipped=0.0
2023-11-29 07:44:04,188 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3873186.6666666665, ans=0.0
2023-11-29 07:44:17,490 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3873253.3333333335, ans=0.0
2023-11-29 07:44:22,985 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.86 vs. limit=22.5
2023-11-29 07:44:23,675 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3873253.3333333335, ans=0.04949747468305833
2023-11-29 07:44:28,056 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 3850, loss[loss=0.07306, simple_loss=0.1113, pruned_loss=0.01104, audio_tagging_loss=0.006358, over 15347.00 frames. ], tot_loss[loss=0.06511, simple_loss=0.08936, pruned_loss=0.01182, audio_tagging_loss=0.008605, over 3050122.08 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 16.0
2023-11-29 07:44:28,178 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 581000
2023-11-29 07:45:07,531 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3873520.0, ans=0.0
2023-11-29 07:45:10,923 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-29 07:45:22,248 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.63 vs. limit=15.0
2023-11-29 07:45:29,390 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3873653.3333333335, ans=0.125
2023-11-29 07:45:30,904 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 3900, loss[loss=0.07481, simple_loss=0.1022, pruned_loss=0.01321, audio_tagging_loss=0.01051, over 14731.00 frames. ], tot_loss[loss=0.06499, simple_loss=0.08953, pruned_loss=0.01164, audio_tagging_loss=0.008582, over 3049735.96 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 16.0
2023-11-29 07:45:31,010 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 581050
2023-11-29 07:45:38,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3873653.3333333335, ans=0.125
2023-11-29 07:45:51,731 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-29 07:46:04,756 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.538e+01 8.794e+01 9.412e+01 1.023e+02 1.323e+02, threshold=1.882e+02, percent-clipped=0.0
2023-11-29 07:46:04,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3873786.6666666665, ans=0.015
2023-11-29 07:46:31,878 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 3950, loss[loss=0.08039, simple_loss=0.1174, pruned_loss=0.01409, audio_tagging_loss=0.007597, over 15598.00 frames. ], tot_loss[loss=0.06476, simple_loss=0.08872, pruned_loss=0.01172, audio_tagging_loss=0.008681, over 3052789.34 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 16.0
2023-11-29 07:46:31,991 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 581100
2023-11-29 07:46:39,546 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.79 vs. limit=15.0
2023-11-29 07:46:50,366 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3874053.3333333335, ans=0.125
2023-11-29 07:47:18,583 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3874186.6666666665, ans=0.0
2023-11-29 07:47:25,443 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3874253.3333333335, ans=0.125
2023-11-29 07:47:25,653 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. limit=6.0
2023-11-29 07:47:32,142 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 4000, loss[loss=0.07706, simple_loss=0.1093, pruned_loss=0.01263, audio_tagging_loss=0.009795, over 16292.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08927, pruned_loss=0.01183, audio_tagging_loss=0.008731, over 3052218.72 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 32.0
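The tot_loss numbers are consistent with the logged components being combined as 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss, matching this run's simple_loss_scale and audio_tagging_loss_scale. For the batch 4000 totals just above: 0.5 * 0.08927 + 0.01183 + 0.008731 = 0.06520, i.e. the logged 0.06519 up to rounding. A one-line check (the weighting is inferred from the config and verified against the log, not read out of model.py):

# Recomputing the logged tot_loss from its components for batch 4000.
simple_loss, pruned_loss, audio_tagging_loss = 0.08927, 0.01183, 0.008731
loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss
print(f"{loss:.5f}")  # 0.06520 vs. tot_loss 0.06519 in the log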
2023-11-29 07:47:32,265 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 581150
2023-11-29 07:47:56,862 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3874453.3333333335, ans=0.125
2023-11-29 07:48:08,329 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.251e+01 9.122e+01 9.589e+01 1.060e+02 2.170e+02, threshold=1.918e+02, percent-clipped=1.0
2023-11-29 07:48:31,237 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3874586.6666666665, ans=0.125
2023-11-29 07:48:31,553 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.87 vs. limit=15.0
2023-11-29 07:48:33,331 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 4050, loss[loss=0.06673, simple_loss=0.0983, pruned_loss=0.01117, audio_tagging_loss=0.00641, over 16824.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08969, pruned_loss=0.01201, audio_tagging_loss=0.008726, over 3049788.54 frames. ], batch size: 62, lr: 1.38e-03, grad_scale: 16.0
2023-11-29 07:48:33,441 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 581200
2023-11-29 07:48:36,722 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3874653.3333333335, ans=0.125
2023-11-29 07:48:37,479 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-29 07:48:43,128 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3874653.3333333335, ans=0.125
2023-11-29 07:48:44,111 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3874653.3333333335, ans=0.1
2023-11-29 07:49:04,163 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3874786.6666666665, ans=0.125
2023-11-29 07:49:15,491 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3874853.3333333335, ans=0.125
2023-11-29 07:49:15,821 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.57 vs. limit=12.0
2023-11-29 07:49:35,201 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.47 vs. limit=22.5
2023-11-29 07:49:35,759 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 4100, loss[loss=0.07729, simple_loss=0.1076, pruned_loss=0.0175, audio_tagging_loss=0.005997, over 15348.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08968, pruned_loss=0.01199, audio_tagging_loss=0.008762, over 3045550.56 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 16.0
2023-11-29 07:49:35,895 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 581250
2023-11-29 07:49:36,410 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.82 vs. limit=10.0
2023-11-29 07:49:46,733 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3875053.3333333335, ans=0.125
2023-11-29 07:50:00,489 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3875120.0, ans=0.2
2023-11-29 07:50:11,081 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.045e+01 9.112e+01 9.700e+01 1.031e+02 1.226e+02, threshold=1.940e+02, percent-clipped=0.0
2023-11-29 07:50:29,959 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3875253.3333333335, ans=0.125
2023-11-29 07:50:36,474 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 4150, loss[loss=0.05385, simple_loss=0.06916, pruned_loss=0.0106, audio_tagging_loss=0.008664, over 14248.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08929, pruned_loss=0.01203, audio_tagging_loss=0.008594, over 3047332.76 frames. ], batch size: 54, lr: 1.38e-03, grad_scale: 16.0
2023-11-29 07:50:36,571 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 581300
2023-11-29 07:50:39,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3875320.0, ans=0.125
2023-11-29 07:50:57,784 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3875386.6666666665, ans=0.1
2023-11-29 07:51:05,356 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3875453.3333333335, ans=0.1
2023-11-29 07:51:07,083 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3875453.3333333335, ans=0.0
2023-11-29 07:51:09,388 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3875453.3333333335, ans=0.125
2023-11-29 07:51:22,083 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-29 07:51:25,730 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3875586.6666666665, ans=0.125
2023-11-29 07:51:31,046 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3875586.6666666665, ans=0.125
2023-11-29 07:51:37,799 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 4200, loss[loss=0.0563, simple_loss=0.07854, pruned_loss=0.009046, audio_tagging_loss=0.007986, over 15037.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.08983, pruned_loss=0.01193, audio_tagging_loss=0.008479, over 3052821.56 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0
2023-11-29 07:51:37,892 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 581350
2023-11-29 07:51:44,384 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3875653.3333333335, ans=0.125
2023-11-29 07:52:00,921 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3875720.0, ans=0.2
2023-11-29 07:52:05,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3875786.6666666665, ans=0.0
2023-11-29 07:52:10,517 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.28 vs. limit=15.0
2023-11-29 07:52:13,212 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.639e+01 9.081e+01 9.650e+01 1.017e+02 1.202e+02, threshold=1.930e+02, percent-clipped=0.0
2023-11-29 07:52:16,291 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.96 vs. limit=15.0
2023-11-29 07:52:17,634 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3875853.3333333335, ans=0.125
2023-11-29 07:52:37,348 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3875920.0, ans=0.2
2023-11-29 07:52:39,540 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 4250, loss[loss=0.0753, simple_loss=0.1057, pruned_loss=0.01328, audio_tagging_loss=0.00916, over 14841.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.09018, pruned_loss=0.01203, audio_tagging_loss=0.008416, over 3058487.70 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 16.0
2023-11-29 07:52:39,685 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 581400
2023-11-29 07:53:00,713 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-29 07:53:29,666 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.13 vs. limit=15.0
2023-11-29 07:53:34,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3876253.3333333335, ans=0.1
2023-11-29 07:53:41,502 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 4300, loss[loss=0.05675, simple_loss=0.078, pruned_loss=0.009958, audio_tagging_loss=0.00779, over 15704.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.09164, pruned_loss=0.0122, audio_tagging_loss=0.008351, over 3056555.59 frames. ], batch size: 60, lr: 1.38e-03, grad_scale: 16.0
2023-11-29 07:53:41,608 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 581450
2023-11-29 07:54:16,697 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.403e+01 9.277e+01 9.932e+01 1.054e+02 1.240e+02, threshold=1.986e+02, percent-clipped=0.0
2023-11-29 07:54:28,185 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3876520.0, ans=0.125
2023-11-29 07:54:42,953 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 4350, loss[loss=0.0641, simple_loss=0.09387, pruned_loss=0.01018, audio_tagging_loss=0.006979, over 15204.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.09136, pruned_loss=0.01216, audio_tagging_loss=0.00837, over 3055038.67 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 16.0
2023-11-29 07:54:43,071 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 581500
2023-11-29 07:54:43,180 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3876653.3333333335, ans=0.04949747468305833
2023-11-29 07:54:45,624 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3876653.3333333335, ans=0.125
2023-11-29 07:54:52,164 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3876653.3333333335, ans=0.0
2023-11-29 07:54:56,191 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3876720.0, ans=0.125
2023-11-29 07:55:05,497 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3876720.0, ans=0.04949747468305833
2023-11-29 07:55:22,423 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3876853.3333333335, ans=0.125
2023-11-29 07:55:44,997 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 4400, loss[loss=0.06348, simple_loss=0.08928, pruned_loss=0.01118, audio_tagging_loss=0.007655, over 14394.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.09077, pruned_loss=0.01209, audio_tagging_loss=0.0084, over 3063138.26 frames. ], batch size: 54, lr: 1.38e-03, grad_scale: 32.0
2023-11-29 07:55:45,119 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 581550
2023-11-29 07:56:02,130 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3877053.3333333335, ans=0.125
2023-11-29 07:56:03,578 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.83 vs. limit=15.0
2023-11-29 07:56:08,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3877120.0, ans=0.09899494936611666
2023-11-29 07:56:21,163 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.065e+01 9.242e+01 9.842e+01 1.066e+02 1.310e+02, threshold=1.968e+02, percent-clipped=0.0
2023-11-29 07:56:25,551 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3877186.6666666665, ans=0.07
2023-11-29 07:56:36,777 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3877253.3333333335, ans=0.125
2023-11-29 07:56:46,475 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 4450, loss[loss=0.06949, simple_loss=0.09963, pruned_loss=0.01116, audio_tagging_loss=0.008512, over 13509.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.09011, pruned_loss=0.01197, audio_tagging_loss=0.008389, over 3060202.24 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 16.0
2023-11-29 07:56:46,609 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 581600
2023-11-29 07:57:00,434 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3877386.6666666665, ans=0.2
2023-11-29 07:57:11,541 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3877453.3333333335, ans=0.1
2023-11-29 07:57:17,764 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3877453.3333333335, ans=0.0
2023-11-29 07:57:29,568 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3877520.0, ans=0.0
2023-11-29 07:57:48,371 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 4500, loss[loss=0.0538, simple_loss=0.0703, pruned_loss=0.01135, audio_tagging_loss=0.0073, over 14549.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.08934, pruned_loss=0.01187, audio_tagging_loss=0.008478, over 3057751.33 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 16.0
2023-11-29 07:57:48,462 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 581650
2023-11-29 07:57:49,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3877653.3333333335, ans=0.125
2023-11-29 07:57:56,243 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.21 vs. limit=15.0
2023-11-29 07:58:25,208 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.600e+01 9.167e+01 9.852e+01 1.040e+02 1.276e+02, threshold=1.970e+02, percent-clipped=0.0
2023-11-29 07:58:42,694 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3877920.0, ans=0.125
2023-11-29 07:58:43,995 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3877920.0, ans=0.125
2023-11-29 07:58:50,538 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 4550, loss[loss=0.07418, simple_loss=0.09785, pruned_loss=0.01695, audio_tagging_loss=0.008311, over 15562.00 frames. ], tot_loss[loss=0.06471, simple_loss=0.08909, pruned_loss=0.01168, audio_tagging_loss=0.008484, over 3055170.37 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 16.0
2023-11-29 07:58:50,656 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 581700
2023-11-29 07:59:17,091 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3878120.0, ans=0.2
2023-11-29 07:59:36,061 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3878186.6666666665, ans=0.125
2023-11-29 07:59:38,674 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-29 07:59:40,660 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.32 vs. limit=6.0
2023-11-29 07:59:42,794 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.93 vs. limit=10.0
2023-11-29 07:59:44,914 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3878253.3333333335, ans=0.1
2023-11-29 07:59:50,661 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3878320.0, ans=0.125
2023-11-29 07:59:51,485 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 4600, loss[loss=0.06982, simple_loss=0.08975, pruned_loss=0.02017, audio_tagging_loss=0.004777, over 14699.00 frames. ], tot_loss[loss=0.06451, simple_loss=0.08858, pruned_loss=0.01172, audio_tagging_loss=0.008503, over 3048228.86 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0
2023-11-29 07:59:51,610 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 581750
2023-11-29 08:00:04,469 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.49 vs. limit=15.0
2023-11-29 08:00:04,656 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.80 vs. limit=15.0
2023-11-29 08:00:10,168 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.00 vs. limit=15.0
2023-11-29 08:00:21,797 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3878453.3333333335, ans=0.0
2023-11-29 08:00:29,080 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.065e+01 8.974e+01 9.623e+01 1.050e+02 1.439e+02, threshold=1.925e+02, percent-clipped=0.0
2023-11-29 08:00:46,797 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.17 vs. limit=22.5
2023-11-29 08:00:53,005 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 4650, loss[loss=0.07135, simple_loss=0.09961, pruned_loss=0.01338, audio_tagging_loss=0.00817, over 15703.00 frames. ], tot_loss[loss=0.06487, simple_loss=0.08915, pruned_loss=0.01173, audio_tagging_loss=0.008564, over 3055355.80 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 16.0
2023-11-29 08:00:53,109 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 581800
2023-11-29 08:00:59,825 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3878653.3333333335, ans=0.2
2023-11-29 08:01:09,280 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.13 vs. limit=15.0
2023-11-29 08:01:56,923 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 4700, loss[loss=0.07522, simple_loss=0.1003, pruned_loss=0.01826, audio_tagging_loss=0.006795, over 14627.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.08961, pruned_loss=0.01193, audio_tagging_loss=0.008664, over 3049722.64 frames. ], batch size: 54, lr: 1.38e-03, grad_scale: 16.0
2023-11-29 08:01:57,060 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 581850
2023-11-29 08:02:02,574 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3878986.6666666665, ans=0.125
2023-11-29 08:02:03,061 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.69 vs. limit=15.0
2023-11-29 08:02:18,884 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3879053.3333333335, ans=0.05
2023-11-29 08:02:33,811 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.525e+01 9.091e+01 9.646e+01 1.031e+02 1.253e+02, threshold=1.929e+02, percent-clipped=0.0
2023-11-29 08:02:34,165 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3879186.6666666665, ans=0.0
2023-11-29 08:02:49,483 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3879253.3333333335, ans=0.125
2023-11-29 08:02:52,285 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.75 vs. limit=6.0
2023-11-29 08:02:58,759 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 4750, loss[loss=0.0767, simple_loss=0.1056, pruned_loss=0.01523, audio_tagging_loss=0.00869, over 14956.00 frames. ], tot_loss[loss=0.06501, simple_loss=0.089, pruned_loss=0.01183, audio_tagging_loss=0.008678, over 3052294.39 frames. ], batch size: 54, lr: 1.38e-03, grad_scale: 16.0
2023-11-29 08:02:58,906 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 581900
2023-11-29 08:03:00,398 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.43 vs. limit=10.0
2023-11-29 08:03:12,114 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.28 vs. limit=12.0
2023-11-29 08:03:12,250 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.98 vs. limit=15.0
2023-11-29 08:03:29,908 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3879453.3333333335, ans=0.2
2023-11-29 08:03:48,898 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3879586.6666666665, ans=0.125
2023-11-29 08:03:53,060 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.36 vs. limit=15.0
2023-11-29 08:03:57,149 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3879586.6666666665, ans=0.125
2023-11-29 08:03:59,278 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 4800, loss[loss=0.06712, simple_loss=0.1031, pruned_loss=0.008927, audio_tagging_loss=0.006629, over 15652.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.0893, pruned_loss=0.01188, audio_tagging_loss=0.008736, over 3047355.26 frames. ], batch size: 60, lr: 1.38e-03, grad_scale: 32.0
2023-11-29 08:03:59,462 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 581950
2023-11-29 08:04:00,607 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3879653.3333333335, ans=0.1
2023-11-29 08:04:04,973 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3879653.3333333335, ans=0.2
2023-11-29 08:04:26,109 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-29 08:04:27,179 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3879786.6666666665, ans=0.1
2023-11-29 08:04:28,928 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.15 vs. limit=15.0
2023-11-29 08:04:36,438 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.055e+01 9.178e+01 9.692e+01 1.041e+02 1.280e+02, threshold=1.938e+02, percent-clipped=0.0
2023-11-29 08:04:53,530 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3879920.0, ans=0.125
2023-11-29 08:05:01,411 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 4850, loss[loss=0.06536, simple_loss=0.09254, pruned_loss=0.01185, audio_tagging_loss=0.007237, over 15245.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08942, pruned_loss=0.01179, audio_tagging_loss=0.008792, over 3042061.13 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 16.0
2023-11-29 08:05:01,501 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 582000
2023-11-29 08:05:03,733 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3879986.6666666665, ans=0.125
2023-11-29 08:05:12,872 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3879986.6666666665, ans=0.07
2023-11-29 08:05:25,840 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3880120.0, ans=0.1
2023-11-29 08:05:32,994 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3880120.0, ans=0.0
2023-11-29 08:06:04,480 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 4900, loss[loss=0.07745, simple_loss=0.1113, pruned_loss=0.01576, audio_tagging_loss=0.006062, over 15140.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08985, pruned_loss=0.01188, audio_tagging_loss=0.008792, over 3042082.64 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 16.0
2023-11-29 08:06:04,596 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 582050
2023-11-29 08:06:10,332 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3880320.0, ans=0.125
2023-11-29 08:06:39,511 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.87 vs. limit=10.0
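The grad_scale field flips between 16.0 and 32.0 from one block to the next. This is the dynamic loss scale of fp16 training: PyTorch's GradScaler doubles the scale after a stretch of overflow-free steps and halves it when inf/nan gradients appear. A minimal sketch with the standard AMP API; the surrounding training-loop names are placeholders.

import torch

scaler = torch.cuda.amp.GradScaler()  # dynamic loss scaling for fp16

def train_step(model, batch, optimizer, criterion):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = criterion(model(batch["inputs"]), batch["targets"])
    scaler.scale(loss).backward()  # scaled loss keeps fp16 grads finite
    scaler.step(optimizer)         # unscales grads; skips step on inf/nan
    scaler.update()                # grows/shrinks the scale over time
    return loss.item(), scaler.get_scale()  # the "grad_scale" in the log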
2023-11-29 08:06:41,020 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3880520.0, ans=0.1
2023-11-29 08:06:43,225 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.902e+01 9.348e+01 9.931e+01 1.050e+02 1.310e+02, threshold=1.986e+02, percent-clipped=0.0
2023-11-29 08:06:45,064 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.08 vs. limit=6.0
2023-11-29 08:07:02,064 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3880586.6666666665, ans=0.0
2023-11-29 08:07:05,354 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 4950, loss[loss=0.05757, simple_loss=0.08108, pruned_loss=0.007526, audio_tagging_loss=0.0095, over 14916.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.08997, pruned_loss=0.01182, audio_tagging_loss=0.00862, over 3038721.20 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0
2023-11-29 08:07:05,486 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 582100
2023-11-29 08:07:34,445 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.55 vs. limit=10.0
2023-11-29 08:07:36,711 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.18 vs. limit=22.5
2023-11-29 08:08:07,542 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 5000, loss[loss=0.07604, simple_loss=0.1072, pruned_loss=0.01582, audio_tagging_loss=0.006608, over 15132.00 frames. ], tot_loss[loss=0.06481, simple_loss=0.08942, pruned_loss=0.01168, audio_tagging_loss=0.008426, over 3036659.90 frames. ], batch size: 59, lr: 1.38e-03, grad_scale: 16.0
2023-11-29 08:08:07,672 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 582150
2023-11-29 08:08:25,410 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3881053.3333333335, ans=0.1
2023-11-29 08:08:27,880 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3881053.3333333335, ans=0.2
2023-11-29 08:08:36,021 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3881120.0, ans=0.035
2023-11-29 08:08:45,871 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.750e+01 9.159e+01 9.676e+01 1.038e+02 1.226e+02, threshold=1.935e+02, percent-clipped=0.0
2023-11-29 08:08:56,498 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-29 08:09:10,357 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 5050, loss[loss=0.07986, simple_loss=0.1142, pruned_loss=0.01441, audio_tagging_loss=0.008335, over 15216.00 frames. ], tot_loss[loss=0.06458, simple_loss=0.08915, pruned_loss=0.0116, audio_tagging_loss=0.008411, over 3038924.76 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 16.0
2023-11-29 08:09:10,458 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 582200
2023-11-29 08:10:10,779 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3881653.3333333335, ans=0.125
2023-11-29 08:10:11,789 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 5100, loss[loss=0.0552, simple_loss=0.06826, pruned_loss=0.009937, audio_tagging_loss=0.01113, over 15289.00 frames. ], tot_loss[loss=0.06445, simple_loss=0.08855, pruned_loss=0.01171, audio_tagging_loss=0.008464, over 3040436.82 frames. ], batch size: 59, lr: 1.38e-03, grad_scale: 16.0
2023-11-29 08:10:11,957 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 582250
2023-11-29 08:10:26,338 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.66 vs. limit=22.5
2023-11-29 08:10:49,746 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.896e+01 8.838e+01 9.435e+01 1.031e+02 1.429e+02, threshold=1.887e+02, percent-clipped=0.0
2023-11-29 08:11:13,114 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 5150, loss[loss=0.06035, simple_loss=0.07838, pruned_loss=0.01299, audio_tagging_loss=0.008173, over 17080.00 frames. ], tot_loss[loss=0.06436, simple_loss=0.08857, pruned_loss=0.01158, audio_tagging_loss=0.008497, over 3042949.39 frames. ], batch size: 64, lr: 1.38e-03, grad_scale: 16.0
2023-11-29 08:11:13,218 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 582300
2023-11-29 08:11:23,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3881986.6666666665, ans=0.125
2023-11-29 08:11:23,184 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3881986.6666666665, ans=0.07
2023-11-29 08:11:34,054 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3882053.3333333335, ans=0.125
2023-11-29 08:12:15,498 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 5200, loss[loss=0.06992, simple_loss=0.09176, pruned_loss=0.0162, audio_tagging_loss=0.007838, over 15343.00 frames. ], tot_loss[loss=0.06443, simple_loss=0.08856, pruned_loss=0.01163, audio_tagging_loss=0.008525, over 3047027.98 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 32.0
2023-11-29 08:12:15,610 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 582350
2023-11-29 08:12:15,850 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3882320.0, ans=0.125
2023-11-29 08:12:19,739 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=3882320.0, ans=15.0
2023-11-29 08:12:20,281 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3882320.0, ans=0.07
2023-11-29 08:12:30,024 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.72 vs. limit=15.0
2023-11-29 08:12:43,475 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.09 vs. limit=15.0
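The Whitening lines compare a per-module statistic against a limit (metric=6.09 vs. limit=15.0 just above); when activations drift too far from an isotropic covariance, a corrective penalty activates. A rough proxy for such a metric is sketched below: the ratio mean(eig^2) / mean(eig)^2 over the eigenvalues of the feature covariance, which is 1.0 for perfectly white features and grows as a few directions dominate. This illustrates the idea only, not the exact per-group statistic scaling.py computes.

import torch

# Hypothetical whitening metric: 1.0 for isotropic ("white") activations,
# larger when the covariance is dominated by a few directions.
def whitening_metric(x: torch.Tensor) -> float:
    x = x.reshape(-1, x.shape[-1]).float()  # (frames, num_channels)
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    return ((eigs ** 2).mean() / eigs.mean() ** 2).item()

x = torch.randn(1000, 512)   # roughly white input
print(whitening_metric(x))   # close to 1.0, far below a limit like 15.0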
2023-11-29 08:12:52,839 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.957e+01 8.958e+01 9.640e+01 1.041e+02 1.476e+02, threshold=1.928e+02, percent-clipped=0.0
2023-11-29 08:12:55,302 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.89 vs. limit=15.0
2023-11-29 08:13:03,812 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.40 vs. limit=22.5
2023-11-29 08:13:04,805 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3882586.6666666665, ans=0.0
2023-11-29 08:13:16,408 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 5250, loss[loss=0.03442, simple_loss=0.04463, pruned_loss=0.003205, audio_tagging_loss=0.008901, over 15355.00 frames. ], tot_loss[loss=0.06429, simple_loss=0.08829, pruned_loss=0.01162, audio_tagging_loss=0.008528, over 3041980.07 frames. ], batch size: 60, lr: 1.38e-03, grad_scale: 32.0
2023-11-29 08:13:16,546 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 582400
2023-11-29 08:13:21,852 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3882653.3333333335, ans=0.125
2023-11-29 08:13:37,584 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3882720.0, ans=0.125
2023-11-29 08:13:53,658 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3882853.3333333335, ans=0.125
2023-11-29 08:14:03,254 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.74 vs. limit=6.0
2023-11-29 08:14:18,854 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 5300, loss[loss=0.0506, simple_loss=0.07024, pruned_loss=0.007542, audio_tagging_loss=0.007934, over 14826.00 frames. ], tot_loss[loss=0.06471, simple_loss=0.089, pruned_loss=0.01177, audio_tagging_loss=0.008441, over 3039875.35 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 32.0
2023-11-29 08:14:18,952 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 582450
2023-11-29 08:14:35,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3883053.3333333335, ans=0.125
2023-11-29 08:14:40,473 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3883053.3333333335, ans=0.125
2023-11-29 08:14:41,612 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3883053.3333333335, ans=0.125
2023-11-29 08:14:44,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3883120.0, ans=0.125
2023-11-29 08:14:57,525 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.980e+01 9.053e+01 9.676e+01 1.034e+02 1.415e+02, threshold=1.935e+02, percent-clipped=0.0
2023-11-29 08:15:20,469 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 5350, loss[loss=0.07679, simple_loss=0.1113, pruned_loss=0.01719, audio_tagging_loss=0.003957, over 14672.00 frames. ], tot_loss[loss=0.06456, simple_loss=0.08886, pruned_loss=0.0118, audio_tagging_loss=0.00834, over 3045518.57 frames. ], batch size: 54, lr: 1.38e-03, grad_scale: 16.0
2023-11-29 08:15:20,579 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 582500
2023-11-29 08:15:23,549 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3883320.0, ans=0.1
2023-11-29 08:15:34,385 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3883386.6666666665, ans=0.0
2023-11-29 08:15:41,375 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3883386.6666666665, ans=0.125
2023-11-29 08:15:53,963 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3883453.3333333335, ans=0.2
2023-11-29 08:16:07,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3883520.0, ans=0.0
2023-11-29 08:16:18,805 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3883586.6666666665, ans=0.1
2023-11-29 08:16:21,974 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 5400, loss[loss=0.08182, simple_loss=0.1163, pruned_loss=0.01607, audio_tagging_loss=0.007607, over 15131.00 frames. ], tot_loss[loss=0.065, simple_loss=0.08936, pruned_loss=0.01187, audio_tagging_loss=0.008446, over 3048865.12 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 16.0
2023-11-29 08:16:22,071 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 582550
2023-11-29 08:16:25,973 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.06 vs. limit=22.5
2023-11-29 08:16:26,934 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3883653.3333333335, ans=0.07
2023-11-29 08:16:29,087 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3883653.3333333335, ans=0.125
2023-11-29 08:16:34,728 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.36 vs. limit=15.0
2023-11-29 08:16:39,640 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.whiten.whitening_limit, batch_count=3883720.0, ans=12.0
2023-11-29 08:17:01,374 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.077e+01 9.215e+01 9.741e+01 1.047e+02 1.328e+02, threshold=1.948e+02, percent-clipped=0.0
2023-11-29 08:17:01,759 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3883853.3333333335, ans=0.0
2023-11-29 08:17:02,821 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3883853.3333333335, ans=0.125
2023-11-29 08:17:05,347 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3883853.3333333335, ans=0.2
2023-11-29 08:17:23,108 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 5450, loss[loss=0.06816, simple_loss=0.09039, pruned_loss=0.0137, audio_tagging_loss=0.009264, over 14974.00 frames. ], tot_loss[loss=0.0652, simple_loss=0.08932, pruned_loss=0.01198, audio_tagging_loss=0.008557, over 3046844.69 frames. ], batch size: 53, lr: 1.38e-03, grad_scale: 16.0
2023-11-29 08:17:23,198 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 582600
2023-11-29 08:17:54,005 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.whiten.whitening_limit, batch_count=3884120.0, ans=12.0
2023-11-29 08:18:06,609 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3884186.6666666665, ans=0.07
2023-11-29 08:18:08,188 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3884186.6666666665, ans=0.125
2023-11-29 08:18:19,635 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3884253.3333333335, ans=0.125
2023-11-29 08:18:24,681 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 5500, loss[loss=0.04854, simple_loss=0.06482, pruned_loss=0.007647, audio_tagging_loss=0.008488, over 14673.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08965, pruned_loss=0.01199, audio_tagging_loss=0.008558, over 3045274.71 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 16.0
2023-11-29 08:18:24,788 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 582650
2023-11-29 08:18:26,138 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3884320.0, ans=0.125
2023-11-29 08:18:39,656 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3884386.6666666665, ans=0.025
2023-11-29 08:18:46,756 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3884386.6666666665, ans=0.125
2023-11-29 08:18:59,080 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3884453.3333333335, ans=0.2
2023-11-29 08:19:03,451 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.550e+01 9.075e+01 9.683e+01 1.060e+02 1.497e+02, threshold=1.937e+02, percent-clipped=0.0
2023-11-29 08:19:05,397 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.25 vs. limit=15.0
2023-11-29 08:19:12,325 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3884586.6666666665, ans=0.1
2023-11-29 08:19:24,573 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=3884653.3333333335, ans=0.02
2023-11-29 08:19:25,592 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 5550, loss[loss=0.07333, simple_loss=0.1025, pruned_loss=0.01333, audio_tagging_loss=0.008767, over 15259.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.0899, pruned_loss=0.012, audio_tagging_loss=0.008574, over 3043440.99 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 16.0
2023-11-29 08:19:25,700 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 582700
2023-11-29 08:19:29,380 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3884653.3333333335, ans=0.0
2023-11-29 08:19:32,808 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3884653.3333333335, ans=0.125
2023-11-29 08:19:33,148 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.33 vs. limit=6.0
2023-11-29 08:19:36,248 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3884720.0, ans=0.125
2023-11-29 08:19:43,063 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3884720.0, ans=0.125
2023-11-29 08:19:56,391 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3884786.6666666665, ans=0.0
2023-11-29 08:20:13,463 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3884920.0, ans=0.125
2023-11-29 08:20:24,009 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3884920.0, ans=0.2
2023-11-29 08:20:26,544 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 5600, loss[loss=0.06628, simple_loss=0.08981, pruned_loss=0.01135, audio_tagging_loss=0.01003, over 15433.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.09015, pruned_loss=0.01202, audio_tagging_loss=0.008688, over 3049921.01 frames. ], batch size: 59, lr: 1.38e-03, grad_scale: 32.0
2023-11-29 08:20:26,656 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 582750
2023-11-29 08:21:05,784 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3885186.6666666665, ans=0.1
2023-11-29 08:21:06,733 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.841e+01 9.122e+01 9.786e+01 1.074e+02 1.432e+02, threshold=1.957e+02, percent-clipped=0.0
2023-11-29 08:21:10,460 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-29 08:21:28,649 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 5650, loss[loss=0.055, simple_loss=0.07191, pruned_loss=0.01113, audio_tagging_loss=0.007918, over 13685.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.0891, pruned_loss=0.01192, audio_tagging_loss=0.008831, over 3055472.01 frames. ], batch size: 53, lr: 1.38e-03, grad_scale: 16.0
2023-11-29 08:21:28,725 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 582800
2023-11-29 08:21:38,873 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3885320.0, ans=0.0
2023-11-29 08:21:58,063 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-29 08:22:11,588 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3885520.0, ans=0.0
2023-11-29 08:22:14,525 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-29 08:22:26,903 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.21 vs. limit=12.0
2023-11-29 08:22:29,908 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 5700, loss[loss=0.08172, simple_loss=0.1103, pruned_loss=0.02133, audio_tagging_loss=0.005259, over 15340.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08885, pruned_loss=0.01202, audio_tagging_loss=0.008823, over 3057113.11 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 16.0
2023-11-29 08:22:30,006 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 582850
2023-11-29 08:22:33,619 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3885653.3333333335, ans=0.125
2023-11-29 08:22:50,578 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.99 vs. limit=6.0
2023-11-29 08:22:58,429 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3885786.6666666665, ans=0.125
2023-11-29 08:23:04,259 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3885786.6666666665, ans=0.0
2023-11-29 08:23:10,034 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3885853.3333333335, ans=0.0
2023-11-29 08:23:10,812 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.911e+01 9.240e+01 1.005e+02 1.081e+02 1.357e+02, threshold=2.009e+02, percent-clipped=0.0
2023-11-29 08:23:31,321 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 5750, loss[loss=0.0731, simple_loss=0.1019, pruned_loss=0.012, audio_tagging_loss=0.01015, over 15894.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08938, pruned_loss=0.01205, audio_tagging_loss=0.008646, over 3056675.21 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 16.0
2023-11-29 08:23:31,449 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 582900
2023-11-29 08:23:34,754 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3885986.6666666665, ans=0.025
2023-11-29 08:23:35,645 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3885986.6666666665, ans=0.125
2023-11-29 08:23:55,169 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.03 vs. limit=6.0
2023-11-29 08:24:23,377 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3886253.3333333335, ans=0.125
2023-11-29 08:24:24,602 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3886253.3333333335, ans=0.09899494936611666
2023-11-29 08:24:32,450 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 5800, loss[loss=0.06434, simple_loss=0.09368, pruned_loss=0.0115, audio_tagging_loss=0.006004, over 14909.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.08909, pruned_loss=0.01194, audio_tagging_loss=0.008554, over 3050495.66 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0
2023-11-29 08:24:32,650 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 582950
2023-11-29 08:24:48,930 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3886386.6666666665, ans=0.1
2023-11-29 08:25:13,364 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.598e+01 9.182e+01 9.577e+01 1.050e+02 1.266e+02, threshold=1.915e+02, percent-clipped=0.0
2023-11-29 08:25:15,180 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.54 vs. limit=22.5
2023-11-29 08:25:33,493 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 5850, loss[loss=0.06395, simple_loss=0.09551, pruned_loss=0.009669, audio_tagging_loss=0.006525, over 16102.00 frames. ], tot_loss[loss=0.06478, simple_loss=0.0887, pruned_loss=0.01191, audio_tagging_loss=0.00852, over 3046983.42 frames. ], batch size: 59, lr: 1.38e-03, grad_scale: 16.0
2023-11-29 08:25:33,567 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 583000
2023-11-29 08:25:39,841 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3886653.3333333335, ans=0.125
2023-11-29 08:25:45,949 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3886720.0, ans=0.125
2023-11-29 08:26:16,289 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3886853.3333333335, ans=0.1
2023-11-29 08:26:25,203 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.80 vs. limit=22.5
2023-11-29 08:26:28,739 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3886920.0, ans=0.0
2023-11-29 08:26:36,845 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 5900, loss[loss=0.05786, simple_loss=0.08238, pruned_loss=0.007415, audio_tagging_loss=0.009258, over 15899.00 frames. ], tot_loss[loss=0.06454, simple_loss=0.08842, pruned_loss=0.01185, audio_tagging_loss=0.008479, over 3049851.75 frames. ], batch size: 59, lr: 1.38e-03, grad_scale: 16.0
2023-11-29 08:26:37,029 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 583050
2023-11-29 08:27:17,947 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.091e+01 9.266e+01 9.950e+01 1.077e+02 1.290e+02, threshold=1.990e+02, percent-clipped=0.0
2023-11-29 08:27:32,274 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.69 vs.
limit=15.0 2023-11-29 08:27:35,378 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3887253.3333333335, ans=0.125 2023-11-29 08:27:38,608 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 5950, loss[loss=0.07873, simple_loss=0.1194, pruned_loss=0.0156, audio_tagging_loss=0.003406, over 15154.00 frames. ], tot_loss[loss=0.06475, simple_loss=0.08873, pruned_loss=0.01199, audio_tagging_loss=0.008397, over 3046015.60 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:27:38,709 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 583100 2023-11-29 08:27:41,568 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.62 vs. limit=15.0 2023-11-29 08:27:42,894 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3887320.0, ans=0.1 2023-11-29 08:27:42,989 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3887320.0, ans=0.1 2023-11-29 08:28:04,948 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=3887453.3333333335, ans=15.0 2023-11-29 08:28:12,989 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3887453.3333333335, ans=0.125 2023-11-29 08:28:23,968 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.09 vs. limit=15.0 2023-11-29 08:28:40,685 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 6000, loss[loss=0.05349, simple_loss=0.06463, pruned_loss=0.01096, audio_tagging_loss=0.01022, over 14378.00 frames. ], tot_loss[loss=0.06489, simple_loss=0.08889, pruned_loss=0.01205, audio_tagging_loss=0.008399, over 3045801.31 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 08:28:40,687 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-29 08:29:20,069 INFO [train_asr.py:1267] (0/4) Epoch 49, validation: loss=0.05758, simple_loss=0.05041, pruned_loss=0.005303, audio_tagging_loss=0.02707, over 4681554.00 frames. 2023-11-29 08:29:20,070 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-29 08:29:20,188 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 583150 2023-11-29 08:29:30,780 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0 2023-11-29 08:29:31,585 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3887720.0, ans=0.125 2023-11-29 08:29:59,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3887853.3333333335, ans=0.125 2023-11-29 08:30:00,227 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.59 vs. 
limit=15.0 2023-11-29 08:30:00,785 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.749e+01 8.890e+01 9.631e+01 1.036e+02 1.251e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-29 08:30:05,513 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 08:30:18,900 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.26 vs. limit=22.5 2023-11-29 08:30:22,508 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 6050, loss[loss=0.0618, simple_loss=0.08398, pruned_loss=0.01084, audio_tagging_loss=0.008964, over 14871.00 frames. ], tot_loss[loss=0.06467, simple_loss=0.08854, pruned_loss=0.01202, audio_tagging_loss=0.008389, over 3043738.34 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 08:30:22,626 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 583200 2023-11-29 08:30:22,762 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3887986.6666666665, ans=0.125 2023-11-29 08:30:24,393 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3887986.6666666665, ans=0.07 2023-11-29 08:31:01,681 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3888186.6666666665, ans=0.1 2023-11-29 08:31:24,404 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 6100, loss[loss=0.05751, simple_loss=0.07974, pruned_loss=0.007285, audio_tagging_loss=0.01036, over 16354.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.0888, pruned_loss=0.01206, audio_tagging_loss=0.008511, over 3045502.88 frames. ], batch size: 60, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 08:31:24,500 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 583250 2023-11-29 08:32:00,583 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.83 vs. limit=15.0 2023-11-29 08:32:05,556 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.493e+01 9.225e+01 1.004e+02 1.045e+02 1.351e+02, threshold=2.008e+02, percent-clipped=0.0 2023-11-29 08:32:08,064 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3888520.0, ans=0.125 2023-11-29 08:32:17,644 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3888586.6666666665, ans=0.5 2023-11-29 08:32:25,416 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 6150, loss[loss=0.076, simple_loss=0.1044, pruned_loss=0.01451, audio_tagging_loss=0.009291, over 15105.00 frames. ], tot_loss[loss=0.06466, simple_loss=0.08808, pruned_loss=0.01197, audio_tagging_loss=0.008655, over 3039360.55 frames. 
], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:32:25,537 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 583300 2023-11-29 08:32:25,673 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3888653.3333333335, ans=0.125 2023-11-29 08:32:28,636 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3888653.3333333335, ans=0.0 2023-11-29 08:32:32,160 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3888653.3333333335, ans=0.125 2023-11-29 08:32:36,116 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.84 vs. limit=15.0 2023-11-29 08:33:26,862 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 6200, loss[loss=0.05053, simple_loss=0.06564, pruned_loss=0.007562, audio_tagging_loss=0.01014, over 14169.00 frames. ], tot_loss[loss=0.06426, simple_loss=0.08745, pruned_loss=0.01186, audio_tagging_loss=0.008672, over 3040400.69 frames. ], batch size: 54, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:33:26,971 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 583350 2023-11-29 08:33:34,382 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.90 vs. limit=15.0 2023-11-29 08:33:42,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3889053.3333333335, ans=0.0 2023-11-29 08:34:08,798 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.749e+01 9.108e+01 9.848e+01 1.055e+02 1.947e+02, threshold=1.970e+02, percent-clipped=0.0 2023-11-29 08:34:10,267 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3889186.6666666665, ans=0.125 2023-11-29 08:34:29,520 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 6250, loss[loss=0.06049, simple_loss=0.08675, pruned_loss=0.007909, audio_tagging_loss=0.009204, over 13979.00 frames. ], tot_loss[loss=0.06468, simple_loss=0.08818, pruned_loss=0.01191, audio_tagging_loss=0.008678, over 3032063.01 frames. ], batch size: 53, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:34:29,610 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 583400 2023-11-29 08:34:39,891 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.19 vs. 
limit=15.0 2023-11-29 08:34:51,843 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3889386.6666666665, ans=0.1 2023-11-29 08:34:53,071 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3889453.3333333335, ans=0.0 2023-11-29 08:35:06,192 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3889520.0, ans=0.1 2023-11-29 08:35:08,470 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3889520.0, ans=0.0 2023-11-29 08:35:24,666 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 08:35:30,157 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 6300, loss[loss=0.06464, simple_loss=0.08472, pruned_loss=0.01232, audio_tagging_loss=0.009956, over 14947.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08859, pruned_loss=0.01194, audio_tagging_loss=0.008681, over 3040570.00 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:35:30,312 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 583450 2023-11-29 08:36:13,133 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.769e+01 9.155e+01 9.755e+01 1.058e+02 1.266e+02, threshold=1.951e+02, percent-clipped=0.0 2023-11-29 08:36:29,651 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3889920.0, ans=0.125 2023-11-29 08:36:31,893 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3889986.6666666665, ans=0.125 2023-11-29 08:36:32,757 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 6350, loss[loss=0.06687, simple_loss=0.0802, pruned_loss=0.01526, audio_tagging_loss=0.0115, over 15014.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.08866, pruned_loss=0.01196, audio_tagging_loss=0.00873, over 3036222.39 frames. ], batch size: 59, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:36:32,845 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 583500 2023-11-29 08:36:45,718 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3890053.3333333335, ans=0.125 2023-11-29 08:36:56,123 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3890053.3333333335, ans=0.1 2023-11-29 08:37:23,692 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3890253.3333333335, ans=0.125 2023-11-29 08:37:28,757 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3890253.3333333335, ans=0.125 2023-11-29 08:37:36,012 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 6400, loss[loss=0.06626, simple_loss=0.08434, pruned_loss=0.01259, audio_tagging_loss=0.01149, over 14578.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.08871, pruned_loss=0.01204, audio_tagging_loss=0.008779, over 3029041.29 frames. 
], batch size: 57, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 08:37:36,115 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 583550 2023-11-29 08:38:16,631 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3890520.0, ans=0.09899494936611666 2023-11-29 08:38:16,684 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3890520.0, ans=0.125 2023-11-29 08:38:17,386 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.637e+01 8.981e+01 9.586e+01 1.023e+02 1.257e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-29 08:38:18,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3890520.0, ans=0.125 2023-11-29 08:38:33,605 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3890586.6666666665, ans=0.2 2023-11-29 08:38:36,786 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 6450, loss[loss=0.05807, simple_loss=0.07228, pruned_loss=0.01435, audio_tagging_loss=0.007581, over 15745.00 frames. ], tot_loss[loss=0.06459, simple_loss=0.0878, pruned_loss=0.01188, audio_tagging_loss=0.008805, over 3020194.87 frames. ], batch size: 61, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 08:38:36,895 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 583600 2023-11-29 08:38:55,758 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3890720.0, ans=0.5 2023-11-29 08:39:03,885 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 08:39:16,384 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3890853.3333333335, ans=0.0 2023-11-29 08:39:22,224 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3890853.3333333335, ans=0.125 2023-11-29 08:39:36,203 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.86 vs. limit=10.0 2023-11-29 08:39:39,039 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 6500, loss[loss=0.06404, simple_loss=0.0847, pruned_loss=0.01161, audio_tagging_loss=0.01008, over 14524.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.0891, pruned_loss=0.01204, audio_tagging_loss=0.008781, over 3028795.25 frames. ], batch size: 54, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 08:39:39,129 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 583650 2023-11-29 08:39:46,350 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3890986.6666666665, ans=0.0 2023-11-29 08:40:00,788 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3891053.3333333335, ans=0.025 2023-11-29 08:40:07,053 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.97 vs. 
limit=15.0 2023-11-29 08:40:22,313 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.191e+01 9.311e+01 1.000e+02 1.077e+02 1.349e+02, threshold=2.001e+02, percent-clipped=0.0 2023-11-29 08:40:39,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3891253.3333333335, ans=0.125 2023-11-29 08:40:41,267 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 6550, loss[loss=0.07031, simple_loss=0.1036, pruned_loss=0.01121, audio_tagging_loss=0.007289, over 15356.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.09017, pruned_loss=0.01222, audio_tagging_loss=0.008621, over 3036207.17 frames. ], batch size: 60, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:40:41,410 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 583700 2023-11-29 08:40:48,182 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3891320.0, ans=0.125 2023-11-29 08:40:58,796 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3891386.6666666665, ans=0.2 2023-11-29 08:41:11,070 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3891453.3333333335, ans=0.125 2023-11-29 08:41:18,583 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.09 vs. limit=10.0 2023-11-29 08:41:19,237 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3891520.0, ans=0.1 2023-11-29 08:41:43,062 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 6600, loss[loss=0.04672, simple_loss=0.06251, pruned_loss=0.008848, audio_tagging_loss=0.006619, over 15174.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.08994, pruned_loss=0.01237, audio_tagging_loss=0.008558, over 3033820.83 frames. ], batch size: 59, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:41:43,181 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 583750 2023-11-29 08:41:49,261 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3891653.3333333335, ans=0.2 2023-11-29 08:41:59,761 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.97 vs. limit=15.0 2023-11-29 08:42:04,721 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.59 vs. limit=12.0 2023-11-29 08:42:12,639 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3891786.6666666665, ans=0.0 2023-11-29 08:42:25,043 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3891853.3333333335, ans=0.1 2023-11-29 08:42:26,978 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.383e+01 9.392e+01 1.005e+02 1.057e+02 1.408e+02, threshold=2.010e+02, percent-clipped=0.0 2023-11-29 08:42:45,111 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 6650, loss[loss=0.05847, simple_loss=0.08055, pruned_loss=0.009735, audio_tagging_loss=0.008464, over 15271.00 frames. 
], tot_loss[loss=0.0656, simple_loss=0.08978, pruned_loss=0.01223, audio_tagging_loss=0.008473, over 3034647.11 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:42:45,229 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 583800 2023-11-29 08:42:48,506 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.93 vs. limit=22.5 2023-11-29 08:43:05,814 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3892053.3333333335, ans=0.125 2023-11-29 08:43:20,824 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3892120.0, ans=0.1 2023-11-29 08:43:37,725 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3892253.3333333335, ans=0.0 2023-11-29 08:43:40,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3892253.3333333335, ans=0.1 2023-11-29 08:43:48,132 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 6700, loss[loss=0.06346, simple_loss=0.09029, pruned_loss=0.009045, audio_tagging_loss=0.009276, over 15195.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.09004, pruned_loss=0.01217, audio_tagging_loss=0.008462, over 3036224.44 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:43:48,217 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 583850 2023-11-29 08:44:10,091 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3892386.6666666665, ans=0.0 2023-11-29 08:44:18,266 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.63 vs. limit=22.5 2023-11-29 08:44:26,081 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3892520.0, ans=0.125 2023-11-29 08:44:31,117 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.872e+01 9.029e+01 9.606e+01 1.021e+02 1.283e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-29 08:44:39,182 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3892586.6666666665, ans=0.1 2023-11-29 08:44:49,283 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 6750, loss[loss=0.05298, simple_loss=0.06649, pruned_loss=0.009653, audio_tagging_loss=0.01008, over 13691.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08956, pruned_loss=0.01203, audio_tagging_loss=0.008447, over 3035227.21 frames. ], batch size: 53, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:44:49,406 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 583900 2023-11-29 08:44:56,115 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3892653.3333333335, ans=0.1 2023-11-29 08:45:08,628 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.65 vs. limit=15.0 2023-11-29 08:45:29,534 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.56 vs. 
limit=10.0 2023-11-29 08:45:50,431 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3892986.6666666665, ans=0.2 2023-11-29 08:45:51,293 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 6800, loss[loss=0.06601, simple_loss=0.0907, pruned_loss=0.01255, audio_tagging_loss=0.008112, over 15593.00 frames. ], tot_loss[loss=0.06503, simple_loss=0.0892, pruned_loss=0.01196, audio_tagging_loss=0.00847, over 3036421.99 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 08:45:51,423 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 583950 2023-11-29 08:46:11,849 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3893053.3333333335, ans=0.125 2023-11-29 08:46:13,537 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.46 vs. limit=15.0 2023-11-29 08:46:34,383 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.048e+01 9.158e+01 9.734e+01 1.076e+02 1.387e+02, threshold=1.947e+02, percent-clipped=0.0 2023-11-29 08:46:34,703 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3893186.6666666665, ans=0.1 2023-11-29 08:46:53,216 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 6850, loss[loss=0.05654, simple_loss=0.06755, pruned_loss=0.01487, audio_tagging_loss=0.007888, over 14620.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.08978, pruned_loss=0.01218, audio_tagging_loss=0.008501, over 3038886.44 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 08:46:53,322 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 584000 2023-11-29 08:46:54,841 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-584000.pt 2023-11-29 08:47:02,136 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 08:47:07,108 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3893386.6666666665, ans=0.0 2023-11-29 08:47:09,436 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3893386.6666666665, ans=0.0 2023-11-29 08:47:12,887 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3893386.6666666665, ans=0.125 2023-11-29 08:47:15,245 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3893386.6666666665, ans=0.2 2023-11-29 08:47:15,323 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.12 vs. 
limit=15.0 2023-11-29 08:47:22,895 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3893453.3333333335, ans=0.0 2023-11-29 08:47:25,997 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 08:47:33,857 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3893520.0, ans=0.09899494936611666 2023-11-29 08:47:34,331 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.12 vs. limit=10.0 2023-11-29 08:47:46,317 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3893586.6666666665, ans=0.0 2023-11-29 08:47:52,071 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3893586.6666666665, ans=0.125 2023-11-29 08:47:54,382 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3893586.6666666665, ans=0.125 2023-11-29 08:47:56,378 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 6900, loss[loss=0.05556, simple_loss=0.07769, pruned_loss=0.009978, audio_tagging_loss=0.006738, over 15494.00 frames. ], tot_loss[loss=0.06493, simple_loss=0.08933, pruned_loss=0.01182, audio_tagging_loss=0.008448, over 3042508.64 frames. ], batch size: 59, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 08:47:56,488 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 584050 2023-11-29 08:48:07,916 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3893720.0, ans=0.0 2023-11-29 08:48:09,113 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3893720.0, ans=0.125 2023-11-29 08:48:39,322 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.820e+01 9.126e+01 9.494e+01 1.002e+02 1.227e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-29 08:48:42,085 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3893853.3333333335, ans=0.1 2023-11-29 08:48:45,174 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 08:48:58,290 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 6950, loss[loss=0.07079, simple_loss=0.09953, pruned_loss=0.0162, audio_tagging_loss=0.004826, over 15409.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08993, pruned_loss=0.0118, audio_tagging_loss=0.008474, over 3039345.47 frames. 
], batch size: 56, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 08:48:58,417 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 584100 2023-11-29 08:48:58,701 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3893986.6666666665, ans=0.125 2023-11-29 08:49:01,190 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.26 vs. limit=15.0 2023-11-29 08:49:02,237 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3893986.6666666665, ans=0.07 2023-11-29 08:49:19,470 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3894053.3333333335, ans=0.07 2023-11-29 08:49:19,582 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3894053.3333333335, ans=0.125 2023-11-29 08:49:23,437 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.54 vs. limit=22.5 2023-11-29 08:49:26,532 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3894120.0, ans=0.125 2023-11-29 08:49:28,110 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.22 vs. limit=22.5 2023-11-29 08:49:31,105 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3894120.0, ans=0.1 2023-11-29 08:49:41,650 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.44 vs. limit=6.0 2023-11-29 08:49:51,204 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3894253.3333333335, ans=0.0 2023-11-29 08:49:55,667 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3894253.3333333335, ans=0.0 2023-11-29 08:49:58,688 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 7000, loss[loss=0.08061, simple_loss=0.1124, pruned_loss=0.01687, audio_tagging_loss=0.007562, over 15590.00 frames. ], tot_loss[loss=0.06494, simple_loss=0.08945, pruned_loss=0.01163, audio_tagging_loss=0.008581, over 3047186.67 frames. 
], batch size: 56, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 08:49:58,793 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 584150 2023-11-29 08:50:12,499 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3894386.6666666665, ans=0.07 2023-11-29 08:50:23,572 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3894453.3333333335, ans=0.0 2023-11-29 08:50:25,868 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3894453.3333333335, ans=0.025 2023-11-29 08:50:33,338 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3894453.3333333335, ans=0.0 2023-11-29 08:50:37,228 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.84 vs. limit=6.0 2023-11-29 08:50:40,707 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3894520.0, ans=0.2 2023-11-29 08:50:42,683 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.579e+01 8.892e+01 9.505e+01 1.031e+02 1.228e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-29 08:50:46,462 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3894520.0, ans=0.1 2023-11-29 08:51:01,123 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 7050, loss[loss=0.06205, simple_loss=0.08963, pruned_loss=0.008606, audio_tagging_loss=0.008626, over 16178.00 frames. ], tot_loss[loss=0.06443, simple_loss=0.08878, pruned_loss=0.01143, audio_tagging_loss=0.008612, over 3048675.23 frames. 
], batch size: 60, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 08:51:01,279 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 584200 2023-11-29 08:51:01,512 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3894653.3333333335, ans=0.1 2023-11-29 08:51:10,080 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3894653.3333333335, ans=0.1 2023-11-29 08:51:11,458 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3894653.3333333335, ans=0.0 2023-11-29 08:51:14,785 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3894720.0, ans=0.125 2023-11-29 08:51:18,937 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3894720.0, ans=0.0 2023-11-29 08:51:23,557 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3894720.0, ans=0.0 2023-11-29 08:51:44,179 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3894853.3333333335, ans=0.0 2023-11-29 08:51:50,049 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3894920.0, ans=0.1 2023-11-29 08:51:53,743 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3894920.0, ans=0.0 2023-11-29 08:51:56,016 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3894920.0, ans=0.1 2023-11-29 08:52:01,497 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 7100, loss[loss=0.07366, simple_loss=0.1062, pruned_loss=0.01347, audio_tagging_loss=0.007114, over 15403.00 frames. ], tot_loss[loss=0.06432, simple_loss=0.08865, pruned_loss=0.01139, audio_tagging_loss=0.008608, over 3048770.78 frames. ], batch size: 58, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 08:52:01,625 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 584250 2023-11-29 08:52:31,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3895120.0, ans=0.0 2023-11-29 08:52:43,128 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3895186.6666666665, ans=0.125 2023-11-29 08:52:43,240 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3895186.6666666665, ans=0.05 2023-11-29 08:52:45,228 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.440e+01 9.114e+01 9.630e+01 1.033e+02 1.554e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-29 08:52:48,007 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3895186.6666666665, ans=0.125 2023-11-29 08:52:58,708 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3895253.3333333335, ans=0.0 2023-11-29 08:53:03,124 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 7150, loss[loss=0.07695, simple_loss=0.1136, pruned_loss=0.01288, audio_tagging_loss=0.007253, over 15690.00 frames. 
], tot_loss[loss=0.06443, simple_loss=0.0889, pruned_loss=0.01136, audio_tagging_loss=0.008623, over 3053986.69 frames. ], batch size: 56, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 08:53:03,247 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 584300 2023-11-29 08:53:13,284 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3895320.0, ans=0.1 2023-11-29 08:53:15,963 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3895386.6666666665, ans=0.0 2023-11-29 08:53:17,166 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3895386.6666666665, ans=0.125 2023-11-29 08:53:25,272 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3895386.6666666665, ans=0.125 2023-11-29 08:53:34,764 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3895453.3333333335, ans=0.2 2023-11-29 08:53:45,658 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3895520.0, ans=0.0 2023-11-29 08:53:52,556 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3895586.6666666665, ans=0.125 2023-11-29 08:54:04,778 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 7200, loss[loss=0.07943, simple_loss=0.1047, pruned_loss=0.01821, audio_tagging_loss=0.008851, over 14984.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.08946, pruned_loss=0.01158, audio_tagging_loss=0.008634, over 3047298.75 frames. ], batch size: 57, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 08:54:04,908 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 584350 2023-11-29 08:54:24,719 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.90 vs. limit=12.0 2023-11-29 08:54:48,038 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3895853.3333333335, ans=0.125 2023-11-29 08:54:48,868 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.750e+01 8.925e+01 9.835e+01 1.040e+02 1.813e+02, threshold=1.967e+02, percent-clipped=0.0 2023-11-29 08:55:05,309 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 7250, loss[loss=0.069, simple_loss=0.09535, pruned_loss=0.01373, audio_tagging_loss=0.007588, over 15429.00 frames. ], tot_loss[loss=0.06455, simple_loss=0.0885, pruned_loss=0.01155, audio_tagging_loss=0.008747, over 3044747.56 frames. 
], batch size: 57, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 08:55:05,416 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 584400 2023-11-29 08:55:12,539 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3895986.6666666665, ans=0.0 2023-11-29 08:55:19,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3896053.3333333335, ans=0.1 2023-11-29 08:55:22,661 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3896053.3333333335, ans=0.1 2023-11-29 08:55:31,513 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3896120.0, ans=0.125 2023-11-29 08:55:42,037 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3896186.6666666665, ans=0.125 2023-11-29 08:55:59,343 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3896253.3333333335, ans=0.125 2023-11-29 08:56:02,044 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.18 vs. limit=15.0 2023-11-29 08:56:07,916 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 7300, loss[loss=0.06323, simple_loss=0.08597, pruned_loss=0.01037, audio_tagging_loss=0.009879, over 15702.00 frames. ], tot_loss[loss=0.06445, simple_loss=0.08806, pruned_loss=0.01168, audio_tagging_loss=0.008743, over 3038321.36 frames. ], batch size: 58, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 08:56:08,052 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 584450 2023-11-29 08:56:08,248 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3896320.0, ans=0.0 2023-11-29 08:56:09,378 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3896320.0, ans=0.125 2023-11-29 08:56:16,867 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3896320.0, ans=0.2 2023-11-29 08:56:21,593 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3896386.6666666665, ans=0.125 2023-11-29 08:56:51,122 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.824e+01 9.158e+01 9.688e+01 1.038e+02 1.242e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-29 08:56:56,802 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3896586.6666666665, ans=0.125 2023-11-29 08:57:00,935 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3896586.6666666665, ans=0.1 2023-11-29 08:57:02,615 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.43 vs. limit=15.0 2023-11-29 08:57:08,980 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 7350, loss[loss=0.0658, simple_loss=0.09576, pruned_loss=0.0102, audio_tagging_loss=0.007728, over 16649.00 frames. 
], tot_loss[loss=0.06412, simple_loss=0.088, pruned_loss=0.01156, audio_tagging_loss=0.008553, over 3036659.64 frames. ], batch size: 62, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 08:57:09,107 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 584500 2023-11-29 08:57:23,534 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3896720.0, ans=0.04949747468305833 2023-11-29 08:57:31,143 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3896720.0, ans=0.0 2023-11-29 08:57:35,800 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3896786.6666666665, ans=0.125 2023-11-29 08:57:50,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3896853.3333333335, ans=0.125 2023-11-29 08:57:57,084 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3896920.0, ans=0.125 2023-11-29 08:58:09,749 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 7400, loss[loss=0.08099, simple_loss=0.1104, pruned_loss=0.01888, audio_tagging_loss=0.006915, over 15374.00 frames. ], tot_loss[loss=0.06425, simple_loss=0.08833, pruned_loss=0.01165, audio_tagging_loss=0.008438, over 3043207.75 frames. ], batch size: 59, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 08:58:09,845 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 584550 2023-11-29 08:58:23,445 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3897053.3333333335, ans=0.125 2023-11-29 08:58:46,982 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 08:58:47,232 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.06 vs. limit=22.5 2023-11-29 08:58:48,080 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3897186.6666666665, ans=0.1 2023-11-29 08:58:54,835 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.663e+01 9.320e+01 9.934e+01 1.095e+02 1.320e+02, threshold=1.987e+02, percent-clipped=0.0 2023-11-29 08:58:55,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3897186.6666666665, ans=0.125 2023-11-29 08:59:10,437 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 7450, loss[loss=0.05022, simple_loss=0.06802, pruned_loss=0.007991, audio_tagging_loss=0.008217, over 17496.00 frames. ], tot_loss[loss=0.06406, simple_loss=0.08804, pruned_loss=0.01164, audio_tagging_loss=0.008394, over 3046116.17 frames. 
], batch size: 66, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 08:59:10,536 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 584600 2023-11-29 08:59:26,075 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3897386.6666666665, ans=0.125 2023-11-29 08:59:26,086 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3897386.6666666665, ans=0.0 2023-11-29 08:59:33,132 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3897386.6666666665, ans=0.2 2023-11-29 08:59:36,666 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3897453.3333333335, ans=0.125 2023-11-29 08:59:38,135 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.58 vs. limit=22.5 2023-11-29 08:59:53,687 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3897520.0, ans=0.2 2023-11-29 08:59:53,798 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3897520.0, ans=0.2 2023-11-29 09:00:11,654 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 7500, loss[loss=0.07864, simple_loss=0.1103, pruned_loss=0.01707, audio_tagging_loss=0.006422, over 14712.00 frames. ], tot_loss[loss=0.06431, simple_loss=0.08843, pruned_loss=0.01171, audio_tagging_loss=0.008386, over 3045742.50 frames. ], batch size: 57, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:00:11,803 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 584650 2023-11-29 09:00:29,777 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3897720.0, ans=0.125 2023-11-29 09:00:44,669 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3897786.6666666665, ans=0.1 2023-11-29 09:00:49,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3897853.3333333335, ans=0.2 2023-11-29 09:00:49,311 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3897853.3333333335, ans=0.125 2023-11-29 09:00:55,213 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3897853.3333333335, ans=0.125 2023-11-29 09:00:57,155 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.707e+01 9.073e+01 9.674e+01 1.060e+02 1.310e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-29 09:01:12,424 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 7550, loss[loss=0.07184, simple_loss=0.09882, pruned_loss=0.01597, audio_tagging_loss=0.006464, over 14728.00 frames. ], tot_loss[loss=0.06455, simple_loss=0.08865, pruned_loss=0.0118, audio_tagging_loss=0.008424, over 3045717.49 frames. 
], batch size: 54, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:01:12,568 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 584700 2023-11-29 09:01:14,994 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3897986.6666666665, ans=0.0 2023-11-29 09:01:52,543 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3898186.6666666665, ans=0.2 2023-11-29 09:01:54,818 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3898186.6666666665, ans=0.1 2023-11-29 09:02:13,866 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 7600, loss[loss=0.05127, simple_loss=0.07057, pruned_loss=0.007838, audio_tagging_loss=0.008146, over 14542.00 frames. ], tot_loss[loss=0.06385, simple_loss=0.08733, pruned_loss=0.01166, audio_tagging_loss=0.008526, over 3041921.38 frames. ], batch size: 55, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 09:02:13,966 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 584750 2023-11-29 09:02:39,912 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3898453.3333333335, ans=0.125 2023-11-29 09:02:41,285 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.38 vs. limit=12.0 2023-11-29 09:02:52,739 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=3898520.0, ans=0.2 2023-11-29 09:02:59,148 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3898520.0, ans=0.0 2023-11-29 09:02:59,178 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3898520.0, ans=0.2 2023-11-29 09:03:00,047 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.761e+01 9.018e+01 9.691e+01 1.036e+02 1.365e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-29 09:03:00,688 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.25 vs. limit=10.0 2023-11-29 09:03:01,537 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3898520.0, ans=0.2 2023-11-29 09:03:07,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3898586.6666666665, ans=0.125 2023-11-29 09:03:10,882 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3898586.6666666665, ans=0.125 2023-11-29 09:03:16,490 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 7650, loss[loss=0.0701, simple_loss=0.0977, pruned_loss=0.01362, audio_tagging_loss=0.007627, over 14541.00 frames. ], tot_loss[loss=0.06378, simple_loss=0.08715, pruned_loss=0.01175, audio_tagging_loss=0.008454, over 3037914.64 frames. 
], batch size: 57, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 09:03:16,602 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 584800 2023-11-29 09:03:21,336 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3898653.3333333335, ans=0.09899494936611666 2023-11-29 09:03:23,449 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3898653.3333333335, ans=0.125 2023-11-29 09:03:26,037 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3898653.3333333335, ans=0.125 2023-11-29 09:03:29,915 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.25 vs. limit=22.5 2023-11-29 09:03:31,988 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3898720.0, ans=0.125 2023-11-29 09:03:55,938 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3898853.3333333335, ans=0.125 2023-11-29 09:04:08,441 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3898920.0, ans=0.0 2023-11-29 09:04:16,539 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3898920.0, ans=0.0 2023-11-29 09:04:18,682 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 7700, loss[loss=0.07335, simple_loss=0.1115, pruned_loss=0.01201, audio_tagging_loss=0.005582, over 16094.00 frames. ], tot_loss[loss=0.06358, simple_loss=0.08715, pruned_loss=0.01159, audio_tagging_loss=0.008414, over 3042678.69 frames. ], batch size: 55, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 09:04:18,803 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 584850 2023-11-29 09:04:23,543 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 09:04:24,861 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3898986.6666666665, ans=0.125 2023-11-29 09:04:34,686 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3899053.3333333335, ans=0.125 2023-11-29 09:05:04,384 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3899186.6666666665, ans=0.125 2023-11-29 09:05:05,205 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.265e+01 9.378e+01 9.780e+01 1.035e+02 1.508e+02, threshold=1.956e+02, percent-clipped=0.0 2023-11-29 09:05:19,937 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 7750, loss[loss=0.07652, simple_loss=0.1131, pruned_loss=0.01299, audio_tagging_loss=0.006993, over 15769.00 frames. ], tot_loss[loss=0.06372, simple_loss=0.08738, pruned_loss=0.01152, audio_tagging_loss=0.00851, over 3045773.96 frames. 
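The scaling.py:213 entries print the current value ("ans") of named ScheduledFloat hyperparameters: module constants such as balancer probabilities, skip rates, and bypass scales that are functions of the global batch count, with most skip rates decayed to 0.0 this late in training. A minimal sketch of a piecewise-linear schedule, capturing the idea rather than icefall's exact class:

    class ScheduledFloat:
        """A float interpolated linearly between (batch_count, value) knots."""
        def __init__(self, *points):
            self.points = sorted(points)

        def value_at(self, batch_count: float) -> float:
            p = self.points
            if batch_count <= p[0][0]:
                return p[0][1]
            if batch_count >= p[-1][0]:
                return p[-1][1]
            for (x0, y0), (x1, y1) in zip(p, p[1:]):
                if x0 <= batch_count <= x1:
                    t = (batch_count - x0) / (x1 - x0)
                    return y0 + t * (y1 - y0)

    # Illustrative knots: a skip rate decaying from 0.2 to 0.0 early on
    skip_rate = ScheduledFloat((0.0, 0.2), (4000.0, 0.0))
    print(skip_rate.value_at(3897853.33))  # 0.0, as in the "ans=0.0" entries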
], batch size: 55, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:05:20,029 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 584900 2023-11-29 09:05:48,667 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3899453.3333333335, ans=0.125 2023-11-29 09:05:59,352 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3899520.0, ans=0.125 2023-11-29 09:06:10,479 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3899586.6666666665, ans=10.0 2023-11-29 09:06:21,800 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 7800, loss[loss=0.06306, simple_loss=0.07888, pruned_loss=0.01385, audio_tagging_loss=0.009763, over 15152.00 frames. ], tot_loss[loss=0.06412, simple_loss=0.08802, pruned_loss=0.01165, audio_tagging_loss=0.008461, over 3040975.98 frames. ], batch size: 59, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:06:21,932 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 584950 2023-11-29 09:06:23,702 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.20 vs. limit=15.0 2023-11-29 09:07:08,634 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.990e+01 9.073e+01 9.693e+01 1.045e+02 1.224e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-29 09:07:15,842 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3899920.0, ans=0.125 2023-11-29 09:07:23,377 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 7850, loss[loss=0.07582, simple_loss=0.1126, pruned_loss=0.01347, audio_tagging_loss=0.006035, over 15694.00 frames. ], tot_loss[loss=0.06459, simple_loss=0.08886, pruned_loss=0.01168, audio_tagging_loss=0.008479, over 3047235.80 frames. ], batch size: 58, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:07:23,474 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 585000 2023-11-29 09:07:31,060 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3899986.6666666665, ans=0.125 2023-11-29 09:07:44,819 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.72 vs. limit=15.0 2023-11-29 09:07:58,947 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3900186.6666666665, ans=0.125 2023-11-29 09:08:01,459 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3900186.6666666665, ans=0.125 2023-11-29 09:08:01,462 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3900186.6666666665, ans=0.0 2023-11-29 09:08:09,974 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3900186.6666666665, ans=0.125 2023-11-29 09:08:12,294 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3900253.3333333335, ans=0.025 2023-11-29 09:08:24,384 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 7900, loss[loss=0.06891, simple_loss=0.09364, pruned_loss=0.01359, audio_tagging_loss=0.008496, over 15906.00 frames. 
], tot_loss[loss=0.06487, simple_loss=0.08904, pruned_loss=0.01176, audio_tagging_loss=0.008597, over 3049959.13 frames. ], batch size: 62, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:08:24,503 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 585050 2023-11-29 09:08:52,735 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3900453.3333333335, ans=0.0 2023-11-29 09:08:58,217 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3900453.3333333335, ans=0.1 2023-11-29 09:09:10,032 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.168e+01 9.323e+01 9.871e+01 1.069e+02 1.326e+02, threshold=1.974e+02, percent-clipped=0.0 2023-11-29 09:09:12,743 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3900586.6666666665, ans=0.0 2023-11-29 09:09:17,282 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3900586.6666666665, ans=0.0 2023-11-29 09:09:21,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3900586.6666666665, ans=0.0 2023-11-29 09:09:23,940 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 7950, loss[loss=0.0651, simple_loss=0.08111, pruned_loss=0.01194, audio_tagging_loss=0.0126, over 15408.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08943, pruned_loss=0.0119, audio_tagging_loss=0.008704, over 3053818.86 frames. ], batch size: 59, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:09:24,064 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 585100 2023-11-29 09:09:40,444 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 09:09:45,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3900720.0, ans=0.125 2023-11-29 09:09:50,095 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3900786.6666666665, ans=0.0 2023-11-29 09:10:10,824 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.59 vs. limit=10.0 2023-11-29 09:10:17,997 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3900920.0, ans=0.1 2023-11-29 09:10:24,888 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 8000, loss[loss=0.0761, simple_loss=0.1024, pruned_loss=0.01403, audio_tagging_loss=0.01086, over 14662.00 frames. ], tot_loss[loss=0.06482, simple_loss=0.08843, pruned_loss=0.01185, audio_tagging_loss=0.008752, over 3049445.15 frames. 
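The WARNING above drops an AudioSet clip whose placeholder transcript encodes to more BPE tokens (24) than the cut has frames after subsampling (23): a transducer loss cannot emit more tokens than it has frames. The 100 -> 23 frame arithmetic matches a convolutional frontend that trims about 7 frames before the factor-4 subsampling; that trim is an assumption, and the helper below is a sketch of the check, not the train_asr.py code.

    def keep_cut(num_frames: int, num_tokens: int,
                 subsampling_factor: int = 4, frontend_trim: int = 7) -> bool:
        # frames surviving the encoder frontend (assumed trim of ~7 frames)
        frames_after = (num_frames - frontend_trim) // subsampling_factor
        return frames_after >= num_tokens

    # The excluded cut above: 100 frames -> 23 after subsampling, 24 tokens
    print(keep_cut(100, 24))  # False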
], batch size: 55, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 09:10:25,032 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 585150 2023-11-29 09:10:44,215 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3901053.3333333335, ans=0.1 2023-11-29 09:11:03,579 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.07 vs. limit=10.0 2023-11-29 09:11:11,212 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.479e+01 9.013e+01 9.705e+01 1.030e+02 1.171e+02, threshold=1.941e+02, percent-clipped=0.0 2023-11-29 09:11:13,906 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3901253.3333333335, ans=0.125 2023-11-29 09:11:25,767 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 8050, loss[loss=0.08806, simple_loss=0.117, pruned_loss=0.02268, audio_tagging_loss=0.006858, over 14284.00 frames. ], tot_loss[loss=0.06483, simple_loss=0.08855, pruned_loss=0.0118, audio_tagging_loss=0.008753, over 3042056.34 frames. ], batch size: 54, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 09:11:25,892 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 585200 2023-11-29 09:11:38,091 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.22 vs. limit=22.5 2023-11-29 09:12:07,904 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3901520.0, ans=0.125 2023-11-29 09:12:13,894 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.33 vs. limit=15.0 2023-11-29 09:12:28,029 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 8100, loss[loss=0.07188, simple_loss=0.09307, pruned_loss=0.0159, audio_tagging_loss=0.009441, over 15745.00 frames. ], tot_loss[loss=0.06505, simple_loss=0.08929, pruned_loss=0.01175, audio_tagging_loss=0.008664, over 3045089.50 frames. 
], batch size: 58, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:12:28,171 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 585250 2023-11-29 09:12:28,319 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3901653.3333333335, ans=0.07 2023-11-29 09:12:40,006 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3901720.0, ans=0.125 2023-11-29 09:12:40,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3901720.0, ans=0.125 2023-11-29 09:12:50,626 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3901720.0, ans=0.125 2023-11-29 09:12:52,891 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3901786.6666666665, ans=0.0 2023-11-29 09:13:00,667 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3901786.6666666665, ans=0.0 2023-11-29 09:13:13,236 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3901853.3333333335, ans=0.0 2023-11-29 09:13:16,474 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.652e+01 9.257e+01 9.923e+01 1.057e+02 1.359e+02, threshold=1.985e+02, percent-clipped=0.0 2023-11-29 09:13:16,805 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3901920.0, ans=0.1 2023-11-29 09:13:29,958 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 8150, loss[loss=0.08213, simple_loss=0.1077, pruned_loss=0.01819, audio_tagging_loss=0.01009, over 14798.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08967, pruned_loss=0.01178, audio_tagging_loss=0.008538, over 3048149.79 frames. ], batch size: 55, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:13:30,088 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 585300 2023-11-29 09:13:47,489 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3902053.3333333335, ans=0.125 2023-11-29 09:14:21,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3902253.3333333335, ans=0.0 2023-11-29 09:14:28,828 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3902253.3333333335, ans=0.95 2023-11-29 09:14:31,339 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 8200, loss[loss=0.07152, simple_loss=0.1044, pruned_loss=0.009704, audio_tagging_loss=0.009635, over 15201.00 frames. ], tot_loss[loss=0.06464, simple_loss=0.08938, pruned_loss=0.01154, audio_tagging_loss=0.008415, over 3051571.10 frames. ], batch size: 57, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:14:31,424 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 585350 2023-11-29 09:14:33,654 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 09:14:43,483 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3902386.6666666665, ans=0.125 2023-11-29 09:14:45,093 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.23 vs. limit=15.0 2023-11-29 09:14:47,157 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3902386.6666666665, ans=10.0 2023-11-29 09:14:55,787 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3902453.3333333335, ans=0.125 2023-11-29 09:15:01,687 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3902453.3333333335, ans=0.125 2023-11-29 09:15:18,650 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3902520.0, ans=0.125 2023-11-29 09:15:19,730 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.775e+01 9.273e+01 9.882e+01 1.047e+02 1.240e+02, threshold=1.976e+02, percent-clipped=0.0 2023-11-29 09:15:34,293 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 8250, loss[loss=0.05665, simple_loss=0.07577, pruned_loss=0.00878, audio_tagging_loss=0.00999, over 16211.00 frames. ], tot_loss[loss=0.06451, simple_loss=0.08941, pruned_loss=0.01145, audio_tagging_loss=0.008358, over 3058048.31 frames. ], batch size: 61, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:15:34,437 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 585400 2023-11-29 09:15:37,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3902653.3333333335, ans=0.0 2023-11-29 09:15:39,713 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3902653.3333333335, ans=0.2 2023-11-29 09:15:50,590 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3902720.0, ans=0.2 2023-11-29 09:15:51,780 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3902720.0, ans=0.125 2023-11-29 09:16:18,051 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3902853.3333333335, ans=0.0 2023-11-29 09:16:29,588 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.38 vs. limit=15.0 2023-11-29 09:16:31,080 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3902920.0, ans=0.0 2023-11-29 09:16:34,932 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.29 vs. limit=6.0 2023-11-29 09:16:36,681 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 8300, loss[loss=0.06844, simple_loss=0.08741, pruned_loss=0.01734, audio_tagging_loss=0.007391, over 13549.00 frames. ], tot_loss[loss=0.06482, simple_loss=0.08974, pruned_loss=0.01159, audio_tagging_loss=0.008364, over 3053577.96 frames. 
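The scaling.py:1022 "Whitening" entries fire when a module's whitening metric gets close to its limit. The metric measures how far the channel covariance of the activations is from a multiple of the identity: 1.0 for perfectly white features, larger as channels correlate. The formula below (mean squared eigenvalue over squared mean eigenvalue of the covariance) is one standard way to compute such a metric and is offered as an assumption, not as icefall's exact code:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        # x: (num_frames, num_channels) activations
        n, c = x.shape
        x = x.reshape(n, num_groups, c // num_groups)
        vals = []
        for g in range(num_groups):
            xg = x[:, g, :]
            xg = xg - xg.mean(dim=0, keepdim=True)
            cov = (xg.t() @ xg) / n
            eigs = torch.linalg.eigvalsh(cov)
            vals.append((eigs.pow(2).mean() / eigs.mean().pow(2)).item())
        return sum(vals) / len(vals)

    print(whitening_metric(torch.randn(10000, 64)))  # ~1.0 for white noise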
], batch size: 54, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:16:36,795 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 585450 2023-11-29 09:16:49,088 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.29 vs. limit=22.5 2023-11-29 09:16:57,745 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3903053.3333333335, ans=0.0 2023-11-29 09:17:05,448 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3903120.0, ans=0.0 2023-11-29 09:17:06,462 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3903120.0, ans=0.0 2023-11-29 09:17:20,067 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3903186.6666666665, ans=0.0 2023-11-29 09:17:24,340 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.794e+01 8.975e+01 9.727e+01 1.046e+02 1.425e+02, threshold=1.945e+02, percent-clipped=0.0 2023-11-29 09:17:37,226 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 8350, loss[loss=0.06886, simple_loss=0.09805, pruned_loss=0.01509, audio_tagging_loss=0.004743, over 14412.00 frames. ], tot_loss[loss=0.06445, simple_loss=0.08889, pruned_loss=0.01163, audio_tagging_loss=0.008378, over 3048124.28 frames. ], batch size: 55, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:17:37,335 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 585500 2023-11-29 09:17:46,197 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.38 vs. limit=12.0 2023-11-29 09:18:16,220 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3903520.0, ans=0.0 2023-11-29 09:18:39,269 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 8400, loss[loss=0.06674, simple_loss=0.09088, pruned_loss=0.01375, audio_tagging_loss=0.007556, over 14661.00 frames. ], tot_loss[loss=0.06412, simple_loss=0.08857, pruned_loss=0.01154, audio_tagging_loss=0.008293, over 3040686.39 frames. ], batch size: 52, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 09:18:39,377 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 585550 2023-11-29 09:19:10,641 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.87 vs. limit=15.0 2023-11-29 09:19:19,989 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3903853.3333333335, ans=0.1 2023-11-29 09:19:28,560 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.574e+01 8.927e+01 9.448e+01 1.050e+02 1.277e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-29 09:19:40,658 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3903986.6666666665, ans=0.0 2023-11-29 09:19:41,555 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 8450, loss[loss=0.06096, simple_loss=0.0725, pruned_loss=0.01651, audio_tagging_loss=0.008203, over 14207.00 frames. ], tot_loss[loss=0.06386, simple_loss=0.08797, pruned_loss=0.01154, audio_tagging_loss=0.008339, over 3038208.77 frames. 
], batch size: 55, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:19:41,647 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 585600 2023-11-29 09:19:43,576 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3903986.6666666665, ans=0.1 2023-11-29 09:19:45,881 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3903986.6666666665, ans=0.0 2023-11-29 09:19:48,130 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3903986.6666666665, ans=0.125 2023-11-29 09:19:58,421 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 09:20:12,051 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.04 vs. limit=15.0 2023-11-29 09:20:27,528 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.35 vs. limit=8.0 2023-11-29 09:20:36,030 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3904253.3333333335, ans=0.1 2023-11-29 09:20:39,619 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3904253.3333333335, ans=0.1 2023-11-29 09:20:42,690 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 8500, loss[loss=0.06224, simple_loss=0.0841, pruned_loss=0.009348, audio_tagging_loss=0.01084, over 14596.00 frames. ], tot_loss[loss=0.06471, simple_loss=0.08928, pruned_loss=0.01172, audio_tagging_loss=0.008354, over 3044936.26 frames. ], batch size: 57, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:20:42,809 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 585650 2023-11-29 09:20:55,435 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.81 vs. limit=15.0 2023-11-29 09:21:05,563 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3904386.6666666665, ans=0.125 2023-11-29 09:21:18,069 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3904453.3333333335, ans=0.125 2023-11-29 09:21:31,786 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.672e+01 9.118e+01 9.879e+01 1.041e+02 1.425e+02, threshold=1.976e+02, percent-clipped=0.0 2023-11-29 09:21:33,749 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.34 vs. limit=6.0 2023-11-29 09:21:43,529 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.20 vs. limit=15.0 2023-11-29 09:21:44,140 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 8550, loss[loss=0.05289, simple_loss=0.06804, pruned_loss=0.007598, audio_tagging_loss=0.01128, over 14626.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.09022, pruned_loss=0.01198, audio_tagging_loss=0.008363, over 3043959.55 frames. 
], batch size: 55, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:21:44,337 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 585700 2023-11-29 09:21:50,056 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.85 vs. limit=10.0 2023-11-29 09:21:50,911 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3904653.3333333335, ans=0.125 2023-11-29 09:22:20,090 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3904853.3333333335, ans=0.1 2023-11-29 09:22:24,669 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3904853.3333333335, ans=0.125 2023-11-29 09:22:46,563 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 8600, loss[loss=0.06312, simple_loss=0.08905, pruned_loss=0.01056, audio_tagging_loss=0.008034, over 15340.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.09022, pruned_loss=0.01201, audio_tagging_loss=0.008433, over 3044379.82 frames. ], batch size: 58, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:22:46,681 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 585750 2023-11-29 09:22:54,742 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3904986.6666666665, ans=0.0 2023-11-29 09:23:05,797 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.51 vs. limit=12.0 2023-11-29 09:23:25,636 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3905186.6666666665, ans=0.2 2023-11-29 09:23:36,040 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.354e+01 9.131e+01 9.509e+01 1.045e+02 1.246e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-29 09:23:44,688 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3905253.3333333335, ans=0.1 2023-11-29 09:23:47,821 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 8650, loss[loss=0.06537, simple_loss=0.09178, pruned_loss=0.009888, audio_tagging_loss=0.00959, over 14387.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.0903, pruned_loss=0.01201, audio_tagging_loss=0.008516, over 3048151.56 frames. ], batch size: 54, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:23:47,933 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 585800 2023-11-29 09:23:49,566 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3905320.0, ans=0.07 2023-11-29 09:23:51,142 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.87 vs. limit=10.0 2023-11-29 09:23:58,857 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3905386.6666666665, ans=0.0 2023-11-29 09:24:28,143 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3905520.0, ans=0.0 2023-11-29 09:24:42,354 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.70 vs. 
limit=22.5 2023-11-29 09:24:49,340 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 8700, loss[loss=0.09125, simple_loss=0.1359, pruned_loss=0.01796, audio_tagging_loss=0.00535, over 14713.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08963, pruned_loss=0.01195, audio_tagging_loss=0.008628, over 3049801.89 frames. ], batch size: 53, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:24:49,470 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 585850 2023-11-29 09:24:55,546 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3905653.3333333335, ans=0.125 2023-11-29 09:25:04,927 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3905720.0, ans=0.5 2023-11-29 09:25:15,990 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3905786.6666666665, ans=0.0 2023-11-29 09:25:20,690 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3905786.6666666665, ans=10.0 2023-11-29 09:25:28,061 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.32 vs. limit=22.5 2023-11-29 09:25:33,449 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.84 vs. limit=15.0 2023-11-29 09:25:38,430 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.006e+01 9.124e+01 9.803e+01 1.053e+02 1.295e+02, threshold=1.961e+02, percent-clipped=0.0 2023-11-29 09:25:51,278 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 8750, loss[loss=0.0669, simple_loss=0.08685, pruned_loss=0.01501, audio_tagging_loss=0.008464, over 16231.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.09039, pruned_loss=0.01226, audio_tagging_loss=0.00869, over 3041400.03 frames. ], batch size: 62, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:25:51,359 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 585900 2023-11-29 09:26:08,336 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3906053.3333333335, ans=0.125 2023-11-29 09:26:13,255 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.95 vs. limit=10.0 2023-11-29 09:26:21,700 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3906120.0, ans=0.95 2023-11-29 09:26:25,554 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.30 vs. limit=15.0 2023-11-29 09:26:47,977 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.69 vs. limit=15.0 2023-11-29 09:26:48,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3906253.3333333335, ans=0.1 2023-11-29 09:26:51,799 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 8800, loss[loss=0.09411, simple_loss=0.1344, pruned_loss=0.02115, audio_tagging_loss=0.005759, over 16593.00 frames. 
], tot_loss[loss=0.06661, simple_loss=0.09123, pruned_loss=0.01232, audio_tagging_loss=0.008672, over 3046445.93 frames. ], batch size: 59, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 09:26:51,915 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 585950 2023-11-29 09:26:53,168 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3906320.0, ans=0.125 2023-11-29 09:26:57,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3906320.0, ans=0.04949747468305833 2023-11-29 09:27:09,390 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.53 vs. limit=12.0 2023-11-29 09:27:35,034 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3906520.0, ans=0.125 2023-11-29 09:27:36,267 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3906520.0, ans=0.125 2023-11-29 09:27:40,141 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.94 vs. limit=15.0 2023-11-29 09:27:40,499 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.520e+01 9.118e+01 9.794e+01 1.059e+02 1.251e+02, threshold=1.959e+02, percent-clipped=0.0 2023-11-29 09:27:52,777 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 8850, loss[loss=0.05151, simple_loss=0.06703, pruned_loss=0.005155, audio_tagging_loss=0.01284, over 15068.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.09119, pruned_loss=0.01229, audio_tagging_loss=0.008683, over 3042268.81 frames. ], batch size: 57, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 09:27:52,883 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 586000 2023-11-29 09:27:56,157 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.93 vs. limit=12.0 2023-11-29 09:27:56,856 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3906653.3333333335, ans=0.0 2023-11-29 09:27:59,132 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3906653.3333333335, ans=0.0 2023-11-29 09:28:05,813 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 09:28:06,534 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.35 vs. limit=6.0 2023-11-29 09:28:15,097 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.30 vs. limit=15.0 2023-11-29 09:28:51,640 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.73 vs. 
limit=15.0 2023-11-29 09:28:53,399 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 8900, loss[loss=0.06029, simple_loss=0.08526, pruned_loss=0.009815, audio_tagging_loss=0.007844, over 15152.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.09068, pruned_loss=0.01217, audio_tagging_loss=0.008576, over 3048054.92 frames. ], batch size: 56, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 09:28:53,491 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 586050 2023-11-29 09:29:02,838 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.49 vs. limit=22.5 2023-11-29 09:29:39,995 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3907186.6666666665, ans=0.125 2023-11-29 09:29:40,250 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.66 vs. limit=6.0 2023-11-29 09:29:42,541 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.846e+01 9.012e+01 9.649e+01 1.060e+02 1.281e+02, threshold=1.930e+02, percent-clipped=0.0 2023-11-29 09:29:55,283 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 8950, loss[loss=0.05728, simple_loss=0.08132, pruned_loss=0.008262, audio_tagging_loss=0.008353, over 14100.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.09039, pruned_loss=0.01211, audio_tagging_loss=0.008461, over 3046416.55 frames. ], batch size: 54, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 09:29:55,362 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 586100 2023-11-29 09:29:58,881 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3907320.0, ans=0.0 2023-11-29 09:30:07,572 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.65 vs. limit=22.5 2023-11-29 09:30:19,226 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3907453.3333333335, ans=0.1 2023-11-29 09:30:29,325 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3907453.3333333335, ans=0.2 2023-11-29 09:30:36,887 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3907520.0, ans=0.0 2023-11-29 09:30:42,087 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3907520.0, ans=0.0 2023-11-29 09:30:56,189 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 9000, loss[loss=0.08992, simple_loss=0.1346, pruned_loss=0.01617, audio_tagging_loss=0.006449, over 15475.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.09127, pruned_loss=0.0123, audio_tagging_loss=0.008415, over 3048730.99 frames. ], batch size: 55, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 09:30:56,191 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-29 09:31:27,037 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.1495, 2.4087, 4.9935, 3.0202], device='cuda:0') 2023-11-29 09:31:36,004 INFO [train_asr.py:1267] (0/4) Epoch 49, validation: loss=0.05863, simple_loss=0.05047, pruned_loss=0.00547, audio_tagging_loss=0.02792, over 4681554.00 frames. 
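Every few thousand batches the loop pauses to compute a validation loss over the dev cuts (the "Computing validation loss" / "Epoch 49, validation: ..." pair above), alongside diagnostics such as the per-head attention-weight entropies printed from zipformer.py. A sketch of the validation pass, under the assumption that losses are frame-weighted, which is what the "over 4681554.00 frames" denominator suggests:

    import torch

    @torch.no_grad()
    def compute_validation_loss(model, valid_dl, compute_loss):
        # compute_loss: returns (summed loss over the batch, frames in batch)
        model.eval()
        tot_loss, tot_frames = 0.0, 0.0
        for batch in valid_dl:
            loss_sum, num_frames = compute_loss(model, batch)
            tot_loss += float(loss_sum)
            tot_frames += num_frames
        model.train()
        return tot_loss / max(tot_frames, 1.0)  # the "over N frames" average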
2023-11-29 09:31:36,004 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-29 09:31:36,119 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 586150 2023-11-29 09:31:48,979 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3907720.0, ans=10.0 2023-11-29 09:31:55,326 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.50 vs. limit=15.0 2023-11-29 09:31:57,239 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3907720.0, ans=0.125 2023-11-29 09:31:59,352 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3907786.6666666665, ans=0.5 2023-11-29 09:32:04,989 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3907786.6666666665, ans=0.0 2023-11-29 09:32:15,686 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.39 vs. limit=15.0 2023-11-29 09:32:16,297 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3907853.3333333335, ans=0.05 2023-11-29 09:32:25,451 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.432e+01 9.291e+01 1.003e+02 1.087e+02 1.506e+02, threshold=2.006e+02, percent-clipped=0.0 2023-11-29 09:32:37,789 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 9050, loss[loss=0.08621, simple_loss=0.1203, pruned_loss=0.0186, audio_tagging_loss=0.007467, over 16521.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.09139, pruned_loss=0.01228, audio_tagging_loss=0.008349, over 3050476.66 frames. ], batch size: 59, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 09:32:37,912 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 586200 2023-11-29 09:32:38,160 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3907986.6666666665, ans=0.0 2023-11-29 09:32:39,691 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3907986.6666666665, ans=0.07 2023-11-29 09:32:57,600 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3908053.3333333335, ans=0.125 2023-11-29 09:33:06,974 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3908120.0, ans=0.0 2023-11-29 09:33:39,647 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 9100, loss[loss=0.06623, simple_loss=0.08226, pruned_loss=0.01533, audio_tagging_loss=0.009769, over 15089.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.09064, pruned_loss=0.01217, audio_tagging_loss=0.008449, over 3048661.02 frames. 
], batch size: 58, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:33:39,769 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 586250 2023-11-29 09:33:54,060 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3908386.6666666665, ans=0.125 2023-11-29 09:34:04,492 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3908453.3333333335, ans=0.1 2023-11-29 09:34:18,305 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3908520.0, ans=0.2 2023-11-29 09:34:30,299 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.316e+01 9.126e+01 9.745e+01 1.074e+02 1.723e+02, threshold=1.949e+02, percent-clipped=0.0 2023-11-29 09:34:41,033 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 9150, loss[loss=0.08001, simple_loss=0.1175, pruned_loss=0.01544, audio_tagging_loss=0.00583, over 15729.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.09075, pruned_loss=0.01212, audio_tagging_loss=0.008448, over 3055121.52 frames. ], batch size: 59, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:34:41,170 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 586300 2023-11-29 09:34:45,417 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3908653.3333333335, ans=0.125 2023-11-29 09:34:50,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3908653.3333333335, ans=0.125 2023-11-29 09:34:56,697 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3908720.0, ans=0.0 2023-11-29 09:35:06,873 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3908786.6666666665, ans=0.0 2023-11-29 09:35:09,884 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.85 vs. limit=6.0 2023-11-29 09:35:16,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3908786.6666666665, ans=0.0 2023-11-29 09:35:33,128 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3908920.0, ans=0.0 2023-11-29 09:35:44,168 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 9200, loss[loss=0.04834, simple_loss=0.06541, pruned_loss=0.006854, audio_tagging_loss=0.008778, over 15026.00 frames. ], tot_loss[loss=0.06506, simple_loss=0.08945, pruned_loss=0.01197, audio_tagging_loss=0.008372, over 3049399.93 frames. ], batch size: 58, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 09:35:44,331 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 586350 2023-11-29 09:36:09,334 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3909120.0, ans=0.125 2023-11-29 09:36:34,383 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.502e+01 8.956e+01 9.492e+01 1.042e+02 1.619e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-29 09:36:45,673 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 9250, loss[loss=0.04856, simple_loss=0.07028, pruned_loss=0.006439, audio_tagging_loss=0.006981, over 14528.00 frames. 
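The "tot_loss[... over ~3.05e6 frames]" figures are running summaries rather than single-batch values, and the fractional frame counts (e.g. 3055121.52) point to a geometrically decayed, frame-weighted accumulator rather than a plain sliding window. A sketch of that bookkeeping; the decay constant is an assumption (0.995 would put the steady-state frame mass near 3e6 at roughly 15k frames per batch):

    class RunningLoss:
        """Exponentially decayed, frame-weighted loss accumulator (sketch)."""
        def __init__(self, decay: float = 0.995):  # assumed decay constant
            self.decay = decay
            self.loss_sum = 0.0
            self.frames = 0.0

        def update(self, batch_loss_sum: float, batch_frames: float) -> None:
            self.loss_sum = self.decay * self.loss_sum + batch_loss_sum
            self.frames = self.decay * self.frames + batch_frames

        @property
        def value(self) -> float:
            return self.loss_sum / max(self.frames, 1.0)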
], tot_loss[loss=0.06453, simple_loss=0.08864, pruned_loss=0.01179, audio_tagging_loss=0.008419, over 3056291.39 frames. ], batch size: 55, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 09:36:45,760 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 586400 2023-11-29 09:37:06,518 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.30 vs. limit=12.0 2023-11-29 09:37:17,224 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3909453.3333333335, ans=0.125 2023-11-29 09:37:31,879 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3909520.0, ans=0.1 2023-11-29 09:37:37,041 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3909586.6666666665, ans=0.0 2023-11-29 09:37:40,578 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff3.min_abs, batch_count=3909586.6666666665, ans=0.2 2023-11-29 09:37:47,416 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 9300, loss[loss=0.06407, simple_loss=0.09514, pruned_loss=0.008578, audio_tagging_loss=0.007916, over 14661.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.08919, pruned_loss=0.01195, audio_tagging_loss=0.008473, over 3053456.26 frames. ], batch size: 56, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 09:37:47,522 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 586450 2023-11-29 09:37:58,103 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3909653.3333333335, ans=0.0 2023-11-29 09:37:59,270 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3909720.0, ans=0.1 2023-11-29 09:38:00,591 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3909720.0, ans=0.125 2023-11-29 09:38:08,579 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3909720.0, ans=0.0 2023-11-29 09:38:17,638 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3909786.6666666665, ans=0.2 2023-11-29 09:38:38,308 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.899e+01 9.042e+01 9.889e+01 1.074e+02 1.345e+02, threshold=1.978e+02, percent-clipped=0.0 2023-11-29 09:38:42,711 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3909920.0, ans=0.1 2023-11-29 09:38:49,397 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 9350, loss[loss=0.0528, simple_loss=0.07818, pruned_loss=0.007259, audio_tagging_loss=0.006455, over 15404.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.089, pruned_loss=0.01191, audio_tagging_loss=0.008499, over 3055177.06 frames. 
], batch size: 58, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 09:38:49,536 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 586500 2023-11-29 09:39:02,050 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3910053.3333333335, ans=0.5 2023-11-29 09:39:12,672 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3910120.0, ans=0.125 2023-11-29 09:39:50,000 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=3910320.0, ans=0.1 2023-11-29 09:39:51,653 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 9400, loss[loss=0.06165, simple_loss=0.09439, pruned_loss=0.009084, audio_tagging_loss=0.005367, over 14322.00 frames. ], tot_loss[loss=0.0646, simple_loss=0.08864, pruned_loss=0.01174, audio_tagging_loss=0.008543, over 3056982.12 frames. ], batch size: 53, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 09:39:51,757 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 586550 2023-11-29 09:40:21,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3910453.3333333335, ans=0.1 2023-11-29 09:40:30,635 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3910520.0, ans=0.125 2023-11-29 09:40:41,275 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.43 vs. limit=15.0 2023-11-29 09:40:42,640 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.237e+01 8.989e+01 9.545e+01 1.042e+02 1.282e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-29 09:40:52,792 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 09:40:53,904 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 9450, loss[loss=0.07047, simple_loss=0.09294, pruned_loss=0.01381, audio_tagging_loss=0.01019, over 16371.00 frames. ], tot_loss[loss=0.06485, simple_loss=0.08891, pruned_loss=0.01179, audio_tagging_loss=0.008609, over 3058740.53 frames. 
], batch size: 62, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 09:40:54,016 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 586600 2023-11-29 09:41:16,240 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3910720.0, ans=0.2 2023-11-29 09:41:16,300 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 09:41:33,715 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3910853.3333333335, ans=0.0 2023-11-29 09:41:39,733 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3910853.3333333335, ans=0.07 2023-11-29 09:41:52,848 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3910920.0, ans=0.015 2023-11-29 09:41:55,147 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 9500, loss[loss=0.063, simple_loss=0.08676, pruned_loss=0.00874, audio_tagging_loss=0.01088, over 16368.00 frames. ], tot_loss[loss=0.06448, simple_loss=0.08814, pruned_loss=0.01164, audio_tagging_loss=0.008761, over 3054183.45 frames. ], batch size: 62, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 09:41:55,258 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 586650 2023-11-29 09:42:00,335 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3910986.6666666665, ans=0.1 2023-11-29 09:42:07,353 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3911053.3333333335, ans=0.125 2023-11-29 09:42:18,912 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3911120.0, ans=0.2 2023-11-29 09:42:20,667 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.41 vs. limit=6.0 2023-11-29 09:42:27,167 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3911120.0, ans=0.0 2023-11-29 09:42:34,883 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3911186.6666666665, ans=0.125 2023-11-29 09:42:34,920 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3911186.6666666665, ans=0.125 2023-11-29 09:42:45,166 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.304e+01 8.983e+01 9.496e+01 1.015e+02 1.388e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-29 09:42:51,387 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3911253.3333333335, ans=0.0 2023-11-29 09:42:55,725 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 9550, loss[loss=0.06872, simple_loss=0.09937, pruned_loss=0.011, audio_tagging_loss=0.008027, over 16459.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08925, pruned_loss=0.01178, audio_tagging_loss=0.008789, over 3051550.02 frames. 
], batch size: 60, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 09:42:55,835 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 586700 2023-11-29 09:42:57,141 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3911320.0, ans=0.125 2023-11-29 09:42:59,555 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3911320.0, ans=0.2 2023-11-29 09:43:10,091 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3911386.6666666665, ans=0.125 2023-11-29 09:43:12,647 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3911386.6666666665, ans=0.125 2023-11-29 09:43:17,165 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3911386.6666666665, ans=0.125 2023-11-29 09:43:23,029 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3911453.3333333335, ans=0.125 2023-11-29 09:43:33,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3911520.0, ans=0.125 2023-11-29 09:43:36,614 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=10.85 vs. limit=15.0 2023-11-29 09:43:47,135 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.19 vs. limit=15.0 2023-11-29 09:43:58,387 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 9600, loss[loss=0.06193, simple_loss=0.08629, pruned_loss=0.01077, audio_tagging_loss=0.00802, over 14669.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08995, pruned_loss=0.01185, audio_tagging_loss=0.008832, over 3053543.05 frames. ], batch size: 57, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 09:43:58,508 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 586750 2023-11-29 09:43:59,800 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3911653.3333333335, ans=0.04949747468305833 2023-11-29 09:44:00,888 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3911653.3333333335, ans=0.125 2023-11-29 09:44:10,640 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.27 vs. limit=22.5 2023-11-29 09:44:13,833 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3911720.0, ans=0.2 2023-11-29 09:44:32,741 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3911786.6666666665, ans=0.1 2023-11-29 09:44:49,670 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.510e+01 9.041e+01 9.598e+01 1.061e+02 1.358e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-29 09:45:00,437 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 9650, loss[loss=0.07399, simple_loss=0.09819, pruned_loss=0.01578, audio_tagging_loss=0.009113, over 15449.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08988, pruned_loss=0.01197, audio_tagging_loss=0.00875, over 3053715.67 frames. 
], batch size: 58, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:45:00,526 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 586800 2023-11-29 09:45:10,338 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3911986.6666666665, ans=0.0 2023-11-29 09:45:14,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3912053.3333333335, ans=0.0 2023-11-29 09:45:22,026 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3912053.3333333335, ans=0.1 2023-11-29 09:45:40,892 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3912186.6666666665, ans=0.95 2023-11-29 09:45:44,306 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3912186.6666666665, ans=0.0 2023-11-29 09:46:01,553 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 9700, loss[loss=0.07291, simple_loss=0.1081, pruned_loss=0.0133, audio_tagging_loss=0.00557, over 14077.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.08947, pruned_loss=0.01186, audio_tagging_loss=0.008625, over 3048752.19 frames. ], batch size: 55, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:46:01,674 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 586850 2023-11-29 09:46:03,002 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3912320.0, ans=0.5 2023-11-29 09:46:08,247 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.90 vs. limit=6.0 2023-11-29 09:46:28,050 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3912453.3333333335, ans=0.125 2023-11-29 09:46:40,769 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3912520.0, ans=0.125 2023-11-29 09:46:48,927 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 09:46:52,803 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.44 vs. limit=10.0 2023-11-29 09:46:53,257 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.721e+01 9.022e+01 9.629e+01 1.062e+02 1.416e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-29 09:47:03,540 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 9750, loss[loss=0.07237, simple_loss=0.09975, pruned_loss=0.0138, audio_tagging_loss=0.008691, over 15623.00 frames. ], tot_loss[loss=0.06472, simple_loss=0.08875, pruned_loss=0.01181, audio_tagging_loss=0.008531, over 3047277.10 frames. ], batch size: 58, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:47:03,668 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 586900 2023-11-29 09:48:07,005 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 9800, loss[loss=0.0651, simple_loss=0.09081, pruned_loss=0.01192, audio_tagging_loss=0.007775, over 14551.00 frames. ], tot_loss[loss=0.06444, simple_loss=0.08833, pruned_loss=0.01177, audio_tagging_loss=0.008502, over 3051031.24 frames. 
], batch size: 56, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:48:07,156 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 586950 2023-11-29 09:48:14,499 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3912986.6666666665, ans=0.125 2023-11-29 09:48:24,737 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3913053.3333333335, ans=0.125 2023-11-29 09:48:32,071 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.46 vs. limit=15.0 2023-11-29 09:48:48,376 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3913186.6666666665, ans=0.125 2023-11-29 09:48:53,044 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3913186.6666666665, ans=0.0 2023-11-29 09:48:59,767 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.885e+01 9.275e+01 1.009e+02 1.063e+02 1.388e+02, threshold=2.019e+02, percent-clipped=0.0 2023-11-29 09:49:02,236 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 09:49:08,179 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 9850, loss[loss=0.05878, simple_loss=0.07832, pruned_loss=0.009217, audio_tagging_loss=0.0104, over 15460.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08951, pruned_loss=0.01207, audio_tagging_loss=0.00843, over 3049033.93 frames. ], batch size: 60, lr: 1.37e-03, grad_scale: 8.0 2023-11-29 09:49:08,312 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 587000 2023-11-29 09:49:23,718 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3913386.6666666665, ans=0.0 2023-11-29 09:49:44,561 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3913453.3333333335, ans=0.1 2023-11-29 09:49:55,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=3913520.0, ans=15.0 2023-11-29 09:50:10,936 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 9900, loss[loss=0.05593, simple_loss=0.07624, pruned_loss=0.0104, audio_tagging_loss=0.00742, over 15427.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.09061, pruned_loss=0.01218, audio_tagging_loss=0.008308, over 3049491.47 frames. 
], batch size: 58, lr: 1.37e-03, grad_scale: 8.0 2023-11-29 09:50:11,082 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 587050 2023-11-29 09:50:26,292 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3913720.0, ans=0.125 2023-11-29 09:50:26,340 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3913720.0, ans=0.95 2023-11-29 09:50:28,038 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3913720.0, ans=0.0 2023-11-29 09:50:35,349 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3913786.6666666665, ans=0.125 2023-11-29 09:50:40,548 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.19 vs. limit=15.0 2023-11-29 09:51:02,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3913920.0, ans=0.1 2023-11-29 09:51:03,913 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.560e+01 9.220e+01 9.633e+01 1.039e+02 1.360e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-29 09:51:05,389 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3913920.0, ans=0.125 2023-11-29 09:51:12,749 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 9950, loss[loss=0.04737, simple_loss=0.06663, pruned_loss=0.006836, audio_tagging_loss=0.00722, over 14130.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.09039, pruned_loss=0.01211, audio_tagging_loss=0.008342, over 3048169.29 frames. ], batch size: 54, lr: 1.37e-03, grad_scale: 8.0 2023-11-29 09:51:12,878 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 587100 2023-11-29 09:51:17,113 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3913986.6666666665, ans=0.1 2023-11-29 09:51:27,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3914053.3333333335, ans=0.125 2023-11-29 09:51:32,418 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3914053.3333333335, ans=0.2 2023-11-29 09:51:48,548 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.70 vs. limit=15.0 2023-11-29 09:51:53,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3914186.6666666665, ans=0.0 2023-11-29 09:51:54,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3914186.6666666665, ans=0.125 2023-11-29 09:52:09,905 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.25 vs. 
limit=15.0 2023-11-29 09:52:13,118 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3914320.0, ans=0.1 2023-11-29 09:52:14,046 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 10000, loss[loss=0.05973, simple_loss=0.08451, pruned_loss=0.007549, audio_tagging_loss=0.009923, over 15665.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.09039, pruned_loss=0.01209, audio_tagging_loss=0.008326, over 3052476.06 frames. ], batch size: 57, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:52:14,163 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 587150 2023-11-29 09:52:48,222 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 09:52:48,244 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3914453.3333333335, ans=0.2 2023-11-29 09:53:07,081 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.201e+01 9.076e+01 9.668e+01 1.049e+02 3.214e+02, threshold=1.934e+02, percent-clipped=1.0 2023-11-29 09:53:15,313 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 10050, loss[loss=0.05686, simple_loss=0.07288, pruned_loss=0.007514, audio_tagging_loss=0.01291, over 15094.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08943, pruned_loss=0.01203, audio_tagging_loss=0.008423, over 3056341.54 frames. ], batch size: 56, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:53:15,424 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 587200 2023-11-29 09:53:47,173 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3914786.6666666665, ans=0.125 2023-11-29 09:53:48,359 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3914786.6666666665, ans=0.1 2023-11-29 09:53:54,447 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3914853.3333333335, ans=0.2 2023-11-29 09:54:13,845 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3914920.0, ans=0.125 2023-11-29 09:54:16,940 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 10100, loss[loss=0.06126, simple_loss=0.0874, pruned_loss=0.009079, audio_tagging_loss=0.008474, over 15168.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08964, pruned_loss=0.01204, audio_tagging_loss=0.008498, over 3053997.00 frames. ], batch size: 55, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:54:17,074 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 587250 2023-11-29 09:54:42,371 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3915120.0, ans=0.2 2023-11-29 09:54:44,682 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3915120.0, ans=0.125 2023-11-29 09:55:02,621 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3915186.6666666665, ans=0.0 2023-11-29 09:55:06,704 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 09:55:10,584 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.962e+01 9.039e+01 9.666e+01 1.026e+02 1.279e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-29 09:55:10,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3915253.3333333335, ans=0.1 2023-11-29 09:55:19,744 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 10150, loss[loss=0.06161, simple_loss=0.08572, pruned_loss=0.009336, audio_tagging_loss=0.009415, over 15528.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08926, pruned_loss=0.01192, audio_tagging_loss=0.008571, over 3057467.05 frames. ], batch size: 58, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:55:19,841 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 587300 2023-11-29 09:55:39,665 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3915386.6666666665, ans=0.0 2023-11-29 09:55:39,708 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3915386.6666666665, ans=0.1 2023-11-29 09:55:48,730 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 09:56:02,859 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3915520.0, ans=0.1 2023-11-29 09:56:05,722 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3915520.0, ans=0.125 2023-11-29 09:56:20,714 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 10200, loss[loss=0.06087, simple_loss=0.08705, pruned_loss=0.01015, audio_tagging_loss=0.007196, over 15087.00 frames. ], tot_loss[loss=0.06509, simple_loss=0.08884, pruned_loss=0.01202, audio_tagging_loss=0.008641, over 3056879.17 frames. ], batch size: 57, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:56:20,818 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 587350 2023-11-29 09:56:34,501 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3915720.0, ans=0.125 2023-11-29 09:56:38,133 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3915720.0, ans=0.0 2023-11-29 09:56:44,821 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-29 09:57:00,703 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3915853.3333333335, ans=0.0 2023-11-29 09:57:04,378 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3915853.3333333335, ans=0.125 2023-11-29 09:57:07,977 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3915853.3333333335, ans=0.0 2023-11-29 09:57:14,097 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.735e+01 8.990e+01 9.483e+01 1.035e+02 1.443e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-29 09:57:18,972 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3915920.0, ans=0.0 2023-11-29 09:57:22,311 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 10250, loss[loss=0.08045, simple_loss=0.1197, pruned_loss=0.01288, audio_tagging_loss=0.007735, over 15871.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08897, pruned_loss=0.01203, audio_tagging_loss=0.008668, over 3052582.77 frames. ], batch size: 58, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:57:22,440 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 587400 2023-11-29 09:57:34,270 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.10 vs. limit=15.0 2023-11-29 09:57:52,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3916120.0, ans=0.0 2023-11-29 09:58:00,675 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3916186.6666666665, ans=0.1 2023-11-29 09:58:01,084 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.74 vs. limit=22.5 2023-11-29 09:58:09,044 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3916186.6666666665, ans=0.0 2023-11-29 09:58:10,765 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.79 vs. limit=6.0 2023-11-29 09:58:17,034 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3916253.3333333335, ans=0.125 2023-11-29 09:58:25,271 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 10300, loss[loss=0.06185, simple_loss=0.08512, pruned_loss=0.01032, audio_tagging_loss=0.008975, over 14761.00 frames. ], tot_loss[loss=0.06492, simple_loss=0.08854, pruned_loss=0.01192, audio_tagging_loss=0.008719, over 3052314.71 frames. ], batch size: 57, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:58:25,405 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 587450 2023-11-29 09:58:39,038 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3916386.6666666665, ans=0.0 2023-11-29 09:58:39,461 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.62 vs. 
limit=15.0 2023-11-29 09:58:41,467 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3916386.6666666665, ans=0.0 2023-11-29 09:58:57,426 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3916453.3333333335, ans=0.1 2023-11-29 09:59:15,234 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.78 vs. limit=12.0 2023-11-29 09:59:18,314 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.769e+01 9.346e+01 9.831e+01 1.081e+02 1.349e+02, threshold=1.966e+02, percent-clipped=0.0 2023-11-29 09:59:27,097 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 10350, loss[loss=0.06878, simple_loss=0.09903, pruned_loss=0.01094, audio_tagging_loss=0.008325, over 16316.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08963, pruned_loss=0.01206, audio_tagging_loss=0.008757, over 3050219.40 frames. ], batch size: 62, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:59:27,185 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 587500 2023-11-29 09:59:27,458 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3916653.3333333335, ans=0.025 2023-11-29 09:59:47,312 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.58 vs. limit=12.0 2023-11-29 10:00:05,640 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 10:00:08,865 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3916853.3333333335, ans=0.1 2023-11-29 10:00:16,991 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3916920.0, ans=0.125 2023-11-29 10:00:28,923 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 10400, loss[loss=0.06667, simple_loss=0.09122, pruned_loss=0.01132, audio_tagging_loss=0.009739, over 15358.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08916, pruned_loss=0.01197, audio_tagging_loss=0.008904, over 3047207.88 frames. ], batch size: 59, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 10:00:29,029 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 587550 2023-11-29 10:00:31,525 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3916986.6666666665, ans=0.07 2023-11-29 10:00:43,344 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3917053.3333333335, ans=0.5 2023-11-29 10:00:48,047 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3917053.3333333335, ans=0.125 2023-11-29 10:00:50,372 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3917053.3333333335, ans=0.5 2023-11-29 10:01:09,649 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3917186.6666666665, ans=0.0 2023-11-29 10:01:16,766 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.23 vs. 
limit=22.5 2023-11-29 10:01:22,227 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.537e+01 9.130e+01 9.665e+01 1.040e+02 1.258e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-29 10:01:22,578 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3917253.3333333335, ans=0.07 2023-11-29 10:01:31,594 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 10450, loss[loss=0.056, simple_loss=0.07236, pruned_loss=0.01121, audio_tagging_loss=0.00861, over 14484.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.0889, pruned_loss=0.01193, audio_tagging_loss=0.008905, over 3041511.95 frames. ], batch size: 54, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 10:01:31,692 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 587600 2023-11-29 10:01:57,480 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=3917453.3333333335, ans=22.5 2023-11-29 10:02:01,775 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3917453.3333333335, ans=0.125 2023-11-29 10:02:08,755 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3917520.0, ans=0.125 2023-11-29 10:02:21,703 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3917586.6666666665, ans=0.0 2023-11-29 10:02:28,812 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3917586.6666666665, ans=0.0 2023-11-29 10:02:32,871 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.60 vs. limit=15.0 2023-11-29 10:02:33,194 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 10500, loss[loss=0.06547, simple_loss=0.08101, pruned_loss=0.01629, audio_tagging_loss=0.008675, over 14548.00 frames. ], tot_loss[loss=0.06494, simple_loss=0.08861, pruned_loss=0.01184, audio_tagging_loss=0.008788, over 3046016.05 frames. ], batch size: 57, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 10:02:33,381 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 587650 2023-11-29 10:02:33,523 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3917653.3333333335, ans=0.125 2023-11-29 10:02:35,850 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=3917653.3333333335, ans=15.0 2023-11-29 10:02:50,804 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.56 vs. limit=15.0 2023-11-29 10:03:03,032 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3917786.6666666665, ans=0.125 2023-11-29 10:03:04,179 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3917786.6666666665, ans=0.1 2023-11-29 10:03:17,575 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.81 vs. 
limit=12.0 2023-11-29 10:03:26,856 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.449e+01 8.916e+01 9.691e+01 1.026e+02 1.437e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-29 10:03:27,106 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3917920.0, ans=0.125 2023-11-29 10:03:32,552 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3917920.0, ans=0.0 2023-11-29 10:03:35,753 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 10550, loss[loss=0.0587, simple_loss=0.08127, pruned_loss=0.008742, audio_tagging_loss=0.009325, over 16791.00 frames. ], tot_loss[loss=0.0649, simple_loss=0.08828, pruned_loss=0.01204, audio_tagging_loss=0.00872, over 3047422.06 frames. ], batch size: 63, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 10:03:35,916 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 587700 2023-11-29 10:03:38,332 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3917986.6666666665, ans=0.5 2023-11-29 10:03:44,262 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3917986.6666666665, ans=0.1 2023-11-29 10:03:44,292 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3917986.6666666665, ans=0.1 2023-11-29 10:03:45,334 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3917986.6666666665, ans=0.0 2023-11-29 10:04:00,000 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3918120.0, ans=0.125 2023-11-29 10:04:38,652 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 10600, loss[loss=0.06959, simple_loss=0.09241, pruned_loss=0.014, audio_tagging_loss=0.009389, over 15952.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.08902, pruned_loss=0.01206, audio_tagging_loss=0.008517, over 3037502.07 frames. ], batch size: 59, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 10:04:38,752 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 587750 2023-11-29 10:04:50,119 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3918386.6666666665, ans=0.125 2023-11-29 10:05:02,913 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.48 vs. limit=15.0 2023-11-29 10:05:07,674 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3918453.3333333335, ans=0.125 2023-11-29 10:05:13,748 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.40 vs. 
limit=6.0 2023-11-29 10:05:15,873 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3918520.0, ans=0.125 2023-11-29 10:05:31,799 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.005e+01 8.953e+01 9.680e+01 1.038e+02 1.330e+02, threshold=1.936e+02, percent-clipped=0.0 2023-11-29 10:05:40,136 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 10650, loss[loss=0.04825, simple_loss=0.07122, pruned_loss=0.006264, audio_tagging_loss=0.006373, over 15647.00 frames. ], tot_loss[loss=0.06505, simple_loss=0.08906, pruned_loss=0.01204, audio_tagging_loss=0.008482, over 3044286.67 frames. ], batch size: 57, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:05:40,243 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 587800 2023-11-29 10:05:42,430 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.86 vs. limit=22.5 2023-11-29 10:05:43,105 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3918653.3333333335, ans=0.0 2023-11-29 10:05:48,917 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3918653.3333333335, ans=0.125 2023-11-29 10:05:50,654 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3918653.3333333335, ans=0.125 2023-11-29 10:05:55,208 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. limit=6.0 2023-11-29 10:06:04,136 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 10:06:08,333 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3918786.6666666665, ans=0.125 2023-11-29 10:06:13,844 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.53 vs. limit=10.0 2023-11-29 10:06:20,801 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3918853.3333333335, ans=0.1 2023-11-29 10:06:26,505 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3918853.3333333335, ans=0.125 2023-11-29 10:06:42,296 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 10700, loss[loss=0.06385, simple_loss=0.083, pruned_loss=0.01282, audio_tagging_loss=0.009531, over 15165.00 frames. ], tot_loss[loss=0.0652, simple_loss=0.08935, pruned_loss=0.0121, audio_tagging_loss=0.008424, over 3046484.37 frames. 
], batch size: 57, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:06:42,398 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 587850 2023-11-29 10:07:03,756 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3919053.3333333335, ans=0.5 2023-11-29 10:07:06,050 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3919120.0, ans=0.125 2023-11-29 10:07:06,096 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3919120.0, ans=0.125 2023-11-29 10:07:17,644 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3919186.6666666665, ans=0.1 2023-11-29 10:07:20,027 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3919186.6666666665, ans=0.0 2023-11-29 10:07:24,841 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3919186.6666666665, ans=0.0 2023-11-29 10:07:25,425 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.59 vs. limit=8.0 2023-11-29 10:07:36,073 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.832e+01 9.064e+01 9.584e+01 1.018e+02 1.338e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-29 10:07:38,636 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3919253.3333333335, ans=0.125 2023-11-29 10:07:44,309 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 10750, loss[loss=0.06197, simple_loss=0.07855, pruned_loss=0.009389, audio_tagging_loss=0.01331, over 15952.00 frames. ], tot_loss[loss=0.06494, simple_loss=0.08895, pruned_loss=0.012, audio_tagging_loss=0.008459, over 3051296.02 frames. ], batch size: 62, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:07:44,434 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 587900 2023-11-29 10:07:49,629 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.83 vs. limit=22.5 2023-11-29 10:07:50,496 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 10:07:53,073 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3919320.0, ans=0.04949747468305833 2023-11-29 10:07:54,462 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.74 vs. limit=15.0 2023-11-29 10:08:19,047 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.44 vs. limit=15.0 2023-11-29 10:08:42,855 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3919586.6666666665, ans=0.2 2023-11-29 10:08:44,886 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 10800, loss[loss=0.04768, simple_loss=0.06239, pruned_loss=0.005379, audio_tagging_loss=0.01111, over 16225.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08971, pruned_loss=0.0121, audio_tagging_loss=0.008432, over 3053627.31 frames. 
], batch size: 60, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 10:08:45,005 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 587950 2023-11-29 10:08:46,752 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.16 vs. limit=10.0 2023-11-29 10:09:24,049 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3919853.3333333335, ans=0.125 2023-11-29 10:09:37,585 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.25 vs. limit=15.0 2023-11-29 10:09:39,273 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.936e+01 9.085e+01 9.620e+01 1.015e+02 1.229e+02, threshold=1.924e+02, percent-clipped=0.0 2023-11-29 10:09:47,013 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 10850, loss[loss=0.07411, simple_loss=0.1007, pruned_loss=0.01726, audio_tagging_loss=0.00648, over 15937.00 frames. ], tot_loss[loss=0.06493, simple_loss=0.08894, pruned_loss=0.01202, audio_tagging_loss=0.008439, over 3051900.96 frames. ], batch size: 59, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 10:09:47,125 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 588000 2023-11-29 10:09:48,647 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-588000.pt 2023-11-29 10:10:11,878 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3920053.3333333335, ans=0.2 2023-11-29 10:10:11,891 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3920053.3333333335, ans=0.05 2023-11-29 10:10:39,476 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3920253.3333333335, ans=0.0 2023-11-29 10:10:42,473 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3920253.3333333335, ans=0.0 2023-11-29 10:10:42,552 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3920253.3333333335, ans=0.125 2023-11-29 10:10:45,709 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3920253.3333333335, ans=0.125 2023-11-29 10:10:48,092 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3920253.3333333335, ans=0.125 2023-11-29 10:10:49,011 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 10:10:51,860 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 10900, loss[loss=0.06155, simple_loss=0.0732, pruned_loss=0.01157, audio_tagging_loss=0.01339, over 14558.00 frames. ], tot_loss[loss=0.06473, simple_loss=0.08854, pruned_loss=0.01189, audio_tagging_loss=0.008578, over 3049156.39 frames. 
], batch size: 55, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:10:51,951 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 588050 2023-11-29 10:10:55,188 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 10:10:57,340 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3920320.0, ans=0.1 2023-11-29 10:11:26,263 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.10 vs. limit=15.0 2023-11-29 10:11:47,356 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.916e+01 9.221e+01 9.905e+01 1.064e+02 1.744e+02, threshold=1.981e+02, percent-clipped=0.0 2023-11-29 10:11:51,671 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.61 vs. limit=6.0 2023-11-29 10:11:53,349 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 10950, loss[loss=0.05524, simple_loss=0.08174, pruned_loss=0.007774, audio_tagging_loss=0.0066, over 15463.00 frames. ], tot_loss[loss=0.06454, simple_loss=0.08845, pruned_loss=0.01174, audio_tagging_loss=0.008581, over 3047453.62 frames. ], batch size: 56, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:11:53,483 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 588100 2023-11-29 10:12:00,940 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3920653.3333333335, ans=0.1 2023-11-29 10:12:01,236 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.91 vs. limit=22.5 2023-11-29 10:12:27,490 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3920786.6666666665, ans=0.125 2023-11-29 10:12:27,521 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3920786.6666666665, ans=0.125 2023-11-29 10:12:44,742 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3920920.0, ans=0.0 2023-11-29 10:12:48,072 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3920920.0, ans=0.125 2023-11-29 10:12:54,895 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 11000, loss[loss=0.06884, simple_loss=0.09502, pruned_loss=0.01109, audio_tagging_loss=0.01024, over 16563.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08948, pruned_loss=0.01181, audio_tagging_loss=0.008598, over 3052508.05 frames. ], batch size: 62, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:12:55,079 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 588150 2023-11-29 10:13:01,877 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=3920986.6666666665, ans=10.0 2023-11-29 10:13:04,309 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3920986.6666666665, ans=0.125 2023-11-29 10:13:06,296 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. 
Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 10:13:37,533 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3921186.6666666665, ans=0.0 2023-11-29 10:13:45,671 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.18 vs. limit=10.0 2023-11-29 10:13:50,734 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.176e+01 9.054e+01 9.771e+01 1.053e+02 1.277e+02, threshold=1.954e+02, percent-clipped=0.0 2023-11-29 10:13:56,543 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 11050, loss[loss=0.06697, simple_loss=0.09259, pruned_loss=0.01275, audio_tagging_loss=0.00793, over 15098.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08958, pruned_loss=0.01178, audio_tagging_loss=0.008615, over 3054524.29 frames. ], batch size: 58, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:13:56,641 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 588200 2023-11-29 10:14:18,596 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3921386.6666666665, ans=0.2 2023-11-29 10:14:22,238 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.88 vs. limit=12.0 2023-11-29 10:14:42,421 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3921520.0, ans=0.125 2023-11-29 10:14:42,580 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3921520.0, ans=0.125 2023-11-29 10:14:44,099 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.50 vs. limit=15.0 2023-11-29 10:14:58,984 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 11100, loss[loss=0.06935, simple_loss=0.09378, pruned_loss=0.01169, audio_tagging_loss=0.01077, over 15608.00 frames. ], tot_loss[loss=0.06448, simple_loss=0.08825, pruned_loss=0.01159, audio_tagging_loss=0.008766, over 3044860.52 frames. ], batch size: 57, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:14:59,070 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 588250 2023-11-29 10:15:05,602 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.42 vs. limit=15.0 2023-11-29 10:15:46,494 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 10:15:53,233 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.915e+01 9.126e+01 9.732e+01 1.035e+02 1.401e+02, threshold=1.946e+02, percent-clipped=0.0 2023-11-29 10:15:59,065 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 11150, loss[loss=0.07914, simple_loss=0.1068, pruned_loss=0.01783, audio_tagging_loss=0.007903, over 15008.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08904, pruned_loss=0.01181, audio_tagging_loss=0.008812, over 3045704.45 frames. 
], batch size: 55, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:15:59,195 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 588300 2023-11-29 10:16:16,603 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.15 vs. limit=15.0 2023-11-29 10:16:17,407 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3922053.3333333335, ans=0.125 2023-11-29 10:16:47,197 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3922253.3333333335, ans=0.0 2023-11-29 10:16:57,202 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3922253.3333333335, ans=0.125 2023-11-29 10:17:00,574 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 11200, loss[loss=0.08484, simple_loss=0.1206, pruned_loss=0.01596, audio_tagging_loss=0.008587, over 15434.00 frames. ], tot_loss[loss=0.06483, simple_loss=0.08869, pruned_loss=0.01165, audio_tagging_loss=0.008837, over 3042534.10 frames. ], batch size: 56, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 10:17:00,716 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 588350 2023-11-29 10:17:11,841 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3922320.0, ans=0.125 2023-11-29 10:17:22,564 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.35 vs. limit=22.5 2023-11-29 10:17:31,416 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 10:17:56,308 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.891e+01 9.269e+01 9.742e+01 1.059e+02 1.357e+02, threshold=1.948e+02, percent-clipped=0.0 2023-11-29 10:18:02,852 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 11250, loss[loss=0.07225, simple_loss=0.1031, pruned_loss=0.01318, audio_tagging_loss=0.00752, over 14547.00 frames. ], tot_loss[loss=0.06458, simple_loss=0.08848, pruned_loss=0.01161, audio_tagging_loss=0.008732, over 3048603.92 frames. ], batch size: 55, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 10:18:02,964 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 588400 2023-11-29 10:18:33,559 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3922786.6666666665, ans=0.125 2023-11-29 10:19:03,736 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 11300, loss[loss=0.07058, simple_loss=0.1021, pruned_loss=0.01324, audio_tagging_loss=0.006275, over 15508.00 frames. ], tot_loss[loss=0.06407, simple_loss=0.08787, pruned_loss=0.0115, audio_tagging_loss=0.008632, over 3049738.68 frames. 
], batch size: 56, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 10:19:03,842 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 588450 2023-11-29 10:19:26,067 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3923053.3333333335, ans=0.125 2023-11-29 10:19:33,038 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3923120.0, ans=0.0 2023-11-29 10:19:41,110 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.35 vs. limit=12.0 2023-11-29 10:19:43,045 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3923186.6666666665, ans=0.125 2023-11-29 10:19:56,932 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3923253.3333333335, ans=0.0 2023-11-29 10:19:58,517 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.051e+01 9.209e+01 9.911e+01 1.088e+02 1.767e+02, threshold=1.982e+02, percent-clipped=0.0 2023-11-29 10:20:02,205 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3923253.3333333335, ans=0.125 2023-11-29 10:20:04,405 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 11350, loss[loss=0.07914, simple_loss=0.1153, pruned_loss=0.01446, audio_tagging_loss=0.007034, over 15970.00 frames. ], tot_loss[loss=0.0645, simple_loss=0.08841, pruned_loss=0.01174, audio_tagging_loss=0.008556, over 3046359.00 frames. ], batch size: 57, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 10:20:04,516 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 588500 2023-11-29 10:20:22,342 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3923386.6666666665, ans=0.04949747468305833 2023-11-29 10:20:41,883 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-29 10:20:49,197 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3923520.0, ans=0.0 2023-11-29 10:21:06,500 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 11400, loss[loss=0.08931, simple_loss=0.13, pruned_loss=0.01703, audio_tagging_loss=0.007257, over 15719.00 frames. ], tot_loss[loss=0.06481, simple_loss=0.0893, pruned_loss=0.01173, audio_tagging_loss=0.00843, over 3050944.93 frames. 
], batch size: 55, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 10:21:06,657 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 588550 2023-11-29 10:21:09,089 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3923653.3333333335, ans=0.125 2023-11-29 10:21:09,092 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3923653.3333333335, ans=0.09899494936611666 2023-11-29 10:21:37,087 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3923786.6666666665, ans=0.0 2023-11-29 10:21:39,190 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3923786.6666666665, ans=0.0 2023-11-29 10:21:57,638 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.25 vs. limit=15.0 2023-11-29 10:22:01,949 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.509e+01 8.974e+01 9.585e+01 1.035e+02 2.029e+02, threshold=1.917e+02, percent-clipped=1.0 2023-11-29 10:22:07,795 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 11450, loss[loss=0.07383, simple_loss=0.1094, pruned_loss=0.01175, audio_tagging_loss=0.007395, over 15936.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.0898, pruned_loss=0.01182, audio_tagging_loss=0.008356, over 3050667.53 frames. ], batch size: 57, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:22:07,914 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 588600 2023-11-29 10:22:11,753 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3923986.6666666665, ans=0.125 2023-11-29 10:22:11,822 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3923986.6666666665, ans=0.0 2023-11-29 10:22:15,478 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3923986.6666666665, ans=0.125 2023-11-29 10:22:39,463 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3924120.0, ans=0.0 2023-11-29 10:22:57,584 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3924253.3333333335, ans=0.1 2023-11-29 10:22:59,204 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.25 vs. limit=22.5 2023-11-29 10:23:08,908 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3924320.0, ans=0.125 2023-11-29 10:23:09,040 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3924320.0, ans=0.0 2023-11-29 10:23:09,770 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 11500, loss[loss=0.05644, simple_loss=0.06908, pruned_loss=0.0116, audio_tagging_loss=0.01031, over 14636.00 frames. ], tot_loss[loss=0.06435, simple_loss=0.08854, pruned_loss=0.01168, audio_tagging_loss=0.008404, over 3047865.89 frames. 
], batch size: 55, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:23:09,893 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 588650 2023-11-29 10:23:15,318 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.whiten.whitening_limit, batch_count=3924320.0, ans=12.0 2023-11-29 10:23:17,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3924320.0, ans=0.0 2023-11-29 10:23:22,845 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3924386.6666666665, ans=0.0 2023-11-29 10:23:27,779 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3924386.6666666665, ans=0.05 2023-11-29 10:23:33,404 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 10:23:53,236 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3924520.0, ans=0.0 2023-11-29 10:23:56,886 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3924520.0, ans=0.125 2023-11-29 10:24:06,304 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.035e+01 8.997e+01 9.583e+01 1.037e+02 1.357e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-29 10:24:11,592 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 11550, loss[loss=0.07025, simple_loss=0.1028, pruned_loss=0.01235, audio_tagging_loss=0.006481, over 15402.00 frames. ], tot_loss[loss=0.06416, simple_loss=0.08807, pruned_loss=0.01162, audio_tagging_loss=0.008502, over 3049246.89 frames. ], batch size: 58, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:24:11,728 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 588700 2023-11-29 10:24:25,468 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3924720.0, ans=0.0 2023-11-29 10:24:33,362 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3924720.0, ans=0.125 2023-11-29 10:24:41,148 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3924786.6666666665, ans=0.95 2023-11-29 10:24:41,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3924786.6666666665, ans=0.2 2023-11-29 10:24:46,303 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.60 vs. limit=15.0 2023-11-29 10:24:47,093 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3924853.3333333335, ans=0.125 2023-11-29 10:24:47,227 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.72 vs. limit=15.0 2023-11-29 10:24:48,818 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.88 vs. limit=6.0 2023-11-29 10:24:49,087 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. 
Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 10:24:55,103 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3924853.3333333335, ans=0.0 2023-11-29 10:24:55,140 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3924853.3333333335, ans=0.125 2023-11-29 10:25:04,851 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.22 vs. limit=15.0 2023-11-29 10:25:12,265 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 11600, loss[loss=0.05951, simple_loss=0.08022, pruned_loss=0.0127, audio_tagging_loss=0.006699, over 14236.00 frames. ], tot_loss[loss=0.06391, simple_loss=0.08757, pruned_loss=0.01161, audio_tagging_loss=0.008516, over 3048021.92 frames. ], batch size: 54, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 10:25:13,052 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 588750 2023-11-29 10:25:14,414 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3924986.6666666665, ans=0.125 2023-11-29 10:25:45,278 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3925120.0, ans=0.0 2023-11-29 10:25:45,382 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3925120.0, ans=0.125 2023-11-29 10:25:47,666 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3925120.0, ans=0.1 2023-11-29 10:26:09,242 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.110e+01 9.153e+01 9.919e+01 1.070e+02 2.477e+02, threshold=1.984e+02, percent-clipped=1.0 2023-11-29 10:26:11,126 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.89 vs. limit=15.0 2023-11-29 10:26:14,600 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 11650, loss[loss=0.0855, simple_loss=0.1171, pruned_loss=0.0184, audio_tagging_loss=0.00857, over 16050.00 frames. ], tot_loss[loss=0.06494, simple_loss=0.08889, pruned_loss=0.01205, audio_tagging_loss=0.008455, over 3049983.70 frames. ], batch size: 57, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 10:26:14,699 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 588800 2023-11-29 10:26:34,923 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3925386.6666666665, ans=0.125 2023-11-29 10:27:17,052 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 11700, loss[loss=0.08014, simple_loss=0.1136, pruned_loss=0.01655, audio_tagging_loss=0.006764, over 16273.00 frames. ], tot_loss[loss=0.06446, simple_loss=0.08811, pruned_loss=0.01183, audio_tagging_loss=0.00857, over 3048411.55 frames. 
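
The WARNING above shows why a cut is dropped: after subsampling, 100 input frames become 23 output frames, which is fewer than the 24 BPE tokens of the placeholder transcript, and a transducer cannot emit more labels than it has frames. A sketch of that filter; the edge_frames allowance for convolutional edge effects is an assumption chosen to reproduce the logged 100 -> 23 arithmetic, and the function is an illustrative stand-in, not the actual filtering code.

def keep_cut(num_input_frames: int, num_tokens: int,
             subsampling_factor: int = 4, edge_frames: int = 2) -> bool:
    # 100 // 4 - 2 = 23, matching "Number of frames (after subsampling): 23".
    num_output_frames = num_input_frames // subsampling_factor - edge_frames
    return num_output_frames >= num_tokens

# The excluded AudioSet cut above: 23 output frames < 24 tokens -> dropped.
assert keep_cut(100, 24) is False
assert keep_cut(100, 23) is True
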
], batch size: 60, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 10:27:17,190 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 588850 2023-11-29 10:27:46,261 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3925786.6666666665, ans=0.2 2023-11-29 10:27:57,508 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3925853.3333333335, ans=0.125 2023-11-29 10:27:58,671 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3925853.3333333335, ans=0.125 2023-11-29 10:28:14,513 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.245e+01 9.126e+01 9.606e+01 1.033e+02 1.375e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-29 10:28:17,925 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 11750, loss[loss=0.05482, simple_loss=0.06148, pruned_loss=0.01247, audio_tagging_loss=0.01161, over 14433.00 frames. ], tot_loss[loss=0.06437, simple_loss=0.0878, pruned_loss=0.01184, audio_tagging_loss=0.008625, over 3045805.20 frames. ], batch size: 59, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:28:18,055 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 588900 2023-11-29 10:28:22,596 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3925986.6666666665, ans=0.025 2023-11-29 10:28:29,014 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3926053.3333333335, ans=0.125 2023-11-29 10:28:45,239 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3926120.0, ans=0.07 2023-11-29 10:29:12,966 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.10 vs. limit=6.0 2023-11-29 10:29:20,678 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 11800, loss[loss=0.08328, simple_loss=0.1153, pruned_loss=0.01572, audio_tagging_loss=0.009892, over 15744.00 frames. ], tot_loss[loss=0.06471, simple_loss=0.08848, pruned_loss=0.01186, audio_tagging_loss=0.008608, over 3050629.42 frames. ], batch size: 58, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:29:20,780 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 588950 2023-11-29 10:29:43,593 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3926386.6666666665, ans=0.125 2023-11-29 10:29:46,909 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3926453.3333333335, ans=0.025 2023-11-29 10:29:47,405 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.23 vs. 
limit=15.0 2023-11-29 10:30:00,939 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3926520.0, ans=0.1 2023-11-29 10:30:11,452 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 10:30:12,689 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3926586.6666666665, ans=0.1 2023-11-29 10:30:18,162 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.681e+01 9.043e+01 9.603e+01 1.049e+02 1.292e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-29 10:30:21,753 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 11850, loss[loss=0.07369, simple_loss=0.1122, pruned_loss=0.01207, audio_tagging_loss=0.005541, over 14724.00 frames. ], tot_loss[loss=0.06482, simple_loss=0.08851, pruned_loss=0.0119, audio_tagging_loss=0.008671, over 3048140.02 frames. ], batch size: 53, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:30:21,888 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 589000 2023-11-29 10:30:38,270 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3926720.0, ans=0.05 2023-11-29 10:30:39,491 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3926720.0, ans=0.125 2023-11-29 10:30:49,924 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3926786.6666666665, ans=0.0 2023-11-29 10:30:54,586 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3926786.6666666665, ans=0.125 2023-11-29 10:31:13,859 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3926920.0, ans=0.2 2023-11-29 10:31:22,944 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 11900, loss[loss=0.0813, simple_loss=0.1095, pruned_loss=0.01823, audio_tagging_loss=0.008346, over 14464.00 frames. ], tot_loss[loss=0.06503, simple_loss=0.08868, pruned_loss=0.01199, audio_tagging_loss=0.008702, over 3044442.11 frames. ], batch size: 53, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:31:23,068 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 589050 2023-11-29 10:31:24,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3926986.6666666665, ans=0.07 2023-11-29 10:31:33,736 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3927053.3333333335, ans=0.125 2023-11-29 10:32:11,000 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3927253.3333333335, ans=0.125 2023-11-29 10:32:12,528 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.06 vs. 
limit=15.0 2023-11-29 10:32:15,340 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3927253.3333333335, ans=0.1 2023-11-29 10:32:18,727 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.646e+01 9.048e+01 9.528e+01 1.019e+02 1.404e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-29 10:32:18,908 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3927253.3333333335, ans=0.035 2023-11-29 10:32:22,325 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 11950, loss[loss=0.06365, simple_loss=0.09609, pruned_loss=0.009168, audio_tagging_loss=0.006434, over 15406.00 frames. ], tot_loss[loss=0.06476, simple_loss=0.08806, pruned_loss=0.01192, audio_tagging_loss=0.008814, over 3047388.82 frames. ], batch size: 55, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:32:22,458 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 589100 2023-11-29 10:32:29,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3927320.0, ans=0.125 2023-11-29 10:32:40,440 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3927386.6666666665, ans=0.125 2023-11-29 10:33:02,207 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.94 vs. limit=15.0 2023-11-29 10:33:19,845 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.27 vs. limit=22.5 2023-11-29 10:33:21,421 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 12000, loss[loss=0.06153, simple_loss=0.08133, pruned_loss=0.008914, audio_tagging_loss=0.01195, over 14363.00 frames. ], tot_loss[loss=0.06467, simple_loss=0.08795, pruned_loss=0.01186, audio_tagging_loss=0.008833, over 3046110.86 frames. ], batch size: 56, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 10:33:21,423 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-29 10:33:53,257 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.8040, 4.9412, 5.1472, 4.9298], device='cuda:0') 2023-11-29 10:34:01,196 INFO [train_asr.py:1267] (0/4) Epoch 49, validation: loss=0.0581, simple_loss=0.05045, pruned_loss=0.005444, audio_tagging_loss=0.02743, over 4681554.00 frames. 2023-11-29 10:34:01,197 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-29 10:34:01,298 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 589150 2023-11-29 10:34:21,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3927720.0, ans=0.2 2023-11-29 10:34:27,434 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/epoch-49.pt 2023-11-29 10:34:46,427 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 0, loss[loss=0.07758, simple_loss=0.0851, pruned_loss=0.01177, audio_tagging_loss=0.02325, over 14503.00 frames. ], tot_loss[loss=0.07758, simple_loss=0.0851, pruned_loss=0.01177, audio_tagging_loss=0.02325, over 14503.00 frames. 
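
This stretch shows the end-of-epoch choreography: validation fires on a fixed batch interval (at batch 12000 above, and again at batch 0 of the new epoch), peak CUDA memory is reported, and a full checkpoint is written as epoch-49.pt before Epoch 50 begins. A schematic of that cadence; the interval value and all helper names are assumptions, since the log only shows where the events fire.

import logging
from pathlib import Path

import torch


def maybe_validate_and_checkpoint(model: torch.nn.Module,
                                  epoch: int,
                                  batch_idx: int,
                                  epoch_done: bool,
                                  exp_dir: Path,
                                  valid_interval: int = 3000) -> None:
    # valid_interval is assumed; the log shows validation at batch 12000
    # and again at batch 0 of the next epoch, both multiples of 3000.
    if batch_idx % valid_interval == 0:
        logging.info("Computing validation loss")
        # ... run the dev dataloader and log the validation tot_loss ...
    if epoch_done:
        ckpt = exp_dir / f"epoch-{epoch}.pt"
        logging.info("Saving checkpoint to %s", ckpt)
        torch.save({"model": model.state_dict(), "epoch": epoch}, ckpt)
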
], batch size: 54, lr: 1.36e-03, grad_scale: 32.0 2023-11-29 10:34:46,430 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-29 10:35:03,450 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.1642, 4.5844, 5.2166, 4.8490], device='cuda:0') 2023-11-29 10:35:22,085 INFO [train_asr.py:1267] (0/4) Epoch 50, validation: loss=0.05785, simple_loss=0.05049, pruned_loss=0.005519, audio_tagging_loss=0.02709, over 4681554.00 frames. 2023-11-29 10:35:22,086 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-29 10:35:43,027 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3927873.3333333335, ans=0.0 2023-11-29 10:35:47,875 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3927940.0, ans=0.0 2023-11-29 10:35:54,864 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.763e+01 9.469e+01 1.029e+02 1.110e+02 1.447e+02, threshold=2.058e+02, percent-clipped=0.0 2023-11-29 10:35:57,205 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 589200 2023-11-29 10:35:57,374 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3927940.0, ans=0.0 2023-11-29 10:36:09,726 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3928006.6666666665, ans=0.0 2023-11-29 10:36:25,873 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 50, loss[loss=0.07817, simple_loss=0.1009, pruned_loss=0.0146, audio_tagging_loss=0.01311, over 14993.00 frames. ], tot_loss[loss=0.07531, simple_loss=0.09277, pruned_loss=0.01242, audio_tagging_loss=0.0165, over 691383.72 frames. ], batch size: 56, lr: 1.36e-03, grad_scale: 16.0 2023-11-29 10:37:00,006 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 589250 2023-11-29 10:37:04,390 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.06 vs. limit=15.0 2023-11-29 10:37:12,677 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3928340.0, ans=0.125 2023-11-29 10:37:28,933 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3928473.3333333335, ans=0.0 2023-11-29 10:37:29,738 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 100, loss[loss=0.05613, simple_loss=0.06182, pruned_loss=0.006694, audio_tagging_loss=0.01852, over 15137.00 frames. ], tot_loss[loss=0.07368, simple_loss=0.09192, pruned_loss=0.01199, audio_tagging_loss=0.01573, over 1209885.41 frames. ], batch size: 57, lr: 1.36e-03, grad_scale: 16.0 2023-11-29 10:37:39,462 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3928473.3333333335, ans=0.125 2023-11-29 10:37:42,297 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.55 vs. 
limit=22.5 2023-11-29 10:38:00,143 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.044e+01 1.010e+02 1.060e+02 1.133e+02 1.839e+02, threshold=2.120e+02, percent-clipped=0.0 2023-11-29 10:38:02,621 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 589300 2023-11-29 10:38:06,266 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3928673.3333333335, ans=0.0 2023-11-29 10:38:06,384 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3928673.3333333335, ans=0.0 2023-11-29 10:38:07,593 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3928673.3333333335, ans=0.125 2023-11-29 10:38:20,456 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.31 vs. limit=22.5 2023-11-29 10:38:31,795 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 150, loss[loss=0.06968, simple_loss=0.09463, pruned_loss=0.01355, audio_tagging_loss=0.008811, over 16703.00 frames. ], tot_loss[loss=0.07103, simple_loss=0.08984, pruned_loss=0.01197, audio_tagging_loss=0.01415, over 1617558.20 frames. ], batch size: 61, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 10:38:38,377 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3928806.6666666665, ans=0.0 2023-11-29 10:39:05,983 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 589350 2023-11-29 10:39:13,294 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3929006.6666666665, ans=0.125 2023-11-29 10:39:30,115 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3929073.3333333335, ans=0.125 2023-11-29 10:39:34,088 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 200, loss[loss=0.0694, simple_loss=0.08996, pruned_loss=0.01493, audio_tagging_loss=0.009482, over 15009.00 frames. ], tot_loss[loss=0.06961, simple_loss=0.09009, pruned_loss=0.01201, audio_tagging_loss=0.01255, over 1936022.31 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 10:39:48,113 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3929206.6666666665, ans=0.0 2023-11-29 10:40:02,303 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.86 vs. 
limit=6.0 2023-11-29 10:40:05,152 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.567e+01 9.198e+01 9.931e+01 1.061e+02 1.225e+02, threshold=1.986e+02, percent-clipped=0.0 2023-11-29 10:40:07,809 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 589400 2023-11-29 10:40:10,720 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3929340.0, ans=0.125 2023-11-29 10:40:12,110 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3929340.0, ans=0.125 2023-11-29 10:40:15,361 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3929340.0, ans=0.125 2023-11-29 10:40:18,067 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.19 vs. limit=10.0 2023-11-29 10:40:31,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3929406.6666666665, ans=0.0 2023-11-29 10:40:34,115 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3929406.6666666665, ans=0.0 2023-11-29 10:40:36,864 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 250, loss[loss=0.05519, simple_loss=0.06855, pruned_loss=0.008695, audio_tagging_loss=0.01222, over 15069.00 frames. ], tot_loss[loss=0.06866, simple_loss=0.09037, pruned_loss=0.01209, audio_tagging_loss=0.01138, over 2181395.44 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 10:40:40,681 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3929473.3333333335, ans=0.125 2023-11-29 10:40:54,135 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3929540.0, ans=0.95 2023-11-29 10:40:55,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3929540.0, ans=0.0 2023-11-29 10:40:55,283 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3929540.0, ans=0.125 2023-11-29 10:40:58,748 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3929540.0, ans=0.2 2023-11-29 10:41:07,138 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3929606.6666666665, ans=0.1 2023-11-29 10:41:07,237 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.98 vs. 
limit=22.5 2023-11-29 10:41:11,107 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 589450 2023-11-29 10:41:11,388 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3929606.6666666665, ans=0.125 2023-11-29 10:41:29,822 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 10:41:35,998 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3929740.0, ans=0.125 2023-11-29 10:41:40,653 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 300, loss[loss=0.06552, simple_loss=0.09244, pruned_loss=0.01141, audio_tagging_loss=0.007893, over 14591.00 frames. ], tot_loss[loss=0.06817, simple_loss=0.09103, pruned_loss=0.01218, audio_tagging_loss=0.01047, over 2368198.81 frames. ], batch size: 54, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 10:41:43,378 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3929806.6666666665, ans=10.0 2023-11-29 10:41:55,662 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.80 vs. limit=15.0 2023-11-29 10:42:06,494 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3929940.0, ans=0.0 2023-11-29 10:42:07,709 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3929940.0, ans=0.125 2023-11-29 10:42:11,435 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.312e+01 9.170e+01 9.850e+01 1.054e+02 1.427e+02, threshold=1.970e+02, percent-clipped=0.0 2023-11-29 10:42:13,967 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 589500 2023-11-29 10:42:28,286 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3930006.6666666665, ans=0.125 2023-11-29 10:42:34,317 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3930073.3333333335, ans=0.125 2023-11-29 10:42:42,015 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.66 vs. limit=15.0 2023-11-29 10:42:42,546 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 350, loss[loss=0.07034, simple_loss=0.09222, pruned_loss=0.01368, audio_tagging_loss=0.01055, over 14559.00 frames. ], tot_loss[loss=0.06762, simple_loss=0.09078, pruned_loss=0.01226, audio_tagging_loss=0.009975, over 2515558.88 frames. 
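
The recurring optim.py records summarize recent gradient norms as quartiles (min, 25%, 50%, 75%, max). In every record here the threshold equals Clipping_scale times the logged median, e.g. 2.0 * 9.850e+01 = 1.970e+02 just above, and percent-clipped reports how often the norm exceeded it. A sketch of that bookkeeping; the class name and window size are assumptions, not the optimizer's actual implementation.

from collections import deque

import torch


class MedianGradClipper:
    # Illustrative stand-in: track a window of recent gradient norms and
    # clip against scale * median, which reproduces the logged thresholds.

    def __init__(self, scale: float = 2.0, window: int = 200) -> None:
        self.scale = scale
        self.norms: deque = deque(maxlen=window)

    def clip_(self, params: list) -> float:
        grads = [p.grad for p in params if p.grad is not None]
        norm = float(sum((g ** 2).sum() for g in grads) ** 0.5)
        self.norms.append(norm)
        median = sorted(self.norms)[len(self.norms) // 2]
        threshold = self.scale * median
        if norm > threshold:
            for g in grads:
                g.mul_(threshold / norm)
        return threshold
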
], batch size: 56, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 10:42:57,620 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3930206.6666666665, ans=0.125 2023-11-29 10:43:16,934 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 589550 2023-11-29 10:43:18,408 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3930273.3333333335, ans=0.0 2023-11-29 10:43:23,011 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3930340.0, ans=0.0 2023-11-29 10:43:28,869 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3930340.0, ans=0.125 2023-11-29 10:43:42,348 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3930406.6666666665, ans=0.2 2023-11-29 10:43:44,134 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.08 vs. limit=22.5 2023-11-29 10:43:44,390 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 400, loss[loss=0.07544, simple_loss=0.1016, pruned_loss=0.0169, audio_tagging_loss=0.007731, over 14599.00 frames. ], tot_loss[loss=0.06735, simple_loss=0.09093, pruned_loss=0.01223, audio_tagging_loss=0.00965, over 2637094.87 frames. ], batch size: 55, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 10:44:16,517 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.818e+01 9.147e+01 9.646e+01 1.038e+02 1.524e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-29 10:44:18,435 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 589600 2023-11-29 10:44:45,139 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.61 vs. limit=15.0 2023-11-29 10:44:45,738 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3930740.0, ans=0.125 2023-11-29 10:44:47,875 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 450, loss[loss=0.05845, simple_loss=0.07314, pruned_loss=0.01154, audio_tagging_loss=0.01035, over 14236.00 frames. ], tot_loss[loss=0.06703, simple_loss=0.09091, pruned_loss=0.01225, audio_tagging_loss=0.009324, over 2721262.67 frames. ], batch size: 55, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 10:45:20,672 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 589650 2023-11-29 10:45:29,460 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3931006.6666666665, ans=0.0 2023-11-29 10:45:34,118 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.09 vs. limit=15.0 2023-11-29 10:45:36,609 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.92 vs. 
limit=15.0 2023-11-29 10:45:44,291 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3931073.3333333335, ans=0.125 2023-11-29 10:45:46,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3931073.3333333335, ans=0.125 2023-11-29 10:45:48,722 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 500, loss[loss=0.06732, simple_loss=0.09184, pruned_loss=0.01372, audio_tagging_loss=0.007677, over 15029.00 frames. ], tot_loss[loss=0.06675, simple_loss=0.09081, pruned_loss=0.01224, audio_tagging_loss=0.009106, over 2787920.47 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 10:46:21,785 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.570e+01 9.167e+01 9.711e+01 1.057e+02 1.221e+02, threshold=1.942e+02, percent-clipped=0.0 2023-11-29 10:46:23,074 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 589700 2023-11-29 10:46:42,361 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3931406.6666666665, ans=0.0 2023-11-29 10:46:49,445 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3931473.3333333335, ans=0.5 2023-11-29 10:46:50,322 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 550, loss[loss=0.05466, simple_loss=0.06742, pruned_loss=0.01073, audio_tagging_loss=0.01023, over 14740.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.09046, pruned_loss=0.01215, audio_tagging_loss=0.009032, over 2842458.79 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 10:47:22,681 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3931606.6666666665, ans=0.2 2023-11-29 10:47:23,936 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 589750 2023-11-29 10:47:50,827 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.58 vs. limit=12.0 2023-11-29 10:47:52,396 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 600, loss[loss=0.06549, simple_loss=0.09163, pruned_loss=0.01235, audio_tagging_loss=0.007331, over 15204.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.09118, pruned_loss=0.01225, audio_tagging_loss=0.008812, over 2890657.21 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 10:48:08,654 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3931873.3333333335, ans=0.2 2023-11-29 10:48:21,118 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3931940.0, ans=0.1 2023-11-29 10:48:24,205 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.204e+01 8.984e+01 9.771e+01 1.059e+02 2.081e+02, threshold=1.954e+02, percent-clipped=1.0 2023-11-29 10:48:25,492 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 589800 2023-11-29 10:48:42,274 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3932073.3333333335, ans=0.125 2023-11-29 10:48:48,875 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.50 vs. 
limit=15.0 2023-11-29 10:48:51,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3932073.3333333335, ans=0.125 2023-11-29 10:48:52,176 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3932073.3333333335, ans=0.125 2023-11-29 10:48:54,143 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 650, loss[loss=0.07804, simple_loss=0.1125, pruned_loss=0.01398, audio_tagging_loss=0.007806, over 14564.00 frames. ], tot_loss[loss=0.06714, simple_loss=0.09215, pruned_loss=0.01233, audio_tagging_loss=0.008733, over 2930944.28 frames. ], batch size: 53, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 10:48:57,951 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3932140.0, ans=0.0 2023-11-29 10:49:22,791 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3932273.3333333335, ans=0.125 2023-11-29 10:49:27,367 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 589850 2023-11-29 10:49:29,791 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3932340.0, ans=0.1 2023-11-29 10:49:31,072 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3932340.0, ans=0.0 2023-11-29 10:49:48,512 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3932406.6666666665, ans=0.0 2023-11-29 10:49:54,554 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.23 vs. limit=15.0 2023-11-29 10:49:55,242 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 700, loss[loss=0.0561, simple_loss=0.07085, pruned_loss=0.01197, audio_tagging_loss=0.008705, over 14208.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.09087, pruned_loss=0.01214, audio_tagging_loss=0.0087, over 2958302.03 frames. ], batch size: 54, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 10:50:02,267 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3932473.3333333335, ans=0.1 2023-11-29 10:50:02,593 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.09 vs. limit=10.0 2023-11-29 10:50:15,726 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.03 vs. 
limit=15.0 2023-11-29 10:50:26,561 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3932606.6666666665, ans=0.125 2023-11-29 10:50:27,460 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.376e+01 9.061e+01 9.779e+01 1.049e+02 1.414e+02, threshold=1.956e+02, percent-clipped=0.0 2023-11-29 10:50:28,827 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 589900 2023-11-29 10:50:34,598 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3932673.3333333335, ans=0.04949747468305833 2023-11-29 10:50:42,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3932673.3333333335, ans=0.0 2023-11-29 10:50:54,784 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.17 vs. limit=15.0 2023-11-29 10:50:57,758 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 750, loss[loss=0.07298, simple_loss=0.1049, pruned_loss=0.01196, audio_tagging_loss=0.008565, over 16110.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.09025, pruned_loss=0.01201, audio_tagging_loss=0.00874, over 2973068.89 frames. ], batch size: 59, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 10:51:03,757 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3932806.6666666665, ans=0.1 2023-11-29 10:51:14,178 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3932873.3333333335, ans=0.0 2023-11-29 10:51:31,113 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 589950 2023-11-29 10:51:44,475 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.76 vs. limit=15.0 2023-11-29 10:51:48,830 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3933073.3333333335, ans=0.125 2023-11-29 10:51:50,157 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3933073.3333333335, ans=0.125 2023-11-29 10:51:59,209 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 800, loss[loss=0.06226, simple_loss=0.08922, pruned_loss=0.00998, audio_tagging_loss=0.00767, over 16237.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.09, pruned_loss=0.0118, audio_tagging_loss=0.008787, over 2985565.25 frames. 
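
The ScheduledFloat records that dominate this log dump hyperparameters (skip rates, balancer probabilities, dropout) that are functions of the global batch count rather than constants; by batch_count around 3.9e6 every value shown sits at a fixed endpoint such as 0.0, 0.1, 0.125 or 0.2. A piecewise-linear schedule is the simplest model consistent with these records; the class below is an illustrative re-implementation, not icefall's own.

import bisect


class ScheduledFloat:
    def __init__(self, *points: tuple) -> None:
        # points: (batch_count, value) pairs, sorted by batch_count.
        self.xs = [p[0] for p in points]
        self.ys = [p[1] for p in points]

    def __call__(self, batch_count: float) -> float:
        i = bisect.bisect_right(self.xs, batch_count)
        if i == 0:
            return self.ys[0]
        if i == len(self.xs):
            return self.ys[-1]
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)


# e.g. a conv_skip_rate that decays to 0.0 and stays there (breakpoints assumed):
skip_rate = ScheduledFloat((0.0, 0.2), (20000.0, 0.0))
assert skip_rate(3933140.0) == 0.0
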
], batch size: 61, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 10:51:59,516 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3933140.0, ans=0.0 2023-11-29 10:52:32,230 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.757e+01 9.309e+01 9.917e+01 1.087e+02 1.372e+02, threshold=1.983e+02, percent-clipped=0.0 2023-11-29 10:52:33,557 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 590000 2023-11-29 10:52:40,076 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3933340.0, ans=0.125 2023-11-29 10:52:42,564 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3933340.0, ans=0.0 2023-11-29 10:53:01,567 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 850, loss[loss=0.06476, simple_loss=0.08827, pruned_loss=0.01199, audio_tagging_loss=0.008627, over 15704.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.09025, pruned_loss=0.01189, audio_tagging_loss=0.008787, over 2996606.84 frames. ], batch size: 59, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 10:53:09,666 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3933473.3333333335, ans=0.0 2023-11-29 10:53:20,115 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3933540.0, ans=0.1 2023-11-29 10:53:34,686 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3933606.6666666665, ans=0.125 2023-11-29 10:53:35,663 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 590050 2023-11-29 10:53:37,117 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3933606.6666666665, ans=0.125 2023-11-29 10:53:59,552 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 10:54:00,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3933740.0, ans=0.0 2023-11-29 10:54:05,638 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 900, loss[loss=0.06758, simple_loss=0.0958, pruned_loss=0.01291, audio_tagging_loss=0.006769, over 15573.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.08959, pruned_loss=0.01184, audio_tagging_loss=0.008868, over 3006030.99 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 10:54:23,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3933873.3333333335, ans=0.125 2023-11-29 10:54:31,821 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.14 vs. 
limit=15.0 2023-11-29 10:54:37,715 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.767e+01 9.157e+01 9.744e+01 1.021e+02 1.316e+02, threshold=1.949e+02, percent-clipped=0.0 2023-11-29 10:54:39,060 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 590100 2023-11-29 10:54:55,711 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3934073.3333333335, ans=0.2 2023-11-29 10:55:07,115 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 950, loss[loss=0.09236, simple_loss=0.1294, pruned_loss=0.02154, audio_tagging_loss=0.006122, over 16030.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08905, pruned_loss=0.01178, audio_tagging_loss=0.008835, over 3012801.14 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 10:55:07,462 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3934140.0, ans=0.125 2023-11-29 10:55:13,326 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3934140.0, ans=0.1 2023-11-29 10:55:34,060 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=3934273.3333333335, ans=0.025 2023-11-29 10:55:39,515 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3934273.3333333335, ans=0.1 2023-11-29 10:55:41,651 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 590150 2023-11-29 10:56:09,333 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 1000, loss[loss=0.05649, simple_loss=0.07697, pruned_loss=0.008545, audio_tagging_loss=0.009461, over 14945.00 frames. ], tot_loss[loss=0.06443, simple_loss=0.08803, pruned_loss=0.01169, audio_tagging_loss=0.008723, over 3014494.90 frames. ], batch size: 59, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 10:56:14,398 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3934473.3333333335, ans=0.07 2023-11-29 10:56:33,599 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.90 vs. limit=15.0 2023-11-29 10:56:37,487 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 10:56:40,928 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.541e+01 9.201e+01 9.754e+01 1.071e+02 1.435e+02, threshold=1.951e+02, percent-clipped=0.0 2023-11-29 10:56:42,269 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 590200 2023-11-29 10:56:47,529 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3934673.3333333335, ans=0.0 2023-11-29 10:57:12,263 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 1050, loss[loss=0.07472, simple_loss=0.1017, pruned_loss=0.01585, audio_tagging_loss=0.008011, over 15389.00 frames. 
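
The Whitening records compare an anisotropy metric of an activation's channel covariance against a limit; a penalty engages only above the limit, so "metric=10.90 vs. limit=15.0" just above is healthy, while values such as 19.27 vs. 22.5 earlier are close to triggering. One standard anisotropy measure is sketched below purely to illustrate how such a quantity behaves (1.0 for perfectly white features, growing as the spectrum becomes lopsided); the exact formula in scaling.py may differ.

import torch


def whitening_metric(x: torch.Tensor) -> float:
    # x: (frames, channels). Equals 1.0 when the channel covariance is a
    # multiple of the identity; larger when a few directions dominate.
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]
    d = cov.shape[0]
    return float(d * (cov @ cov).trace() / cov.trace() ** 2)


torch.manual_seed(0)
white = torch.randn(10000, 384)
assert whitening_metric(white) < 1.2  # near 1.0 for i.i.d. features
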
], tot_loss[loss=0.06431, simple_loss=0.08799, pruned_loss=0.01176, audio_tagging_loss=0.008556, over 3019505.10 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 10:57:33,797 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3934873.3333333335, ans=0.07 2023-11-29 10:57:35,099 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 10:57:38,474 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3934940.0, ans=0.0 2023-11-29 10:57:45,664 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 590250 2023-11-29 10:58:13,870 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 1100, loss[loss=0.06955, simple_loss=0.09837, pruned_loss=0.01005, audio_tagging_loss=0.01032, over 14926.00 frames. ], tot_loss[loss=0.06443, simple_loss=0.088, pruned_loss=0.01183, audio_tagging_loss=0.0086, over 3028328.64 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 10:58:14,151 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=3935140.0, ans=0.1 2023-11-29 10:58:15,449 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3935140.0, ans=0.125 2023-11-29 10:58:17,734 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3935140.0, ans=0.125 2023-11-29 10:58:19,092 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3935140.0, ans=0.07 2023-11-29 10:58:19,988 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 10:58:37,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3935273.3333333335, ans=0.125 2023-11-29 10:58:48,058 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.914e+01 9.269e+01 9.621e+01 1.031e+02 1.312e+02, threshold=1.924e+02, percent-clipped=0.0 2023-11-29 10:58:48,170 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 590300 2023-11-29 10:58:49,543 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3935273.3333333335, ans=0.0 2023-11-29 10:58:50,677 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3935340.0, ans=0.125 2023-11-29 10:58:58,184 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.22 vs. 
limit=15.0 2023-11-29 10:59:06,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3935406.6666666665, ans=0.04949747468305833 2023-11-29 10:59:16,241 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 1150, loss[loss=0.06783, simple_loss=0.0922, pruned_loss=0.01129, audio_tagging_loss=0.01044, over 14922.00 frames. ], tot_loss[loss=0.0644, simple_loss=0.08772, pruned_loss=0.01186, audio_tagging_loss=0.008681, over 3031790.52 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 10:59:42,266 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.93 vs. limit=15.0 2023-11-29 10:59:50,072 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 590350 2023-11-29 10:59:50,578 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.59 vs. limit=15.0 2023-11-29 10:59:55,307 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.44 vs. limit=15.0 2023-11-29 11:00:08,469 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3935740.0, ans=0.1 2023-11-29 11:00:16,668 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3935740.0, ans=0.125 2023-11-29 11:00:18,788 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 1200, loss[loss=0.05307, simple_loss=0.06943, pruned_loss=0.007123, audio_tagging_loss=0.01123, over 16018.00 frames. ], tot_loss[loss=0.06429, simple_loss=0.08731, pruned_loss=0.01192, audio_tagging_loss=0.008711, over 3033422.00 frames. ], batch size: 61, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:00:20,801 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3935806.6666666665, ans=0.125 2023-11-29 11:00:26,549 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3935806.6666666665, ans=0.125 2023-11-29 11:00:29,348 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.08 vs. 
limit=12.0 2023-11-29 11:00:39,702 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3935873.3333333335, ans=0.1 2023-11-29 11:00:49,081 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3935940.0, ans=0.125 2023-11-29 11:00:51,864 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 590400 2023-11-29 11:00:52,913 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.975e+01 9.086e+01 9.906e+01 1.090e+02 1.794e+02, threshold=1.981e+02, percent-clipped=0.0 2023-11-29 11:01:02,600 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3936006.6666666665, ans=0.125 2023-11-29 11:01:05,626 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3936006.6666666665, ans=0.1 2023-11-29 11:01:08,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3936073.3333333335, ans=0.07 2023-11-29 11:01:21,002 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 1250, loss[loss=0.07625, simple_loss=0.1057, pruned_loss=0.01267, audio_tagging_loss=0.01075, over 15232.00 frames. ], tot_loss[loss=0.06406, simple_loss=0.08718, pruned_loss=0.01183, audio_tagging_loss=0.008635, over 3034222.15 frames. ], batch size: 55, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:01:26,723 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.80 vs. limit=15.0 2023-11-29 11:01:34,451 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.31 vs. limit=15.0 2023-11-29 11:01:40,562 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 11:01:41,832 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3936206.6666666665, ans=0.1 2023-11-29 11:01:55,026 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 590450 2023-11-29 11:02:20,332 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.17 vs. limit=22.5 2023-11-29 11:02:22,033 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 1300, loss[loss=0.07173, simple_loss=0.1049, pruned_loss=0.01346, audio_tagging_loss=0.00579, over 14455.00 frames. ], tot_loss[loss=0.06405, simple_loss=0.08747, pruned_loss=0.01179, audio_tagging_loss=0.008534, over 3035561.14 frames. ], batch size: 52, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:02:22,273 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3936473.3333333335, ans=0.125 2023-11-29 11:02:40,135 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.94 vs. 
limit=6.0 2023-11-29 11:02:54,346 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3936606.6666666665, ans=0.0 2023-11-29 11:02:55,413 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 590500 2023-11-29 11:02:56,420 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.747e+01 8.938e+01 9.408e+01 1.020e+02 1.519e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-29 11:03:01,653 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.69 vs. limit=22.5 2023-11-29 11:03:11,267 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3936740.0, ans=0.2 2023-11-29 11:03:11,359 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3936740.0, ans=0.0 2023-11-29 11:03:23,224 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 1350, loss[loss=0.07088, simple_loss=0.09384, pruned_loss=0.01515, audio_tagging_loss=0.008814, over 14883.00 frames. ], tot_loss[loss=0.06416, simple_loss=0.08784, pruned_loss=0.01171, audio_tagging_loss=0.008526, over 3033188.33 frames. ], batch size: 54, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:03:34,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3936806.6666666665, ans=0.2 2023-11-29 11:03:53,036 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3936940.0, ans=0.1 2023-11-29 11:03:56,217 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 590550 2023-11-29 11:04:05,418 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3937006.6666666665, ans=0.05 2023-11-29 11:04:08,249 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.88 vs. limit=15.0 2023-11-29 11:04:11,564 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 11:04:25,699 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 1400, loss[loss=0.06215, simple_loss=0.0899, pruned_loss=0.009225, audio_tagging_loss=0.00797, over 15114.00 frames. ], tot_loss[loss=0.06402, simple_loss=0.08765, pruned_loss=0.01166, audio_tagging_loss=0.008531, over 3040081.86 frames. 
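
Many of the ScheduledFloat records belong to "balancer" modules (balancer.prob, min_positive, max_positive, min_abs, max_abs; see nonlin_attention.balancer.min_positive with ans=0.05 just above). A balancer nudges per-channel activation statistics into a target range, e.g. keeping the fraction of positive values between min_positive and max_positive, applying its correction with probability prob. The sketch below only evaluates the constraints; the gradient machinery that enforces them is omitted, and the function is an illustrative stand-in.

import torch


def balancer_violations(x: torch.Tensor,
                        min_positive: float = 0.05,
                        max_positive: float = 0.95,
                        max_abs: float = 10.0) -> dict:
    # x: (frames, channels). A whole-tensor view of the statistics a
    # balancer would check per channel.
    frac_positive = (x > 0).float().mean().item()
    mean_abs = x.abs().mean().item()
    return {
        "too_few_positive": frac_positive < min_positive,
        "too_many_positive": frac_positive > max_positive,
        "too_large": mean_abs > max_abs,
    }


print(balancer_violations(torch.randn(1000, 256)))
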
], batch size: 58, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:04:28,229 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3937140.0, ans=0.125 2023-11-29 11:04:37,621 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3937206.6666666665, ans=0.0 2023-11-29 11:04:43,966 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3937206.6666666665, ans=0.125 2023-11-29 11:04:46,695 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3937206.6666666665, ans=0.125 2023-11-29 11:04:49,262 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.46 vs. limit=15.0 2023-11-29 11:04:56,098 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3937273.3333333335, ans=0.125 2023-11-29 11:04:58,836 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 590600 2023-11-29 11:04:58,950 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3937273.3333333335, ans=0.125 2023-11-29 11:04:59,857 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.232e+01 8.992e+01 9.669e+01 1.038e+02 1.341e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-29 11:05:02,425 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3937340.0, ans=0.125 2023-11-29 11:05:15,397 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3937406.6666666665, ans=0.0 2023-11-29 11:05:15,783 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.03 vs. limit=15.0 2023-11-29 11:05:26,126 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3937473.3333333335, ans=0.1 2023-11-29 11:05:26,995 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 1450, loss[loss=0.07432, simple_loss=0.1015, pruned_loss=0.01255, audio_tagging_loss=0.01101, over 15545.00 frames. ], tot_loss[loss=0.06498, simple_loss=0.08934, pruned_loss=0.01181, audio_tagging_loss=0.008496, over 3042381.94 frames. ], batch size: 60, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:05:40,012 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.81 vs. 
limit=10.0 2023-11-29 11:05:40,974 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3937540.0, ans=0.2 2023-11-29 11:05:55,548 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3937606.6666666665, ans=0.125 2023-11-29 11:06:01,107 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 590650 2023-11-29 11:06:26,707 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3937740.0, ans=0.1 2023-11-29 11:06:28,792 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 1500, loss[loss=0.05065, simple_loss=0.06689, pruned_loss=0.006584, audio_tagging_loss=0.01062, over 16848.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.08922, pruned_loss=0.01187, audio_tagging_loss=0.008561, over 3038254.90 frames. ], batch size: 64, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:06:33,105 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3937806.6666666665, ans=0.125 2023-11-29 11:06:33,140 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3937806.6666666665, ans=0.1 2023-11-29 11:06:35,898 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3937806.6666666665, ans=0.1 2023-11-29 11:06:45,097 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.93 vs. limit=15.0 2023-11-29 11:07:02,008 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 590700 2023-11-29 11:07:02,174 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3937940.0, ans=0.125 2023-11-29 11:07:03,058 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.205e+01 9.150e+01 9.903e+01 1.059e+02 1.485e+02, threshold=1.981e+02, percent-clipped=0.0 2023-11-29 11:07:07,478 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3938006.6666666665, ans=0.0 2023-11-29 11:07:12,280 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3938006.6666666665, ans=0.125 2023-11-29 11:07:19,816 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3938073.3333333335, ans=0.2 2023-11-29 11:07:24,241 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.22 vs. limit=15.0 2023-11-29 11:07:25,202 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3938073.3333333335, ans=0.125 2023-11-29 11:07:31,377 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 1550, loss[loss=0.06633, simple_loss=0.09972, pruned_loss=0.0101, audio_tagging_loss=0.006363, over 14774.00 frames. ], tot_loss[loss=0.06481, simple_loss=0.0887, pruned_loss=0.01181, audio_tagging_loss=0.008645, over 3042741.99 frames. 
], batch size: 55, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:07:36,755 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.63 vs. limit=22.5 2023-11-29 11:07:38,865 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3938140.0, ans=0.1 2023-11-29 11:07:44,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3938206.6666666665, ans=0.125 2023-11-29 11:07:57,116 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.59 vs. limit=10.0 2023-11-29 11:08:00,382 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3938273.3333333335, ans=0.125 2023-11-29 11:08:03,580 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 590750 2023-11-29 11:08:05,974 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.63 vs. limit=15.0 2023-11-29 11:08:32,431 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 1600, loss[loss=0.07481, simple_loss=0.107, pruned_loss=0.01246, audio_tagging_loss=0.008872, over 15514.00 frames. ], tot_loss[loss=0.06507, simple_loss=0.0889, pruned_loss=0.01188, audio_tagging_loss=0.008745, over 3047075.08 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:08:40,458 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3938473.3333333335, ans=0.2 2023-11-29 11:08:44,050 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3938540.0, ans=0.0 2023-11-29 11:09:02,779 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3938606.6666666665, ans=0.1 2023-11-29 11:09:06,603 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 590800 2023-11-29 11:09:07,673 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.890e+01 8.907e+01 9.577e+01 1.022e+02 1.784e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-29 11:09:08,284 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3938606.6666666665, ans=0.125 2023-11-29 11:09:08,852 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.87 vs. limit=15.0 2023-11-29 11:09:16,431 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3938673.3333333335, ans=0.2 2023-11-29 11:09:25,760 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3938740.0, ans=0.125 2023-11-29 11:09:34,412 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 1650, loss[loss=0.05599, simple_loss=0.07321, pruned_loss=0.009593, audio_tagging_loss=0.009788, over 15488.00 frames. ], tot_loss[loss=0.0648, simple_loss=0.08856, pruned_loss=0.0118, audio_tagging_loss=0.008723, over 3042823.81 frames. 
], batch size: 58, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:09:42,260 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3938806.6666666665, ans=0.0 2023-11-29 11:09:54,567 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.66 vs. limit=15.0 2023-11-29 11:09:57,500 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3938873.3333333335, ans=0.125 2023-11-29 11:10:01,643 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.10 vs. limit=22.5 2023-11-29 11:10:08,155 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 590850 2023-11-29 11:10:17,027 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3939006.6666666665, ans=0.1 2023-11-29 11:10:18,372 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3939006.6666666665, ans=0.1 2023-11-29 11:10:25,427 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3939073.3333333335, ans=0.125 2023-11-29 11:10:31,191 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3939073.3333333335, ans=0.125 2023-11-29 11:10:36,646 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 1700, loss[loss=0.06567, simple_loss=0.08809, pruned_loss=0.01267, audio_tagging_loss=0.00896, over 15156.00 frames. ], tot_loss[loss=0.06445, simple_loss=0.08789, pruned_loss=0.01166, audio_tagging_loss=0.008847, over 3034718.42 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:10:37,359 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.84 vs. limit=15.0 2023-11-29 11:11:09,734 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 590900 2023-11-29 11:11:10,735 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.872e+01 9.146e+01 9.599e+01 1.028e+02 1.355e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-29 11:11:38,440 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 1750, loss[loss=0.06899, simple_loss=0.1014, pruned_loss=0.01102, audio_tagging_loss=0.007249, over 15376.00 frames. ], tot_loss[loss=0.06438, simple_loss=0.08778, pruned_loss=0.01175, audio_tagging_loss=0.008735, over 3030935.70 frames. 
], batch size: 56, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:11:39,914 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3939473.3333333335, ans=0.2 2023-11-29 11:11:47,051 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3939473.3333333335, ans=0.09899494936611666 2023-11-29 11:11:49,292 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=3939473.3333333335, ans=15.0 2023-11-29 11:11:52,397 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3939540.0, ans=0.0 2023-11-29 11:12:12,007 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 590950 2023-11-29 11:12:40,148 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 1800, loss[loss=0.05323, simple_loss=0.06832, pruned_loss=0.009363, audio_tagging_loss=0.009711, over 15558.00 frames. ], tot_loss[loss=0.06435, simple_loss=0.08783, pruned_loss=0.01178, audio_tagging_loss=0.008662, over 3031047.70 frames. ], batch size: 61, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:12:45,357 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3939806.6666666665, ans=0.0 2023-11-29 11:13:13,697 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 591000 2023-11-29 11:13:14,668 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.761e+01 9.256e+01 9.797e+01 1.069e+02 1.253e+02, threshold=1.959e+02, percent-clipped=0.0 2023-11-29 11:13:23,069 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3940006.6666666665, ans=0.125 2023-11-29 11:13:24,331 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3940006.6666666665, ans=0.0 2023-11-29 11:13:42,529 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 1850, loss[loss=0.06574, simple_loss=0.09676, pruned_loss=0.01102, audio_tagging_loss=0.006342, over 15724.00 frames. ], tot_loss[loss=0.06462, simple_loss=0.08822, pruned_loss=0.01196, audio_tagging_loss=0.008545, over 3039157.96 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:13:51,917 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3940140.0, ans=0.1 2023-11-29 11:14:15,340 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 591050 2023-11-29 11:14:43,515 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 1900, loss[loss=0.05744, simple_loss=0.07584, pruned_loss=0.009534, audio_tagging_loss=0.009986, over 14613.00 frames. ], tot_loss[loss=0.06459, simple_loss=0.08853, pruned_loss=0.01174, audio_tagging_loss=0.00858, over 3041317.12 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:14:50,383 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3940473.3333333335, ans=0.1 2023-11-29 11:14:56,592 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.64 vs. 
limit=15.0 2023-11-29 11:15:17,992 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 591100 2023-11-29 11:15:19,022 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.516e+01 8.739e+01 9.784e+01 1.081e+02 1.359e+02, threshold=1.957e+02, percent-clipped=0.0 2023-11-29 11:15:21,728 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3940673.3333333335, ans=0.1 2023-11-29 11:15:21,957 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.53 vs. limit=6.0 2023-11-29 11:15:23,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3940673.3333333335, ans=0.1 2023-11-29 11:15:46,063 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 1950, loss[loss=0.05829, simple_loss=0.08452, pruned_loss=0.007405, audio_tagging_loss=0.008627, over 14134.00 frames. ], tot_loss[loss=0.06413, simple_loss=0.08835, pruned_loss=0.01152, audio_tagging_loss=0.008434, over 3038934.55 frames. ], batch size: 53, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:15:47,373 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3940806.6666666665, ans=0.1 2023-11-29 11:15:53,226 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3940806.6666666665, ans=0.125 2023-11-29 11:15:55,382 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 11:16:02,607 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.88 vs. limit=15.0 2023-11-29 11:16:11,697 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3940940.0, ans=0.0 2023-11-29 11:16:18,618 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 591150 2023-11-29 11:16:38,780 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3941073.3333333335, ans=0.0 2023-11-29 11:16:41,215 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3941073.3333333335, ans=0.125 2023-11-29 11:16:48,014 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 2000, loss[loss=0.07012, simple_loss=0.09728, pruned_loss=0.0172, audio_tagging_loss=0.004276, over 14804.00 frames. ], tot_loss[loss=0.06386, simple_loss=0.08778, pruned_loss=0.01154, audio_tagging_loss=0.008421, over 3033076.88 frames. 
], batch size: 57, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:17:03,895 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3941206.6666666665, ans=0.0 2023-11-29 11:17:07,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3941206.6666666665, ans=0.125 2023-11-29 11:17:20,876 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 591200 2023-11-29 11:17:21,866 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.802e+01 9.211e+01 9.826e+01 1.048e+02 3.263e+02, threshold=1.965e+02, percent-clipped=1.0 2023-11-29 11:17:26,939 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3941340.0, ans=0.125 2023-11-29 11:17:34,734 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3941340.0, ans=0.125 2023-11-29 11:17:41,144 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3941406.6666666665, ans=0.025 2023-11-29 11:17:49,462 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 2050, loss[loss=0.06379, simple_loss=0.08874, pruned_loss=0.01287, audio_tagging_loss=0.006546, over 15089.00 frames. ], tot_loss[loss=0.06415, simple_loss=0.08781, pruned_loss=0.01181, audio_tagging_loss=0.008429, over 3031563.73 frames. ], batch size: 54, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:17:53,473 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3941473.3333333335, ans=0.0 2023-11-29 11:18:00,449 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3941473.3333333335, ans=0.2 2023-11-29 11:18:17,612 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3941606.6666666665, ans=0.125 2023-11-29 11:18:17,629 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3941606.6666666665, ans=0.0 2023-11-29 11:18:24,885 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 591250 2023-11-29 11:18:42,038 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3941740.0, ans=0.125 2023-11-29 11:18:53,370 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 2100, loss[loss=0.05353, simple_loss=0.07638, pruned_loss=0.005631, audio_tagging_loss=0.009705, over 15648.00 frames. ], tot_loss[loss=0.06396, simple_loss=0.08751, pruned_loss=0.01177, audio_tagging_loss=0.008438, over 3030564.94 frames. 
], batch size: 60, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:18:54,857 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3941806.6666666665, ans=0.125 2023-11-29 11:19:22,975 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3941940.0, ans=0.125 2023-11-29 11:19:23,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3941940.0, ans=0.125 2023-11-29 11:19:26,520 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 591300 2023-11-29 11:19:27,590 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.479e+01 9.036e+01 9.652e+01 1.066e+02 1.265e+02, threshold=1.930e+02, percent-clipped=0.0 2023-11-29 11:19:34,875 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3942006.6666666665, ans=0.035 2023-11-29 11:19:36,113 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3942006.6666666665, ans=0.0 2023-11-29 11:19:46,788 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3942073.3333333335, ans=0.2 2023-11-29 11:19:55,528 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 2150, loss[loss=0.06072, simple_loss=0.08174, pruned_loss=0.01096, audio_tagging_loss=0.008879, over 14387.00 frames. ], tot_loss[loss=0.06413, simple_loss=0.08773, pruned_loss=0.01182, audio_tagging_loss=0.008447, over 3042700.78 frames. ], batch size: 55, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:19:57,775 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.21 vs. limit=15.0 2023-11-29 11:19:57,903 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.37 vs. limit=15.0 2023-11-29 11:20:28,664 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 591350 2023-11-29 11:20:34,452 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 11:20:39,406 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3942340.0, ans=0.125 2023-11-29 11:20:56,521 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 2200, loss[loss=0.05512, simple_loss=0.07721, pruned_loss=0.006887, audio_tagging_loss=0.00963, over 17115.00 frames. ], tot_loss[loss=0.06465, simple_loss=0.08859, pruned_loss=0.01189, audio_tagging_loss=0.008463, over 3043066.78 frames. 
], batch size: 66, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:21:02,670 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3942473.3333333335, ans=0.05 2023-11-29 11:21:13,943 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3942540.0, ans=0.125 2023-11-29 11:21:30,797 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 591400 2023-11-29 11:21:33,266 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.050e+01 9.112e+01 9.556e+01 1.057e+02 1.343e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-29 11:21:41,999 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3942673.3333333335, ans=0.125 2023-11-29 11:21:46,934 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3942740.0, ans=0.125 2023-11-29 11:21:58,268 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 2250, loss[loss=0.0863, simple_loss=0.1251, pruned_loss=0.01748, audio_tagging_loss=0.006265, over 15614.00 frames. ], tot_loss[loss=0.06422, simple_loss=0.08807, pruned_loss=0.01173, audio_tagging_loss=0.008456, over 3039523.80 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:22:32,838 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 591450 2023-11-29 11:22:36,631 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3943006.6666666665, ans=0.0 2023-11-29 11:22:43,797 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3943006.6666666665, ans=0.0 2023-11-29 11:22:56,836 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3943073.3333333335, ans=0.0 2023-11-29 11:23:01,176 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 2300, loss[loss=0.05892, simple_loss=0.0841, pruned_loss=0.008862, audio_tagging_loss=0.008003, over 15464.00 frames. ], tot_loss[loss=0.06451, simple_loss=0.08835, pruned_loss=0.01182, audio_tagging_loss=0.008522, over 3035762.98 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:23:12,138 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.92 vs. limit=15.0 2023-11-29 11:23:28,044 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3943273.3333333335, ans=0.125 2023-11-29 11:23:33,711 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 591500 2023-11-29 11:23:36,393 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.869e+01 9.045e+01 9.649e+01 1.036e+02 1.193e+02, threshold=1.930e+02, percent-clipped=0.0 2023-11-29 11:23:37,021 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.81 vs. limit=15.0 2023-11-29 11:23:45,698 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3943340.0, ans=0.125 2023-11-29 11:23:46,133 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.08 vs. 
limit=15.0 2023-11-29 11:23:58,503 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 11:24:03,239 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 2350, loss[loss=0.05689, simple_loss=0.07333, pruned_loss=0.0104, audio_tagging_loss=0.009825, over 14742.00 frames. ], tot_loss[loss=0.06482, simple_loss=0.08878, pruned_loss=0.01184, audio_tagging_loss=0.008589, over 3035413.92 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:24:19,020 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3943540.0, ans=0.125 2023-11-29 11:24:28,833 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3943606.6666666665, ans=0.2 2023-11-29 11:24:36,646 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 591550 2023-11-29 11:24:40,578 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.25 vs. limit=12.0 2023-11-29 11:24:51,951 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3943740.0, ans=0.125 2023-11-29 11:25:04,305 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 2400, loss[loss=0.07456, simple_loss=0.1089, pruned_loss=0.01475, audio_tagging_loss=0.005348, over 15181.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08945, pruned_loss=0.01197, audio_tagging_loss=0.008606, over 3038645.75 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:25:09,291 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3943806.6666666665, ans=0.0 2023-11-29 11:25:34,908 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3943940.0, ans=0.1 2023-11-29 11:25:38,241 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 591600 2023-11-29 11:25:40,824 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.183e+01 9.372e+01 9.981e+01 1.068e+02 1.267e+02, threshold=1.996e+02, percent-clipped=0.0 2023-11-29 11:25:50,388 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 11:25:50,513 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3944006.6666666665, ans=0.1 2023-11-29 11:25:56,375 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3944073.3333333335, ans=0.0 2023-11-29 11:26:02,841 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3944073.3333333335, ans=0.0 2023-11-29 11:26:03,128 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.23 vs. 
limit=15.0 2023-11-29 11:26:06,096 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 2450, loss[loss=0.06235, simple_loss=0.08029, pruned_loss=0.01238, audio_tagging_loss=0.009828, over 14141.00 frames. ], tot_loss[loss=0.06437, simple_loss=0.08816, pruned_loss=0.01163, audio_tagging_loss=0.008665, over 3033394.12 frames. ], batch size: 52, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:26:09,459 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3944140.0, ans=0.125 2023-11-29 11:26:14,027 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3944140.0, ans=0.1 2023-11-29 11:26:39,245 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 591650 2023-11-29 11:26:41,570 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3944340.0, ans=0.0 2023-11-29 11:26:44,579 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3944340.0, ans=0.0 2023-11-29 11:27:05,664 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.60 vs. limit=15.0 2023-11-29 11:27:08,544 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 2500, loss[loss=0.07083, simple_loss=0.09898, pruned_loss=0.01152, audio_tagging_loss=0.009822, over 14816.00 frames. ], tot_loss[loss=0.06451, simple_loss=0.08822, pruned_loss=0.01168, audio_tagging_loss=0.008722, over 3033702.21 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:27:20,584 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3944540.0, ans=0.0 2023-11-29 11:27:35,049 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3944606.6666666665, ans=0.125 2023-11-29 11:27:36,836 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.57 vs. limit=12.0 2023-11-29 11:27:40,832 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 591700 2023-11-29 11:27:44,874 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.514e+01 9.155e+01 9.688e+01 1.051e+02 1.449e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-29 11:27:45,206 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3944673.3333333335, ans=0.125 2023-11-29 11:27:52,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3944673.3333333335, ans=0.0 2023-11-29 11:28:08,715 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 2550, loss[loss=0.06571, simple_loss=0.0939, pruned_loss=0.01201, audio_tagging_loss=0.006751, over 14670.00 frames. ], tot_loss[loss=0.06388, simple_loss=0.08749, pruned_loss=0.01147, audio_tagging_loss=0.008666, over 3036760.79 frames. 
], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:28:10,246 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3944806.6666666665, ans=0.1 2023-11-29 11:28:13,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3944806.6666666665, ans=0.0 2023-11-29 11:28:17,894 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3944806.6666666665, ans=0.0 2023-11-29 11:28:42,584 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 591750 2023-11-29 11:28:45,190 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3945006.6666666665, ans=0.05 2023-11-29 11:28:58,479 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.46 vs. limit=22.5 2023-11-29 11:29:10,135 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 2600, loss[loss=0.0741, simple_loss=0.1058, pruned_loss=0.01198, audio_tagging_loss=0.009222, over 15633.00 frames. ], tot_loss[loss=0.06385, simple_loss=0.08784, pruned_loss=0.0115, audio_tagging_loss=0.008428, over 3035622.06 frames. ], batch size: 55, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:29:12,824 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 11:29:24,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3945206.6666666665, ans=0.0 2023-11-29 11:29:43,989 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 591800 2023-11-29 11:29:47,799 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.536e+01 8.895e+01 9.478e+01 1.021e+02 1.360e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-29 11:30:05,550 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3945406.6666666665, ans=0.04949747468305833 2023-11-29 11:30:13,352 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 2650, loss[loss=0.07177, simple_loss=0.1047, pruned_loss=0.01339, audio_tagging_loss=0.006033, over 15433.00 frames. ], tot_loss[loss=0.06409, simple_loss=0.08827, pruned_loss=0.0116, audio_tagging_loss=0.008357, over 3041120.48 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:30:28,782 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3945540.0, ans=0.125 2023-11-29 11:30:45,929 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 591850 2023-11-29 11:31:12,007 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.63 vs. limit=15.0 2023-11-29 11:31:14,820 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 2700, loss[loss=0.05184, simple_loss=0.0732, pruned_loss=0.006355, audio_tagging_loss=0.008888, over 14729.00 frames. ], tot_loss[loss=0.06414, simple_loss=0.08809, pruned_loss=0.0117, audio_tagging_loss=0.008394, over 3052599.69 frames. 
], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:31:15,162 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3945806.6666666665, ans=0.05 2023-11-29 11:31:15,169 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3945806.6666666665, ans=0.125 2023-11-29 11:31:23,207 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3945806.6666666665, ans=0.125 2023-11-29 11:31:38,137 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3945940.0, ans=0.125 2023-11-29 11:31:42,846 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3945940.0, ans=0.0 2023-11-29 11:31:49,106 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 591900 2023-11-29 11:31:51,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3946006.6666666665, ans=0.0 2023-11-29 11:31:53,632 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.084e+01 9.168e+01 9.953e+01 1.095e+02 1.462e+02, threshold=1.991e+02, percent-clipped=0.0 2023-11-29 11:31:58,889 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.52 vs. limit=15.0 2023-11-29 11:32:08,020 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3946073.3333333335, ans=0.0 2023-11-29 11:32:11,901 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.19 vs. limit=15.0 2023-11-29 11:32:16,477 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 2750, loss[loss=0.07033, simple_loss=0.09794, pruned_loss=0.01465, audio_tagging_loss=0.006714, over 15835.00 frames. ], tot_loss[loss=0.06417, simple_loss=0.08807, pruned_loss=0.01174, audio_tagging_loss=0.008397, over 3045558.74 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 8.0 2023-11-29 11:32:32,174 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3946206.6666666665, ans=0.1 2023-11-29 11:32:32,624 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.64 vs. 
limit=22.5 2023-11-29 11:32:33,896 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3946206.6666666665, ans=0.2 2023-11-29 11:32:49,794 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 591950 2023-11-29 11:32:49,939 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3946273.3333333335, ans=0.0 2023-11-29 11:32:50,037 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3946273.3333333335, ans=10.0 2023-11-29 11:32:52,445 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3946340.0, ans=0.125 2023-11-29 11:33:06,057 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.53 vs. limit=15.0 2023-11-29 11:33:10,026 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 11:33:18,143 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 2800, loss[loss=0.08547, simple_loss=0.1174, pruned_loss=0.01957, audio_tagging_loss=0.007206, over 14784.00 frames. ], tot_loss[loss=0.0643, simple_loss=0.08807, pruned_loss=0.01183, audio_tagging_loss=0.00843, over 3046301.67 frames. ], batch size: 53, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:33:28,866 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.22 vs. limit=15.0 2023-11-29 11:33:32,071 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3946540.0, ans=0.125 2023-11-29 11:33:32,476 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.87 vs. limit=22.5 2023-11-29 11:33:33,310 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3946540.0, ans=0.0 2023-11-29 11:33:34,641 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.61 vs. 
limit=15.0 2023-11-29 11:33:49,113 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3946606.6666666665, ans=0.2 2023-11-29 11:33:51,410 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 592000 2023-11-29 11:33:52,892 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-592000.pt 2023-11-29 11:33:58,930 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.825e+01 9.130e+01 9.870e+01 1.066e+02 1.963e+02, threshold=1.974e+02, percent-clipped=0.0 2023-11-29 11:34:11,665 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3946740.0, ans=0.125 2023-11-29 11:34:12,795 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3946740.0, ans=0.1 2023-11-29 11:34:12,990 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.87 vs. limit=15.0 2023-11-29 11:34:22,986 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 2850, loss[loss=0.06605, simple_loss=0.0924, pruned_loss=0.0117, audio_tagging_loss=0.008145, over 14546.00 frames. ], tot_loss[loss=0.06364, simple_loss=0.08713, pruned_loss=0.01162, audio_tagging_loss=0.00846, over 3049713.99 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:34:25,906 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.83 vs. limit=22.5 2023-11-29 11:34:39,369 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff2.min_abs, batch_count=3946873.3333333335, ans=0.1 2023-11-29 11:34:56,231 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 592050 2023-11-29 11:35:21,725 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3947073.3333333335, ans=0.0 2023-11-29 11:35:21,743 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3947073.3333333335, ans=0.1 2023-11-29 11:35:24,276 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 2900, loss[loss=0.06151, simple_loss=0.0797, pruned_loss=0.01408, audio_tagging_loss=0.007572, over 15640.00 frames. ], tot_loss[loss=0.06394, simple_loss=0.08756, pruned_loss=0.01172, audio_tagging_loss=0.008439, over 3039116.51 frames. ], batch size: 59, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:35:24,663 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3947140.0, ans=0.09899494936611666 2023-11-29 11:35:37,150 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3947206.6666666665, ans=0.0 2023-11-29 11:35:58,315 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 592100 2023-11-29 11:36:02,772 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.678e+01 9.122e+01 9.763e+01 1.061e+02 1.440e+02, threshold=1.953e+02, percent-clipped=0.0 2023-11-29 11:36:05,895 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.04 vs. 
limit=15.0
2023-11-29 11:36:15,763 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.18 vs. limit=15.0
2023-11-29 11:36:24,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3951340.0, ans=0.125
2023-11-29 11:36:26,616 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 2950, loss[loss=0.0606, simple_loss=0.07776, pruned_loss=0.01274, audio_tagging_loss=0.008987, over 15287.00 frames. ], tot_loss[loss=0.06448, simple_loss=0.08841, pruned_loss=0.0118, audio_tagging_loss=0.008476, over 3046292.21 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 11:36:33,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3947473.3333333335, ans=0.0
2023-11-29 11:36:45,071 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3947540.0, ans=0.1
2023-11-29 11:36:59,533 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 592150
2023-11-29 11:37:01,096 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-29 11:37:06,945 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3947673.3333333335, ans=0.0
2023-11-29 11:37:08,224 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.12 vs. limit=15.0
2023-11-29 11:37:12,671 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3947673.3333333335, ans=0.2
2023-11-29 11:37:27,934 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 3000, loss[loss=0.0657, simple_loss=0.09244, pruned_loss=0.01252, audio_tagging_loss=0.006964, over 15571.00 frames. ], tot_loss[loss=0.06477, simple_loss=0.0888, pruned_loss=0.01187, audio_tagging_loss=0.008505, over 3039244.09 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 11:37:27,936 INFO [train_asr.py:1258] (0/4) Computing validation loss
2023-11-29 11:38:02,863 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.5966, 3.7444, 3.9938, 3.5034], device='cuda:0')
2023-11-29 11:38:07,434 INFO [train_asr.py:1267] (0/4) Epoch 50, validation: loss=0.05782, simple_loss=0.05046, pruned_loss=0.005473, audio_tagging_loss=0.02712, over 4681554.00 frames.
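The validation entry above reports the total loss together with its components. At this stage of training the logged totals are consistent with tot_loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss, matching the simple_loss_scale and audio_tagging_loss_scale configured for this run. A minimal sketch of that composition, assuming those scales (the function name is illustrative):

    # Hedged sketch: how the logged `loss` appears to be composed from its parts.
    # The 0.5 and 1.0 scales come from this run's configuration; pruned_loss
    # enters unscaled once warm-up is past. Not the exact training code.
    def combined_loss(simple_loss: float, pruned_loss: float,
                      audio_tagging_loss: float,
                      simple_loss_scale: float = 0.5,
                      audio_tagging_loss_scale: float = 1.0) -> float:
        return (simple_loss_scale * simple_loss
                + pruned_loss
                + audio_tagging_loss_scale * audio_tagging_loss)

    # Check against the validation entry above:
    # 0.5 * 0.05046 + 0.005473 + 1.0 * 0.02712 ≈ 0.05782
    assert abs(combined_loss(0.05046, 0.005473, 0.02712) - 0.05782) < 1e-4

The same relation reproduces the per-batch tot_loss values logged throughout this section, e.g. 0.5 * 0.09384 + 0.01515 + 0.008814 ≈ 0.07088 for batch 1350.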
2023-11-29 11:38:07,435 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-29 11:38:20,559 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3947873.3333333335, ans=0.125 2023-11-29 11:38:40,745 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 592200 2023-11-29 11:38:45,631 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.881e+01 9.188e+01 9.766e+01 1.056e+02 1.297e+02, threshold=1.953e+02, percent-clipped=0.0 2023-11-29 11:38:46,072 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3948006.6666666665, ans=0.125 2023-11-29 11:38:59,663 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.66 vs. limit=15.0 2023-11-29 11:39:00,556 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3948073.3333333335, ans=0.2 2023-11-29 11:39:09,601 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 3050, loss[loss=0.0756, simple_loss=0.11, pruned_loss=0.01263, audio_tagging_loss=0.007983, over 15352.00 frames. ], tot_loss[loss=0.06447, simple_loss=0.08831, pruned_loss=0.01175, audio_tagging_loss=0.008563, over 3042910.49 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:39:12,196 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3948140.0, ans=0.1 2023-11-29 11:39:20,893 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3948206.6666666665, ans=0.125 2023-11-29 11:39:32,243 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3948273.3333333335, ans=0.0 2023-11-29 11:39:42,124 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 592250 2023-11-29 11:39:45,501 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 11:39:46,961 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3948340.0, ans=0.125 2023-11-29 11:39:47,020 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3948340.0, ans=0.125 2023-11-29 11:40:03,660 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff2.min_abs, batch_count=3948406.6666666665, ans=0.1 2023-11-29 11:40:11,008 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 3100, loss[loss=0.05654, simple_loss=0.07842, pruned_loss=0.009278, audio_tagging_loss=0.008054, over 15320.00 frames. ], tot_loss[loss=0.06475, simple_loss=0.08889, pruned_loss=0.01177, audio_tagging_loss=0.00854, over 3044440.58 frames. 
], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:40:13,645 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3948473.3333333335, ans=0.09899494936611666 2023-11-29 11:40:25,936 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3948540.0, ans=0.0 2023-11-29 11:40:27,210 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3948540.0, ans=0.1 2023-11-29 11:40:35,486 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3948606.6666666665, ans=0.2 2023-11-29 11:40:38,270 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3948606.6666666665, ans=0.1 2023-11-29 11:40:43,947 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 592300 2023-11-29 11:40:48,597 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.753e+01 9.181e+01 9.927e+01 1.074e+02 1.864e+02, threshold=1.985e+02, percent-clipped=0.0 2023-11-29 11:40:49,342 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.64 vs. limit=22.5 2023-11-29 11:41:08,263 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3948740.0, ans=0.2 2023-11-29 11:41:08,344 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 11:41:12,055 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 3150, loss[loss=0.07492, simple_loss=0.1105, pruned_loss=0.01219, audio_tagging_loss=0.007464, over 15260.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.08982, pruned_loss=0.01197, audio_tagging_loss=0.008573, over 3047964.09 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:41:22,694 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3948873.3333333335, ans=0.0 2023-11-29 11:41:31,529 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3948873.3333333335, ans=0.125 2023-11-29 11:41:45,123 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 592350 2023-11-29 11:41:49,828 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3949006.6666666665, ans=0.5 2023-11-29 11:42:07,727 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3949073.3333333335, ans=0.125 2023-11-29 11:42:12,846 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 3200, loss[loss=0.05352, simple_loss=0.06147, pruned_loss=0.01314, audio_tagging_loss=0.009641, over 14894.00 frames. ], tot_loss[loss=0.0649, simple_loss=0.08886, pruned_loss=0.01172, audio_tagging_loss=0.008753, over 3045686.15 frames. 
], batch size: 58, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:42:13,224 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3949140.0, ans=0.125 2023-11-29 11:42:15,586 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3949140.0, ans=0.0 2023-11-29 11:42:35,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3949206.6666666665, ans=0.0 2023-11-29 11:42:39,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3949273.3333333335, ans=0.0 2023-11-29 11:42:40,690 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.00 vs. limit=22.5 2023-11-29 11:42:46,807 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 592400 2023-11-29 11:42:52,065 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.469e+01 8.966e+01 9.651e+01 1.039e+02 1.549e+02, threshold=1.930e+02, percent-clipped=0.0 2023-11-29 11:43:04,222 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3949406.6666666665, ans=0.125 2023-11-29 11:43:15,831 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 3250, loss[loss=0.05402, simple_loss=0.06973, pruned_loss=0.007379, audio_tagging_loss=0.01178, over 15329.00 frames. ], tot_loss[loss=0.06465, simple_loss=0.08841, pruned_loss=0.0116, audio_tagging_loss=0.008844, over 3042443.31 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:43:25,544 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.40 vs. limit=15.0 2023-11-29 11:43:32,131 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3949540.0, ans=0.125 2023-11-29 11:43:36,532 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.26 vs. limit=10.0 2023-11-29 11:43:39,584 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3949606.6666666665, ans=0.125 2023-11-29 11:43:49,433 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 592450 2023-11-29 11:44:06,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=3949740.0, ans=0.2 2023-11-29 11:44:17,921 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 3300, loss[loss=0.04493, simple_loss=0.05849, pruned_loss=0.007161, audio_tagging_loss=0.008525, over 14560.00 frames. ], tot_loss[loss=0.06468, simple_loss=0.08839, pruned_loss=0.01157, audio_tagging_loss=0.00891, over 3042870.58 frames. 
], batch size: 56, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:44:37,587 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3949873.3333333335, ans=0.1 2023-11-29 11:44:39,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3949873.3333333335, ans=0.125 2023-11-29 11:44:48,147 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3949940.0, ans=0.1 2023-11-29 11:44:51,482 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 592500 2023-11-29 11:44:56,137 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.509e+01 9.045e+01 9.733e+01 1.044e+02 1.292e+02, threshold=1.947e+02, percent-clipped=0.0 2023-11-29 11:45:05,811 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.65 vs. limit=10.0 2023-11-29 11:45:10,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3950073.3333333335, ans=0.0 2023-11-29 11:45:20,726 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 3350, loss[loss=0.07091, simple_loss=0.1003, pruned_loss=0.0137, audio_tagging_loss=0.007081, over 15104.00 frames. ], tot_loss[loss=0.0647, simple_loss=0.08855, pruned_loss=0.01167, audio_tagging_loss=0.008757, over 3043406.01 frames. ], batch size: 53, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:45:22,673 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.65 vs. limit=10.0 2023-11-29 11:45:42,420 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3950206.6666666665, ans=0.5 2023-11-29 11:45:53,676 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 592550 2023-11-29 11:45:56,803 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3950340.0, ans=0.1 2023-11-29 11:46:10,181 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.57 vs. limit=15.0 2023-11-29 11:46:22,698 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 3400, loss[loss=0.07488, simple_loss=0.1092, pruned_loss=0.01535, audio_tagging_loss=0.004926, over 16508.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.08924, pruned_loss=0.01187, audio_tagging_loss=0.008551, over 3042933.76 frames. ], batch size: 61, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:46:45,172 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3950540.0, ans=0.1 2023-11-29 11:46:50,542 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.48 vs. 
limit=12.0 2023-11-29 11:46:56,902 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 592600 2023-11-29 11:47:01,893 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.883e+01 9.007e+01 9.772e+01 1.033e+02 1.333e+02, threshold=1.954e+02, percent-clipped=0.0 2023-11-29 11:47:15,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3950740.0, ans=0.07 2023-11-29 11:47:24,649 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 3450, loss[loss=0.05823, simple_loss=0.08466, pruned_loss=0.007878, audio_tagging_loss=0.008018, over 14420.00 frames. ], tot_loss[loss=0.06481, simple_loss=0.08885, pruned_loss=0.01189, audio_tagging_loss=0.008494, over 3042871.06 frames. ], batch size: 55, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:47:33,978 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3950806.6666666665, ans=0.07 2023-11-29 11:47:37,703 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3950873.3333333335, ans=0.125 2023-11-29 11:47:44,988 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3950873.3333333335, ans=0.2 2023-11-29 11:47:58,827 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 592650 2023-11-29 11:48:07,784 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.62 vs. limit=15.0 2023-11-29 11:48:25,481 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.27 vs. limit=15.0 2023-11-29 11:48:27,053 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 3500, loss[loss=0.06475, simple_loss=0.09231, pruned_loss=0.009658, audio_tagging_loss=0.008933, over 14966.00 frames. ], tot_loss[loss=0.06458, simple_loss=0.0886, pruned_loss=0.01183, audio_tagging_loss=0.008456, over 3042319.26 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:48:40,097 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3951206.6666666665, ans=0.0 2023-11-29 11:48:58,867 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 11:48:59,686 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.38 vs. limit=15.0 2023-11-29 11:49:00,142 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 592700 2023-11-29 11:49:05,881 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.850e+01 9.064e+01 9.893e+01 1.052e+02 1.385e+02, threshold=1.979e+02, percent-clipped=0.0 2023-11-29 11:49:11,384 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.38 vs. 
limit=22.5 2023-11-29 11:49:14,256 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3951340.0, ans=0.1 2023-11-29 11:49:29,241 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 3550, loss[loss=0.06612, simple_loss=0.08455, pruned_loss=0.01343, audio_tagging_loss=0.01042, over 14532.00 frames. ], tot_loss[loss=0.06455, simple_loss=0.0887, pruned_loss=0.01179, audio_tagging_loss=0.008412, over 3042056.92 frames. ], batch size: 55, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:49:34,652 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.05 vs. limit=6.0 2023-11-29 11:49:53,126 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3951606.6666666665, ans=0.125 2023-11-29 11:50:02,837 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 592750 2023-11-29 11:50:04,768 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3951606.6666666665, ans=0.125 2023-11-29 11:50:07,370 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3951673.3333333335, ans=0.0 2023-11-29 11:50:30,394 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 3600, loss[loss=0.04476, simple_loss=0.05559, pruned_loss=0.005118, audio_tagging_loss=0.01185, over 14924.00 frames. ], tot_loss[loss=0.06456, simple_loss=0.08878, pruned_loss=0.01181, audio_tagging_loss=0.008366, over 3044319.24 frames. ], batch size: 60, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:50:38,252 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3951806.6666666665, ans=0.125 2023-11-29 11:50:52,233 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.93 vs. limit=15.0 2023-11-29 11:51:01,376 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3951940.0, ans=0.0 2023-11-29 11:51:04,651 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 592800 2023-11-29 11:51:09,531 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.807e+01 9.101e+01 9.681e+01 1.023e+02 1.277e+02, threshold=1.936e+02, percent-clipped=0.0 2023-11-29 11:51:21,677 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3952073.3333333335, ans=0.025 2023-11-29 11:51:27,727 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 11:51:31,084 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3952073.3333333335, ans=0.125 2023-11-29 11:51:33,135 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 3650, loss[loss=0.0611, simple_loss=0.08213, pruned_loss=0.01413, audio_tagging_loss=0.005904, over 14288.00 frames. ], tot_loss[loss=0.06493, simple_loss=0.08929, pruned_loss=0.01198, audio_tagging_loss=0.008311, over 3050698.76 frames. 
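Note on the optim.py:476 entries: each one summarizes a window of recent gradient norms as five quantiles (min, 25%, median, 75%, max) next to the clipping threshold, and in every line above the threshold equals Clipping_scale times the median (e.g. 2.0 x 9.964e+01 ≈ 1.993e+02 just above); percent-clipped reports how often that threshold was actually hit. A sketch of that bookkeeping, assuming this is how the optimizer derives its threshold (the ScaledAdam internals are not visible in the log):

```python
import torch

def clipping_stats(recent_grad_norms, clipping_scale: float = 2.0):
    """Reproduce the optim.py log fields from a window of gradient norms.
    Assumption: threshold = clipping_scale * median, which matches every
    printed line above (2 * median quartile == threshold)."""
    norms = torch.tensor(recent_grad_norms, dtype=torch.float32)
    quartiles = torch.quantile(norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * quartiles[2]               # 2.0 * median
    percent_clipped = 100.0 * (norms > threshold).float().mean()
    return quartiles, threshold, percent_clipped
```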
], batch size: 56, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:51:34,655 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3952140.0, ans=0.2 2023-11-29 11:52:05,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3952273.3333333335, ans=0.0 2023-11-29 11:52:06,087 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 592850 2023-11-29 11:52:09,012 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.37 vs. limit=15.0 2023-11-29 11:52:19,068 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3952340.0, ans=0.2 2023-11-29 11:52:25,552 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3952406.6666666665, ans=0.125 2023-11-29 11:52:31,929 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3952406.6666666665, ans=0.125 2023-11-29 11:52:35,280 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 3700, loss[loss=0.05787, simple_loss=0.07548, pruned_loss=0.008324, audio_tagging_loss=0.01181, over 14868.00 frames. ], tot_loss[loss=0.06458, simple_loss=0.0888, pruned_loss=0.01188, audio_tagging_loss=0.008303, over 3048191.11 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:52:36,888 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3952473.3333333335, ans=0.125 2023-11-29 11:52:38,366 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.84 vs. limit=22.5 2023-11-29 11:52:52,198 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3952540.0, ans=0.0 2023-11-29 11:53:08,682 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 592900 2023-11-29 11:53:14,450 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.712e+01 9.192e+01 9.964e+01 1.058e+02 1.278e+02, threshold=1.993e+02, percent-clipped=0.0 2023-11-29 11:53:16,003 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3952673.3333333335, ans=0.1 2023-11-29 11:53:32,145 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3952740.0, ans=0.1 2023-11-29 11:53:35,775 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3952806.6666666665, ans=0.0 2023-11-29 11:53:36,612 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 3750, loss[loss=0.06719, simple_loss=0.09454, pruned_loss=0.01224, audio_tagging_loss=0.007684, over 14696.00 frames. ], tot_loss[loss=0.06454, simple_loss=0.08868, pruned_loss=0.01181, audio_tagging_loss=0.008385, over 3043121.32 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:53:51,845 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.37 vs. 
limit=15.0 2023-11-29 11:53:53,948 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.24 vs. limit=15.0 2023-11-29 11:54:05,497 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.64 vs. limit=6.0 2023-11-29 11:54:10,851 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 592950 2023-11-29 11:54:11,044 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3952940.0, ans=0.2 2023-11-29 11:54:20,536 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 11:54:25,379 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3953073.3333333335, ans=0.1 2023-11-29 11:54:32,536 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3953073.3333333335, ans=0.1 2023-11-29 11:54:35,324 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3953073.3333333335, ans=0.125 2023-11-29 11:54:38,448 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 3800, loss[loss=0.06979, simple_loss=0.1012, pruned_loss=0.01142, audio_tagging_loss=0.007768, over 15391.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.08979, pruned_loss=0.01194, audio_tagging_loss=0.008343, over 3038766.34 frames. ], batch size: 55, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:54:47,088 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.97 vs. limit=22.5 2023-11-29 11:54:58,126 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3953206.6666666665, ans=0.0 2023-11-29 11:54:58,360 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.71 vs. limit=15.0 2023-11-29 11:55:05,731 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.60 vs. limit=15.0 2023-11-29 11:55:12,114 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 593000 2023-11-29 11:55:18,273 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.426e+01 9.089e+01 9.885e+01 1.067e+02 1.488e+02, threshold=1.977e+02, percent-clipped=0.0 2023-11-29 11:55:21,903 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.65 vs. limit=15.0 2023-11-29 11:55:41,788 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 3850, loss[loss=0.07368, simple_loss=0.09613, pruned_loss=0.01308, audio_tagging_loss=0.01253, over 15211.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08993, pruned_loss=0.01182, audio_tagging_loss=0.008484, over 3039094.93 frames. 
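Note on the scaling.py:213 entries: each records a ScheduledFloat value, i.e. a module hyper-parameter (dropout probability, skip rate, balancer bound) that is a piecewise-linear function of batch_count rather than a constant; the many "ans=0.0" skip rates this late in training suggest schedules that have decayed to their final values. A simplified stand-in for the idea, not the icefall class itself:

```python
import bisect

class ScheduledFloat:
    """Piecewise-linear schedule over batch_count; a simplified stand-in
    for the ScheduledFloat logged by icefall's scaling.py (details assumed)."""
    def __init__(self, *points):
        # points: (batch_count, value) pairs, sorted by batch_count.
        self.xs = [p[0] for p in points]
        self.ys = [p[1] for p in points]

    def __call__(self, batch_count: float) -> float:
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect.bisect_right(self.xs, batch_count)
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# e.g. a hypothetical skip-rate decaying from 0.1 to 0.0 over 20k batches,
# after which every log line would print "ans=0.0" as seen above.
skip_rate = ScheduledFloat((0.0, 0.1), (20000.0, 0.0))
```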
], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:55:45,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3953473.3333333335, ans=0.125 2023-11-29 11:55:48,547 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.24 vs. limit=6.0 2023-11-29 11:55:52,597 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3953540.0, ans=0.1 2023-11-29 11:55:57,387 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3953540.0, ans=0.125 2023-11-29 11:56:07,497 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.28 vs. limit=22.5 2023-11-29 11:56:14,589 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 593050 2023-11-29 11:56:40,542 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.whiten.whitening_limit, batch_count=3953740.0, ans=15.0 2023-11-29 11:56:43,366 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 3900, loss[loss=0.0577, simple_loss=0.07539, pruned_loss=0.01002, audio_tagging_loss=0.009982, over 14772.00 frames. ], tot_loss[loss=0.06433, simple_loss=0.08835, pruned_loss=0.0116, audio_tagging_loss=0.008557, over 3033854.30 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:56:48,436 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3953806.6666666665, ans=0.0 2023-11-29 11:56:54,200 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3953873.3333333335, ans=0.125 2023-11-29 11:57:03,087 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3953873.3333333335, ans=0.125 2023-11-29 11:57:17,535 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 593100 2023-11-29 11:57:23,342 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.808e+01 8.927e+01 9.561e+01 1.012e+02 1.625e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-29 11:57:45,115 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 3950, loss[loss=0.06423, simple_loss=0.09088, pruned_loss=0.01074, audio_tagging_loss=0.008049, over 16028.00 frames. ], tot_loss[loss=0.06426, simple_loss=0.08803, pruned_loss=0.01154, audio_tagging_loss=0.008698, over 3042155.60 frames. 
], batch size: 58, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:58:12,694 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3954273.3333333335, ans=0.125 2023-11-29 11:58:18,367 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 593150 2023-11-29 11:58:19,778 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3954273.3333333335, ans=0.0 2023-11-29 11:58:25,737 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3954340.0, ans=0.07 2023-11-29 11:58:42,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3954406.6666666665, ans=0.125 2023-11-29 11:58:47,962 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 4000, loss[loss=0.06432, simple_loss=0.09277, pruned_loss=0.009513, audio_tagging_loss=0.008423, over 15566.00 frames. ], tot_loss[loss=0.06443, simple_loss=0.0883, pruned_loss=0.01164, audio_tagging_loss=0.008646, over 3046442.02 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:59:04,910 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3954540.0, ans=0.2 2023-11-29 11:59:06,025 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3954540.0, ans=0.1 2023-11-29 11:59:07,378 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.06 vs. limit=15.0 2023-11-29 11:59:19,397 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3954606.6666666665, ans=0.1 2023-11-29 11:59:19,409 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3954606.6666666665, ans=0.125 2023-11-29 11:59:20,431 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 593200 2023-11-29 11:59:26,598 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.870e+01 8.877e+01 9.527e+01 1.031e+02 1.352e+02, threshold=1.905e+02, percent-clipped=0.0 2023-11-29 11:59:47,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3954740.0, ans=0.0 2023-11-29 11:59:49,343 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 4050, loss[loss=0.05376, simple_loss=0.06473, pruned_loss=0.01194, audio_tagging_loss=0.009452, over 15055.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.0897, pruned_loss=0.01185, audio_tagging_loss=0.008659, over 3046681.80 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:59:54,012 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-29 12:00:02,419 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3954873.3333333335, ans=0.1 2023-11-29 12:00:04,381 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3954873.3333333335, ans=0.125 2023-11-29 12:00:04,839 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.09 vs. limit=15.0 2023-11-29 12:00:07,994 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3954873.3333333335, ans=0.125 2023-11-29 12:00:16,854 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3954940.0, ans=0.2 2023-11-29 12:00:22,991 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 593250 2023-11-29 12:00:30,038 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.76 vs. limit=6.0 2023-11-29 12:00:49,964 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3955140.0, ans=0.125 2023-11-29 12:00:51,375 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 4100, loss[loss=0.07062, simple_loss=0.1013, pruned_loss=0.01396, audio_tagging_loss=0.006015, over 15017.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.09018, pruned_loss=0.01186, audio_tagging_loss=0.008692, over 3044356.73 frames. ], batch size: 55, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:00:56,527 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3955140.0, ans=0.125 2023-11-29 12:01:01,069 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3955140.0, ans=0.125 2023-11-29 12:01:19,432 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3955273.3333333335, ans=0.125 2023-11-29 12:01:24,881 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 593300 2023-11-29 12:01:30,833 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3955340.0, ans=0.1 2023-11-29 12:01:31,715 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.450e+01 9.221e+01 9.823e+01 1.065e+02 1.481e+02, threshold=1.965e+02, percent-clipped=0.0 2023-11-29 12:01:52,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3955473.3333333335, ans=0.125 2023-11-29 12:01:52,929 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 4150, loss[loss=0.04703, simple_loss=0.06034, pruned_loss=0.006069, audio_tagging_loss=0.01079, over 14550.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.09053, pruned_loss=0.012, audio_tagging_loss=0.008545, over 3042460.92 frames. 
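Note on the WARNING above: it drops an AudioSet cut whose transcript is the dummy placeholder text. The 100-frame (1 s) cut keeps only 23 frames after the roughly 4x front-end subsampling, fewer than its 24 BPE tokens, so the transducer loss cannot align it. A hedged sketch of such a validity filter; the subsampling arithmetic below is inferred from the printed 100 -> 23 counts, not taken from the recipe:

```python
def frames_after_subsampling(num_frames: int) -> int:
    # Two stride-2 stages with edge effects, consistent with the logged
    # 100 -> 23 reduction (an assumption about the encoder front-end).
    t = (num_frames - 7) // 2 + 1
    return (t - 3) // 2 + 1

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    """Drop cuts the transducer loss cannot align (needs T' >= num_tokens)."""
    return frames_after_subsampling(num_frames) >= num_tokens

assert frames_after_subsampling(100) == 23   # matches the WARNING above
assert not keep_cut(100, 24)                 # the excluded placeholder cut
```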
], batch size: 56, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:02:02,690 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3955473.3333333335, ans=0.1 2023-11-29 12:02:08,598 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3955540.0, ans=0.1 2023-11-29 12:02:11,277 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.55 vs. limit=15.0 2023-11-29 12:02:13,226 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3955540.0, ans=0.125 2023-11-29 12:02:19,029 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 12:02:26,222 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 593350 2023-11-29 12:02:38,426 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 12:02:39,099 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.99 vs. limit=15.0 2023-11-29 12:02:45,151 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3955740.0, ans=0.1 2023-11-29 12:02:54,799 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 4200, loss[loss=0.09109, simple_loss=0.1199, pruned_loss=0.02217, audio_tagging_loss=0.008995, over 14693.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.09103, pruned_loss=0.01207, audio_tagging_loss=0.008344, over 3043706.67 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:02:56,288 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3955806.6666666665, ans=0.125 2023-11-29 12:03:14,431 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3955873.3333333335, ans=0.07 2023-11-29 12:03:28,475 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 593400 2023-11-29 12:03:30,609 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.10 vs. 
limit=15.0 2023-11-29 12:03:35,615 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.100e+01 9.099e+01 9.882e+01 1.051e+02 1.333e+02, threshold=1.976e+02, percent-clipped=0.0 2023-11-29 12:03:39,387 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3956006.6666666665, ans=0.125 2023-11-29 12:03:49,806 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3956073.3333333335, ans=0.125 2023-11-29 12:03:56,524 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 4250, loss[loss=0.07706, simple_loss=0.1045, pruned_loss=0.01723, audio_tagging_loss=0.0076, over 15254.00 frames. ], tot_loss[loss=0.0652, simple_loss=0.08985, pruned_loss=0.01198, audio_tagging_loss=0.008294, over 3042241.57 frames. ], batch size: 55, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:03:58,211 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.82 vs. limit=15.0 2023-11-29 12:04:05,684 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3956140.0, ans=0.125 2023-11-29 12:04:19,811 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3956206.6666666665, ans=0.125 2023-11-29 12:04:22,045 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3956273.3333333335, ans=0.1 2023-11-29 12:04:22,051 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3956273.3333333335, ans=0.1 2023-11-29 12:04:22,486 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.36 vs. limit=15.0 2023-11-29 12:04:23,614 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.17 vs. limit=15.0 2023-11-29 12:04:30,823 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 593450 2023-11-29 12:04:46,877 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3956406.6666666665, ans=0.0 2023-11-29 12:04:47,549 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.11 vs. limit=10.0 2023-11-29 12:04:52,653 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3956406.6666666665, ans=0.125 2023-11-29 12:04:58,959 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 4300, loss[loss=0.08287, simple_loss=0.117, pruned_loss=0.01675, audio_tagging_loss=0.00764, over 15030.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08996, pruned_loss=0.01197, audio_tagging_loss=0.008361, over 3044469.13 frames. 
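Note on the scaling.py:1022 "Whitening" entries: each compares a per-module statistic against a limit (metric=... vs. limit=...), and when the metric exceeds its limit the Whiten module penalizes the activations so their channel covariance moves back toward a scaled identity. The exact formula is not visible in the log; one plausible metric with the right fixed point, shown purely as an assumption, is d * trace(C @ C) / trace(C)**2, which is >= 1 and equals 1 exactly when C is a multiple of the identity:

```python
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    """x: (num_frames, num_channels). Scalar >= 1.0 measuring how far the
    channel covariance C is from c*I: metric = d * trace(C @ C) / trace(C)**2.
    (An assumed formulation of the 'metric' printed by scaling.py.)"""
    x = x - x.mean(dim=0, keepdim=True)
    c = (x.t() @ x) / x.shape[0]             # channel covariance, (d, d)
    d = c.shape[0]
    return d * (c @ c).diagonal().sum() / c.diagonal().sum() ** 2
```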
], batch size: 56, lr: 1.35e-03, grad_scale: 8.0 2023-11-29 12:04:59,277 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3956473.3333333335, ans=0.0 2023-11-29 12:05:22,609 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3956606.6666666665, ans=0.1 2023-11-29 12:05:30,979 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3956606.6666666665, ans=0.125 2023-11-29 12:05:31,839 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 593500 2023-11-29 12:05:40,628 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.001e+01 9.077e+01 9.622e+01 1.047e+02 1.414e+02, threshold=1.924e+02, percent-clipped=0.0 2023-11-29 12:05:42,126 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3956673.3333333335, ans=0.125 2023-11-29 12:05:43,221 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3956673.3333333335, ans=0.07 2023-11-29 12:06:00,193 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 4350, loss[loss=0.04162, simple_loss=0.05225, pruned_loss=0.004975, audio_tagging_loss=0.01052, over 15229.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08988, pruned_loss=0.01197, audio_tagging_loss=0.008346, over 3036165.00 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 8.0 2023-11-29 12:06:17,540 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3956873.3333333335, ans=0.125 2023-11-29 12:06:21,595 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3956873.3333333335, ans=0.125 2023-11-29 12:06:23,815 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3956940.0, ans=0.0 2023-11-29 12:06:33,036 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 593550 2023-11-29 12:06:45,861 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3957006.6666666665, ans=0.0 2023-11-29 12:07:02,017 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 4400, loss[loss=0.05415, simple_loss=0.07679, pruned_loss=0.007178, audio_tagging_loss=0.008581, over 15298.00 frames. ], tot_loss[loss=0.06492, simple_loss=0.08921, pruned_loss=0.01192, audio_tagging_loss=0.008395, over 3033762.99 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:07:10,090 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3957140.0, ans=0.2 2023-11-29 12:07:36,540 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 593600 2023-11-29 12:07:43,806 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3957340.0, ans=0.2 2023-11-29 12:07:45,790 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.856e+01 9.188e+01 9.758e+01 1.053e+02 1.476e+02, threshold=1.952e+02, percent-clipped=0.0 2023-11-29 12:07:47,641 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.96 vs. 
limit=22.5 2023-11-29 12:07:49,584 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3957340.0, ans=0.125 2023-11-29 12:07:50,126 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.83 vs. limit=15.0 2023-11-29 12:08:00,929 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3957406.6666666665, ans=0.125 2023-11-29 12:08:05,318 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 4450, loss[loss=0.05388, simple_loss=0.06916, pruned_loss=0.009986, audio_tagging_loss=0.009312, over 13963.00 frames. ], tot_loss[loss=0.065, simple_loss=0.08965, pruned_loss=0.01187, audio_tagging_loss=0.008309, over 3039110.11 frames. ], batch size: 55, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:08:19,602 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3957540.0, ans=0.125 2023-11-29 12:08:20,836 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3957540.0, ans=0.125 2023-11-29 12:08:36,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3957606.6666666665, ans=0.1 2023-11-29 12:08:38,795 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 593650 2023-11-29 12:08:56,812 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3957740.0, ans=0.2 2023-11-29 12:08:57,117 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. limit=6.0 2023-11-29 12:09:07,773 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 4500, loss[loss=0.07427, simple_loss=0.1034, pruned_loss=0.01539, audio_tagging_loss=0.007186, over 14367.00 frames. ], tot_loss[loss=0.06499, simple_loss=0.0895, pruned_loss=0.01197, audio_tagging_loss=0.008263, over 3040189.95 frames. ], batch size: 53, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:09:22,969 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.82 vs. limit=15.0 2023-11-29 12:09:26,264 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3957873.3333333335, ans=0.125 2023-11-29 12:09:41,371 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 593700 2023-11-29 12:09:50,092 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.301e+01 9.149e+01 9.833e+01 1.069e+02 1.731e+02, threshold=1.967e+02, percent-clipped=0.0 2023-11-29 12:09:52,159 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.24 vs. 
limit=10.0 2023-11-29 12:09:55,091 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3958006.6666666665, ans=0.0 2023-11-29 12:10:07,958 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3958140.0, ans=0.125 2023-11-29 12:10:08,739 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 4550, loss[loss=0.04912, simple_loss=0.06317, pruned_loss=0.005019, audio_tagging_loss=0.01252, over 15930.00 frames. ], tot_loss[loss=0.06474, simple_loss=0.0891, pruned_loss=0.01189, audio_tagging_loss=0.008299, over 3039111.82 frames. ], batch size: 59, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:10:21,198 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3958206.6666666665, ans=0.0 2023-11-29 12:10:43,122 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 593750 2023-11-29 12:10:47,916 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 12:10:50,820 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.24 vs. limit=15.0 2023-11-29 12:10:57,156 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 12:10:58,479 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3958406.6666666665, ans=0.125 2023-11-29 12:11:11,269 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 4600, loss[loss=0.06626, simple_loss=0.09018, pruned_loss=0.01152, audio_tagging_loss=0.00965, over 15393.00 frames. ], tot_loss[loss=0.06471, simple_loss=0.08899, pruned_loss=0.01186, audio_tagging_loss=0.008346, over 3041907.56 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:11:21,073 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3958473.3333333335, ans=0.125 2023-11-29 12:11:44,197 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 593800 2023-11-29 12:11:53,845 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.850e+01 9.081e+01 9.672e+01 1.036e+02 1.224e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-29 12:12:10,487 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3958740.0, ans=0.125 2023-11-29 12:12:13,825 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 4650, loss[loss=0.08421, simple_loss=0.1211, pruned_loss=0.0175, audio_tagging_loss=0.006162, over 14608.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08959, pruned_loss=0.01195, audio_tagging_loss=0.008453, over 3046779.99 frames. 
], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:12:27,112 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3958873.3333333335, ans=0.125 2023-11-29 12:12:46,296 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 593850 2023-11-29 12:12:47,050 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.41 vs. limit=15.0 2023-11-29 12:12:49,486 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3959006.6666666665, ans=0.0 2023-11-29 12:12:54,730 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 12:13:00,528 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3959006.6666666665, ans=0.0 2023-11-29 12:13:03,973 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3959073.3333333335, ans=0.125 2023-11-29 12:13:08,170 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.50 vs. limit=15.0 2023-11-29 12:13:13,524 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3959140.0, ans=0.2 2023-11-29 12:13:14,365 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 4700, loss[loss=0.08566, simple_loss=0.1094, pruned_loss=0.02385, audio_tagging_loss=0.007126, over 16683.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.0892, pruned_loss=0.01193, audio_tagging_loss=0.008614, over 3058305.62 frames. ], batch size: 59, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:13:19,226 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3959140.0, ans=0.125 2023-11-29 12:13:23,364 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.06 vs. limit=12.0 2023-11-29 12:13:39,840 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.95 vs. limit=12.0 2023-11-29 12:13:47,593 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3959273.3333333335, ans=0.1 2023-11-29 12:13:48,637 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 593900 2023-11-29 12:13:56,717 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.648e+01 9.196e+01 9.820e+01 1.091e+02 1.389e+02, threshold=1.964e+02, percent-clipped=0.0 2023-11-29 12:13:57,440 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.01 vs. limit=15.0 2023-11-29 12:13:58,729 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.14 vs. limit=10.0 2023-11-29 12:14:00,379 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3959340.0, ans=0.125 2023-11-29 12:14:16,793 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 4750, loss[loss=0.06322, simple_loss=0.08505, pruned_loss=0.009712, audio_tagging_loss=0.01098, over 14854.00 frames. 
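Note on the grad_scale field: it is the mixed-precision loss scale (use_fp16=True in the run config). It halves when a step overflows (32 -> 16 around batch 3750 above, down to 8 by batch 4300) and doubles back after a run of stable steps (8 -> 16 -> 32 by batch 4800), which is standard dynamic loss scaling. A generic torch.cuda.amp sketch of that behaviour; the recipe may wrap GradScaler in its own class, and model/batch here are hypothetical:

```python
import torch

scaler = torch.cuda.amp.GradScaler()        # use_fp16=True in the config

def train_step(model, optimizer, batch):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = model(batch)                 # hypothetical forward returning a loss
    scaler.scale(loss).backward()
    scaler.step(optimizer)                  # skips the step if grads overflowed
    scaler.update()                         # halves the scale on overflow,
                                            # doubles it after a growth interval
    return scaler.get_scale()               # the "grad_scale" printed in the log
```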
], tot_loss[loss=0.06553, simple_loss=0.08973, pruned_loss=0.01203, audio_tagging_loss=0.008634, over 3056084.75 frames. ], batch size: 55, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:14:42,764 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3959606.6666666665, ans=0.125 2023-11-29 12:14:49,579 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 593950 2023-11-29 12:15:09,527 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3959740.0, ans=0.125 2023-11-29 12:15:10,588 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3959740.0, ans=0.125 2023-11-29 12:15:19,313 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 4800, loss[loss=0.04835, simple_loss=0.06376, pruned_loss=0.007054, audio_tagging_loss=0.009415, over 14696.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08892, pruned_loss=0.01191, audio_tagging_loss=0.008777, over 3053353.36 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:15:22,855 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3959806.6666666665, ans=0.125 2023-11-29 12:15:24,626 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.30 vs. limit=15.0 2023-11-29 12:15:30,311 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3959873.3333333335, ans=0.125 2023-11-29 12:15:32,539 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3959873.3333333335, ans=0.2 2023-11-29 12:15:43,181 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3959940.0, ans=0.0 2023-11-29 12:15:46,083 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3959940.0, ans=0.07 2023-11-29 12:15:49,410 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3959940.0, ans=0.125 2023-11-29 12:15:52,366 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 594000 2023-11-29 12:16:01,789 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.081e+01 9.011e+01 9.691e+01 1.047e+02 1.422e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-29 12:16:20,315 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 4850, loss[loss=0.08765, simple_loss=0.1134, pruned_loss=0.02359, audio_tagging_loss=0.007363, over 14149.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.08918, pruned_loss=0.01211, audio_tagging_loss=0.008756, over 3047152.20 frames. ], batch size: 55, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:16:21,613 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3960140.0, ans=0.0 2023-11-29 12:16:27,859 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.00 vs. limit=22.5 2023-11-29 12:16:41,113 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.47 vs. 
limit=12.0 2023-11-29 12:16:43,562 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3960206.6666666665, ans=0.125 2023-11-29 12:16:47,990 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.38 vs. limit=22.5 2023-11-29 12:16:48,792 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3960273.3333333335, ans=0.0 2023-11-29 12:16:50,328 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.65 vs. limit=22.5 2023-11-29 12:16:54,294 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 594050 2023-11-29 12:17:03,765 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3960340.0, ans=0.125 2023-11-29 12:17:08,629 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.07 vs. limit=15.0 2023-11-29 12:17:21,452 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 4900, loss[loss=0.04643, simple_loss=0.06531, pruned_loss=0.004022, audio_tagging_loss=0.009753, over 14434.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.09004, pruned_loss=0.01217, audio_tagging_loss=0.008722, over 3045645.39 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:17:50,851 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3960606.6666666665, ans=0.0 2023-11-29 12:17:55,249 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 594100 2023-11-29 12:18:04,650 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.803e+01 9.116e+01 9.769e+01 1.041e+02 2.380e+02, threshold=1.954e+02, percent-clipped=1.0 2023-11-29 12:18:06,934 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 12:18:24,997 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 4950, loss[loss=0.0405, simple_loss=0.04852, pruned_loss=0.006439, audio_tagging_loss=0.009802, over 14724.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08951, pruned_loss=0.01198, audio_tagging_loss=0.00857, over 3046265.72 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:18:44,450 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.91 vs. 
limit=12.0 2023-11-29 12:18:47,653 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3960940.0, ans=0.0 2023-11-29 12:18:57,410 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 594150 2023-11-29 12:19:01,246 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3961006.6666666665, ans=10.0 2023-11-29 12:19:04,128 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3961006.6666666665, ans=0.125 2023-11-29 12:19:18,152 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3961073.3333333335, ans=0.0 2023-11-29 12:19:18,182 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3961073.3333333335, ans=0.0 2023-11-29 12:19:26,293 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 5000, loss[loss=0.07102, simple_loss=0.09792, pruned_loss=0.01365, audio_tagging_loss=0.008411, over 15108.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08943, pruned_loss=0.012, audio_tagging_loss=0.008463, over 3043581.30 frames. ], batch size: 55, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:19:52,289 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3961273.3333333335, ans=0.125 2023-11-29 12:19:58,670 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3961273.3333333335, ans=0.125 2023-11-29 12:19:59,599 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 594200 2023-11-29 12:20:03,692 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3961340.0, ans=0.0 2023-11-29 12:20:09,316 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.899e+01 8.950e+01 9.411e+01 1.015e+02 1.285e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-29 12:20:15,536 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3961406.6666666665, ans=0.1 2023-11-29 12:20:27,682 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 5050, loss[loss=0.06292, simple_loss=0.08822, pruned_loss=0.009495, audio_tagging_loss=0.009318, over 14580.00 frames. ], tot_loss[loss=0.06488, simple_loss=0.08907, pruned_loss=0.01192, audio_tagging_loss=0.008419, over 3041268.61 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:21:01,522 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 594250 2023-11-29 12:21:04,351 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.22 vs. limit=22.5 2023-11-29 12:21:22,319 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3961740.0, ans=0.0 2023-11-29 12:21:30,066 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 5100, loss[loss=0.0513, simple_loss=0.07162, pruned_loss=0.007124, audio_tagging_loss=0.008363, over 15326.00 frames. ], tot_loss[loss=0.06451, simple_loss=0.08843, pruned_loss=0.01188, audio_tagging_loss=0.008416, over 3034359.53 frames. 
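Note on the Balancer parameters among the ScheduledFloat entries (balancer1.prob, min_positive, min_abs, max_abs): with probability prob, a Balancer module constrains the fraction of positive activations and their typical magnitude per channel. As an illustration only, the check below reports which such bounds a tensor violates; the real Balancer in scaling.py enforces them through a custom backward pass, and max_positive here is an assumed bound (the others mirror ans= values in the log):

```python
import torch

def balancer_violations(x: torch.Tensor,
                        min_positive: float = 0.025,   # cf. "min_positive ... ans=0.025"
                        max_positive: float = 0.95,    # assumed, not seen in the log
                        min_abs: float = 0.2,          # cf. "min_abs ... ans=0.2"
                        max_abs: float = 10.0):        # cf. "max_abs ... ans=10.0"
    """Count per-channel violations of Balancer-style constraints.
    x: (num_frames, num_channels). Illustration only; the actual module
    corrects violations via gradient modification rather than reporting."""
    pos_frac = (x > 0).float().mean(dim=0)   # per-channel positive fraction
    mean_abs = x.abs().mean(dim=0)           # per-channel mean magnitude
    return {
        "too_few_positive": (pos_frac < min_positive).sum().item(),
        "too_many_positive": (pos_frac > max_positive).sum().item(),
        "too_small": (mean_abs < min_abs).sum().item(),
        "too_large": (mean_abs > max_abs).sum().item(),
    }
```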
], batch size: 58, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:21:53,560 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3961940.0, ans=0.125 2023-11-29 12:22:02,847 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3961940.0, ans=0.125 2023-11-29 12:22:03,842 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 594300 2023-11-29 12:22:04,441 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.32 vs. limit=15.0 2023-11-29 12:22:05,531 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.96 vs. limit=12.0 2023-11-29 12:22:13,773 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.335e+01 8.985e+01 9.588e+01 1.015e+02 1.337e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-29 12:22:21,258 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3962073.3333333335, ans=0.125 2023-11-29 12:22:32,657 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 5150, loss[loss=0.0485, simple_loss=0.05715, pruned_loss=0.008312, audio_tagging_loss=0.01161, over 14813.00 frames. ], tot_loss[loss=0.06451, simple_loss=0.08846, pruned_loss=0.01182, audio_tagging_loss=0.008459, over 3035020.22 frames. ], batch size: 59, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:23:02,929 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3962273.3333333335, ans=0.125 2023-11-29 12:23:06,605 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 594350 2023-11-29 12:23:08,072 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3962273.3333333335, ans=0.0 2023-11-29 12:23:14,846 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3962340.0, ans=0.1 2023-11-29 12:23:26,928 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.94 vs. limit=15.0 2023-11-29 12:23:34,660 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 5200, loss[loss=0.06843, simple_loss=0.09086, pruned_loss=0.01554, audio_tagging_loss=0.007461, over 14520.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08951, pruned_loss=0.01212, audio_tagging_loss=0.008368, over 3040812.36 frames. 
], batch size: 54, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:23:43,896 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3962473.3333333335, ans=0.125 2023-11-29 12:24:06,661 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3962606.6666666665, ans=0.1 2023-11-29 12:24:06,750 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3962606.6666666665, ans=0.0 2023-11-29 12:24:08,869 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 594400 2023-11-29 12:24:18,559 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.794e+01 9.283e+01 9.729e+01 1.049e+02 1.320e+02, threshold=1.946e+02, percent-clipped=0.0 2023-11-29 12:24:25,808 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3962740.0, ans=0.125 2023-11-29 12:24:37,174 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 5250, loss[loss=0.05844, simple_loss=0.08213, pruned_loss=0.0103, audio_tagging_loss=0.007079, over 15170.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.09034, pruned_loss=0.01219, audio_tagging_loss=0.008352, over 3037068.67 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:24:45,221 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.65 vs. limit=10.0 2023-11-29 12:25:02,522 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3962940.0, ans=0.125 2023-11-29 12:25:10,172 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 594450 2023-11-29 12:25:32,394 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.33 vs. limit=15.0 2023-11-29 12:25:39,444 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 5300, loss[loss=0.05825, simple_loss=0.07909, pruned_loss=0.008763, audio_tagging_loss=0.009943, over 14443.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08972, pruned_loss=0.01204, audio_tagging_loss=0.008403, over 3036050.41 frames. ], batch size: 54, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:25:57,774 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.40 vs. limit=15.0 2023-11-29 12:25:58,757 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 12:26:03,347 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.58 vs. limit=22.5 2023-11-29 12:26:09,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3963273.3333333335, ans=0.0 2023-11-29 12:26:13,261 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 594500 2023-11-29 12:26:22,682 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.843e+01 9.148e+01 9.632e+01 1.017e+02 1.264e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-29 12:26:23,448 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.19 vs. 
2023-11-29 12:26:38,009 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-29 12:26:41,287 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 5350, loss[loss=0.07799, simple_loss=0.1102, pruned_loss=0.01685, audio_tagging_loss=0.00604, over 15421.00 frames. ], tot_loss[loss=0.0651, simple_loss=0.08967, pruned_loss=0.01187, audio_tagging_loss=0.008391, over 3044685.08 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 12:26:47,982 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3963473.3333333335, ans=0.0
2023-11-29 12:26:57,020 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-29 12:27:10,445 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.09 vs. limit=15.0
2023-11-29 12:27:11,550 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3963606.6666666665, ans=0.2
2023-11-29 12:27:15,404 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 594550
2023-11-29 12:27:19,172 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3963673.3333333335, ans=0.0
2023-11-29 12:27:21,609 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3963673.3333333335, ans=0.125
2023-11-29 12:27:41,016 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3963740.0, ans=0.1
2023-11-29 12:27:43,674 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 5400, loss[loss=0.07123, simple_loss=0.09204, pruned_loss=0.01454, audio_tagging_loss=0.01066, over 16165.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08985, pruned_loss=0.01185, audio_tagging_loss=0.008467, over 3046374.23 frames. ], batch size: 60, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 12:28:07,637 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.76 vs. limit=15.0
2023-11-29 12:28:16,333 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 594600
2023-11-29 12:28:26,684 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.274e+01 9.096e+01 9.650e+01 1.029e+02 1.446e+02, threshold=1.930e+02, percent-clipped=0.0
2023-11-29 12:28:41,324 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.21 vs. limit=22.5
2023-11-29 12:28:45,307 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 5450, loss[loss=0.06791, simple_loss=0.09395, pruned_loss=0.01351, audio_tagging_loss=0.007425, over 14946.00 frames. ], tot_loss[loss=0.06484, simple_loss=0.0891, pruned_loss=0.01175, audio_tagging_loss=0.008536, over 3041198.17 frames. ], batch size: 55, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 12:28:46,896 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3964140.0, ans=0.1
2023-11-29 12:28:58,039 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3964206.6666666665, ans=0.125
2023-11-29 12:28:59,329 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3964206.6666666665, ans=0.1
2023-11-29 12:29:02,905 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3964206.6666666665, ans=0.125
2023-11-29 12:29:05,246 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3964206.6666666665, ans=0.1
2023-11-29 12:29:19,092 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 594650
2023-11-29 12:29:20,348 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3964273.3333333335, ans=0.2
2023-11-29 12:29:27,456 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.19 vs. limit=10.0
2023-11-29 12:29:47,557 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 5500, loss[loss=0.07473, simple_loss=0.1009, pruned_loss=0.0148, audio_tagging_loss=0.009453, over 14919.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.08955, pruned_loss=0.01174, audio_tagging_loss=0.008499, over 3039101.64 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 12:29:58,643 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.48 vs. limit=10.0
2023-11-29 12:30:06,145 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.86 vs. limit=10.0
2023-11-29 12:30:15,486 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3964606.6666666665, ans=0.0
2023-11-29 12:30:16,793 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3964606.6666666665, ans=0.1
2023-11-29 12:30:20,675 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.41 vs. limit=10.0
2023-11-29 12:30:21,233 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 594700
2023-11-29 12:30:22,466 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3964606.6666666665, ans=0.125
2023-11-29 12:30:30,222 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3964673.3333333335, ans=0.125
2023-11-29 12:30:32,243 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.770e+01 9.260e+01 9.828e+01 1.052e+02 2.145e+02, threshold=1.966e+02, percent-clipped=1.0
2023-11-29 12:30:43,948 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3964740.0, ans=0.0
2023-11-29 12:30:49,513 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 5550, loss[loss=0.06046, simple_loss=0.0843, pruned_loss=0.008461, audio_tagging_loss=0.00985, over 15341.00 frames. ], tot_loss[loss=0.06475, simple_loss=0.08897, pruned_loss=0.01159, audio_tagging_loss=0.008674, over 3036970.74 frames. ], batch size: 60, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 12:30:49,891 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3964806.6666666665, ans=0.0
2023-11-29 12:31:09,946 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3964873.3333333335, ans=0.125
2023-11-29 12:31:09,983 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3964873.3333333335, ans=0.125
2023-11-29 12:31:22,539 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 594750
2023-11-29 12:31:26,332 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3965006.6666666665, ans=0.125
2023-11-29 12:31:32,008 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3965006.6666666665, ans=0.1
2023-11-29 12:31:33,654 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.91 vs. limit=6.0
2023-11-29 12:31:49,877 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3965073.3333333335, ans=0.2
2023-11-29 12:31:52,113 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 5600, loss[loss=0.06825, simple_loss=0.09615, pruned_loss=0.01292, audio_tagging_loss=0.00725, over 16075.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.0901, pruned_loss=0.01179, audio_tagging_loss=0.008705, over 3043758.95 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 12:32:16,010 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3965273.3333333335, ans=0.1
2023-11-29 12:32:25,807 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 594800
2023-11-29 12:32:37,347 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.718e+01 9.303e+01 9.793e+01 1.041e+02 1.252e+02, threshold=1.959e+02, percent-clipped=0.0
2023-11-29 12:32:38,592 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
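Warnings like the one above drop cuts whose encoder output is too short for the transducer loss: the 1-second AudioSet cut has 100 input frames, which subsample to 23 encoder frames, fewer than the 24 BPE tokens of its dummy transcript. A minimal sketch of that test, assuming Conv2dSubsampling-style factor-4 arithmetic (the exact formula in train_asr.py may differ):

```python
# Hedged sketch of the exclusion criterion implied by the warning above.
def keep_cut(num_input_frames: int, num_tokens: int) -> bool:
    num_output_frames = (num_input_frames - 7) // 2 // 2  # ~4x subsampling: 100 -> 23
    # The transducer needs at least one encoder frame per output token.
    return num_output_frames >= num_tokens

print(keep_cut(100, 24))  # False -> the cut is excluded from training
```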
2023-11-29 12:32:50,485 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3965406.6666666665, ans=0.125
2023-11-29 12:32:53,611 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 5650, loss[loss=0.04684, simple_loss=0.06284, pruned_loss=0.004737, audio_tagging_loss=0.01069, over 14913.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08998, pruned_loss=0.01184, audio_tagging_loss=0.008784, over 3050906.31 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 12:32:56,033 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3965473.3333333335, ans=0.125
2023-11-29 12:33:19,726 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3965606.6666666665, ans=0.1
2023-11-29 12:33:25,983 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3965606.6666666665, ans=0.1
2023-11-29 12:33:28,091 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 594850
2023-11-29 12:33:40,385 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3965673.3333333335, ans=0.125
2023-11-29 12:33:46,958 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3965740.0, ans=0.125
2023-11-29 12:33:50,405 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3965740.0, ans=0.125
2023-11-29 12:33:56,412 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 5700, loss[loss=0.0516, simple_loss=0.06477, pruned_loss=0.01123, audio_tagging_loss=0.007993, over 14818.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08949, pruned_loss=0.01171, audio_tagging_loss=0.008729, over 3048236.16 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 12:33:58,154 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.13 vs. limit=15.0
2023-11-29 12:34:08,448 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.69 vs. limit=22.5
2023-11-29 12:34:09,368 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3965873.3333333335, ans=0.0
2023-11-29 12:34:29,237 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 594900
2023-11-29 12:34:35,728 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3966006.6666666665, ans=0.0
2023-11-29 12:34:37,010 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-29 12:34:40,790 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.829e+01 8.907e+01 9.442e+01 9.916e+01 1.221e+02, threshold=1.888e+02, percent-clipped=0.0
2023-11-29 12:34:41,117 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3966006.6666666665, ans=0.125
2023-11-29 12:34:42,278 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3966006.6666666665, ans=0.0
2023-11-29 12:34:57,089 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.16 vs. limit=15.0
2023-11-29 12:34:58,499 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 5750, loss[loss=0.06057, simple_loss=0.08195, pruned_loss=0.009863, audio_tagging_loss=0.009731, over 15479.00 frames. ], tot_loss[loss=0.06465, simple_loss=0.08891, pruned_loss=0.0115, audio_tagging_loss=0.00869, over 3051045.01 frames. ], batch size: 61, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 12:35:02,518 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3966140.0, ans=0.2
2023-11-29 12:35:24,247 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3966273.3333333335, ans=0.0
2023-11-29 12:35:32,021 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 594950
2023-11-29 12:35:41,168 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3966340.0, ans=0.0
2023-11-29 12:35:50,975 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3966406.6666666665, ans=0.1
2023-11-29 12:35:56,991 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3966406.6666666665, ans=0.0
2023-11-29 12:36:00,172 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 5800, loss[loss=0.08258, simple_loss=0.1159, pruned_loss=0.01613, audio_tagging_loss=0.008476, over 15789.00 frames. ], tot_loss[loss=0.06451, simple_loss=0.08894, pruned_loss=0.01153, audio_tagging_loss=0.00851, over 3047240.87 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 12:36:15,717 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3966540.0, ans=0.0
2023-11-29 12:36:25,436 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3966606.6666666665, ans=0.07
2023-11-29 12:36:32,959 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3966606.6666666665, ans=0.0
2023-11-29 12:36:34,012 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 595000
2023-11-29 12:36:39,078 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3966673.3333333335, ans=0.1
2023-11-29 12:36:42,808 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-29 12:36:45,839 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.270e+01 9.166e+01 9.851e+01 1.059e+02 1.504e+02, threshold=1.970e+02, percent-clipped=0.0
2023-11-29 12:37:01,741 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 5850, loss[loss=0.07187, simple_loss=0.1002, pruned_loss=0.0127, audio_tagging_loss=0.009084, over 15516.00 frames. ], tot_loss[loss=0.06443, simple_loss=0.08872, pruned_loss=0.01152, audio_tagging_loss=0.008554, over 3048927.87 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 12:37:06,783 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.92 vs. limit=22.5
2023-11-29 12:37:07,795 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3966806.6666666665, ans=0.0
2023-11-29 12:37:14,027 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3966873.3333333335, ans=0.125
2023-11-29 12:37:27,614 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3966940.0, ans=0.125
2023-11-29 12:37:34,487 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 595050
2023-11-29 12:37:59,267 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.08 vs. limit=10.0
2023-11-29 12:38:03,793 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 5900, loss[loss=0.05792, simple_loss=0.08298, pruned_loss=0.01043, audio_tagging_loss=0.006003, over 15909.00 frames. ], tot_loss[loss=0.06481, simple_loss=0.089, pruned_loss=0.01182, audio_tagging_loss=0.008491, over 3055003.75 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 12:38:10,025 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3967140.0, ans=0.125
2023-11-29 12:38:13,708 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3967140.0, ans=0.1
2023-11-29 12:38:28,724 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-29 12:38:37,041 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 595100
2023-11-29 12:38:44,874 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3967340.0, ans=0.0
2023-11-29 12:38:49,680 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.654e+01 9.143e+01 9.896e+01 1.087e+02 1.374e+02, threshold=1.979e+02, percent-clipped=0.0
2023-11-29 12:39:04,764 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 5950, loss[loss=0.05462, simple_loss=0.06819, pruned_loss=0.0105, audio_tagging_loss=0.01003, over 16908.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.08967, pruned_loss=0.01183, audio_tagging_loss=0.008465, over 3058047.14 frames. ], batch size: 67, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 12:39:06,236 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3967473.3333333335, ans=0.0
2023-11-29 12:39:26,434 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.93 vs. limit=10.0
2023-11-29 12:39:36,630 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3967606.6666666665, ans=0.125
2023-11-29 12:39:38,966 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 595150
2023-11-29 12:39:46,373 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.63 vs. limit=22.5
2023-11-29 12:39:53,308 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3967740.0, ans=0.125
2023-11-29 12:40:02,507 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3967740.0, ans=0.1
2023-11-29 12:40:06,608 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 6000, loss[loss=0.06776, simple_loss=0.08526, pruned_loss=0.01584, audio_tagging_loss=0.009289, over 15256.00 frames. ], tot_loss[loss=0.06478, simple_loss=0.0892, pruned_loss=0.01176, audio_tagging_loss=0.008421, over 3052850.13 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 12:40:06,611 INFO [train_asr.py:1258] (0/4) Computing validation loss
2023-11-29 12:40:25,663 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.1632, 4.6281, 5.2144, 4.9026], device='cuda:0')
2023-11-29 12:40:46,474 INFO [train_asr.py:1267] (0/4) Epoch 50, validation: loss=0.05775, simple_loss=0.05043, pruned_loss=0.005339, audio_tagging_loss=0.0272, over 4681554.00 frames.
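Throughout these records the `grad_scale` field alternates between 16.0 and 32.0, which is what standard dynamic loss scaling in mixed-precision training produces: the scale is halved when a step overflows to inf/NaN and doubled after a run of clean steps. A hedged sketch using `torch.cuda.amp.GradScaler`; the growth interval and factors below are the GradScaler defaults, assumed rather than taken from this recipe's configuration:

```python
# Hedged sketch of why grad_scale flips between 16.0 and 32.0 above.
import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=16.0,      # matches the grad_scale seen in the log
    growth_factor=2.0,    # doubles (16 -> 32) after enough overflow-free steps
    backoff_factor=0.5,   # halves (32 -> 16) when a step hits inf/NaN grads
    growth_interval=2000,
)
```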
2023-11-29 12:40:46,475 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB
2023-11-29 12:40:46,675 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3967806.6666666665, ans=0.125
2023-11-29 12:40:56,065 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3967806.6666666665, ans=0.1
2023-11-29 12:40:58,403 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3967873.3333333335, ans=0.0
2023-11-29 12:41:18,954 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 595200
2023-11-29 12:41:21,888 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3968006.6666666665, ans=0.1
2023-11-29 12:41:24,832 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3968006.6666666665, ans=0.0
2023-11-29 12:41:27,749 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3968006.6666666665, ans=0.2
2023-11-29 12:41:27,842 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3968006.6666666665, ans=0.0
2023-11-29 12:41:31,210 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.62 vs. limit=15.0
2023-11-29 12:41:32,083 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3968006.6666666665, ans=0.125
2023-11-29 12:41:32,840 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.848e+01 9.055e+01 9.788e+01 1.026e+02 1.358e+02, threshold=1.958e+02, percent-clipped=0.0
2023-11-29 12:41:32,943 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-29 12:41:37,889 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3968073.3333333335, ans=0.0
2023-11-29 12:41:48,279 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 6050, loss[loss=0.06335, simple_loss=0.09113, pruned_loss=0.00965, audio_tagging_loss=0.008135, over 15232.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.08954, pruned_loss=0.01188, audio_tagging_loss=0.00843, over 3052734.65 frames. ], batch size: 55, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 12:41:49,020 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=3968140.0, ans=6.0
2023-11-29 12:41:54,348 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3968140.0, ans=0.2
2023-11-29 12:42:12,458 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3968273.3333333335, ans=0.035
2023-11-29 12:42:21,714 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 595250
2023-11-29 12:42:21,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3968273.3333333335, ans=0.125
2023-11-29 12:42:22,054 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3968273.3333333335, ans=0.5
2023-11-29 12:42:25,548 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3968340.0, ans=0.125
2023-11-29 12:42:33,760 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3968340.0, ans=0.0
2023-11-29 12:42:35,079 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3968340.0, ans=0.1
2023-11-29 12:42:36,056 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3968406.6666666665, ans=0.125
2023-11-29 12:42:49,420 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 6100, loss[loss=0.06592, simple_loss=0.09032, pruned_loss=0.0103, audio_tagging_loss=0.01046, over 15401.00 frames. ], tot_loss[loss=0.06498, simple_loss=0.08941, pruned_loss=0.01183, audio_tagging_loss=0.008445, over 3049616.29 frames. ], batch size: 55, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 12:43:01,049 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3968540.0, ans=0.0
2023-11-29 12:43:10,168 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3968540.0, ans=0.1
2023-11-29 12:43:20,583 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3968606.6666666665, ans=0.0
2023-11-29 12:43:22,764 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 595300
2023-11-29 12:43:26,572 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3968673.3333333335, ans=0.0
2023-11-29 12:43:35,518 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.840e+01 9.140e+01 9.748e+01 1.061e+02 1.283e+02, threshold=1.950e+02, percent-clipped=0.0
2023-11-29 12:43:36,211 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.73 vs. limit=15.0
2023-11-29 12:43:45,444 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.38 vs. limit=10.0
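The scaling.py:213 lines report ScheduledFloat values (dropout probabilities, skip rates, scale_min, min_abs and similar) evaluated at the current batch_count; they are schedules that change piecewise-linearly with training progress. A hedged sketch of that behaviour, with made-up breakpoints for illustration (the real schedules are defined per module in the zipformer code):

```python
# Hedged sketch of a ScheduledFloat: a value interpolated piecewise-linearly
# in batch_count, held constant beyond the last breakpoint.
def scheduled_float(batch_count: float,
                    points: list[tuple[float, float]]) -> float:
    x0, y0 = points[0]
    if batch_count <= x0:
        return y0
    for x1, y1 in points[1:]:
        if batch_count <= x1:
            t = (batch_count - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
        x0, y0 = x1, y1
    return y0

# e.g. a skip-rate that decays 0.5 -> 0.0 over the first 20k batches:
print(scheduled_float(3968340.0, [(0.0, 0.5), (20000.0, 0.0)]))  # -> 0.0
```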
2023-11-29 12:43:51,260 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3968806.6666666665, ans=0.0
2023-11-29 12:43:52,123 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 6150, loss[loss=0.06583, simple_loss=0.0809, pruned_loss=0.0139, audio_tagging_loss=0.01148, over 15018.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08971, pruned_loss=0.01194, audio_tagging_loss=0.008486, over 3048711.23 frames. ], batch size: 59, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 12:44:02,865 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3968873.3333333335, ans=0.05
2023-11-29 12:44:12,315 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3968873.3333333335, ans=0.125
2023-11-29 12:44:14,826 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-29 12:44:17,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3968940.0, ans=0.0
2023-11-29 12:44:24,737 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 595350
2023-11-29 12:44:29,606 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.47 vs. limit=15.0
2023-11-29 12:44:35,847 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3969006.6666666665, ans=0.125
2023-11-29 12:44:44,802 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.39 vs. limit=15.0
2023-11-29 12:44:53,592 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 6200, loss[loss=0.07993, simple_loss=0.115, pruned_loss=0.01517, audio_tagging_loss=0.007256, over 14833.00 frames. ], tot_loss[loss=0.06503, simple_loss=0.08907, pruned_loss=0.01189, audio_tagging_loss=0.008602, over 3048780.36 frames. ], batch size: 53, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 12:44:56,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3969140.0, ans=0.2
2023-11-29 12:45:00,878 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3969140.0, ans=0.1
2023-11-29 12:45:02,095 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3969140.0, ans=10.0
2023-11-29 12:45:02,183 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3969140.0, ans=0.1
2023-11-29 12:45:27,144 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 595400
2023-11-29 12:45:38,799 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3969340.0, ans=0.2
2023-11-29 12:45:39,705 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.103e+01 9.032e+01 9.591e+01 1.015e+02 1.293e+02, threshold=1.918e+02, percent-clipped=0.0
2023-11-29 12:45:55,757 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 6250, loss[loss=0.08336, simple_loss=0.1162, pruned_loss=0.01811, audio_tagging_loss=0.007132, over 15684.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08905, pruned_loss=0.01174, audio_tagging_loss=0.008704, over 3050412.46 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 12:46:29,241 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 595450
2023-11-29 12:46:35,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3969673.3333333335, ans=0.1
2023-11-29 12:46:35,269 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3969673.3333333335, ans=0.125
2023-11-29 12:46:57,360 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 6300, loss[loss=0.05613, simple_loss=0.07567, pruned_loss=0.007943, audio_tagging_loss=0.01036, over 15171.00 frames. ], tot_loss[loss=0.06476, simple_loss=0.08887, pruned_loss=0.01153, audio_tagging_loss=0.008797, over 3048688.89 frames. ], batch size: 59, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 12:47:03,963 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3969806.6666666665, ans=0.125
2023-11-29 12:47:19,034 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3969873.3333333335, ans=0.0
2023-11-29 12:47:31,214 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 595500
2023-11-29 12:47:44,501 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.571e+01 8.915e+01 9.519e+01 1.033e+02 1.401e+02, threshold=1.904e+02, percent-clipped=0.0
2023-11-29 12:47:59,917 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 6350, loss[loss=0.07377, simple_loss=0.104, pruned_loss=0.01393, audio_tagging_loss=0.007841, over 15511.00 frames. ], tot_loss[loss=0.06486, simple_loss=0.08874, pruned_loss=0.01164, audio_tagging_loss=0.00886, over 3046170.56 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 12:48:02,576 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3970140.0, ans=0.0
2023-11-29 12:48:04,957 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3970140.0, ans=0.1
2023-11-29 12:48:16,663 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.49 vs. limit=15.0
2023-11-29 12:48:18,747 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.76 vs. limit=15.0
2023-11-29 12:48:19,799 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3970206.6666666665, ans=0.125
2023-11-29 12:48:29,161 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3970273.3333333335, ans=0.1
2023-11-29 12:48:32,949 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 595550
2023-11-29 12:48:33,112 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3970273.3333333335, ans=0.125
2023-11-29 12:48:53,422 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3970406.6666666665, ans=0.0
2023-11-29 12:48:54,861 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.33 vs. limit=6.0
2023-11-29 12:48:58,079 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff3.min_abs, batch_count=3970406.6666666665, ans=0.2
2023-11-29 12:49:01,858 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 6400, loss[loss=0.05144, simple_loss=0.06445, pruned_loss=0.008816, audio_tagging_loss=0.0104, over 14522.00 frames. ], tot_loss[loss=0.06455, simple_loss=0.08808, pruned_loss=0.01154, audio_tagging_loss=0.008969, over 3052281.76 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 12:49:13,896 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3970540.0, ans=0.1
2023-11-29 12:49:24,451 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3970540.0, ans=0.125
2023-11-29 12:49:35,436 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 595600
2023-11-29 12:49:50,259 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.542e+01 9.115e+01 9.887e+01 1.069e+02 1.285e+02, threshold=1.977e+02, percent-clipped=0.0
2023-11-29 12:49:59,898 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3970740.0, ans=0.1
2023-11-29 12:50:03,033 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 6450, loss[loss=0.04873, simple_loss=0.06326, pruned_loss=0.005957, audio_tagging_loss=0.01114, over 15903.00 frames. ], tot_loss[loss=0.06456, simple_loss=0.08779, pruned_loss=0.01163, audio_tagging_loss=0.009029, over 3044381.16 frames. ], batch size: 61, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 12:50:13,734 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3970806.6666666665, ans=0.125
2023-11-29 12:50:23,222 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.91 vs. limit=10.0
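The scaling.py:1022 "Whitening" lines compare a per-module statistic against a (sometimes scheduled) limit; the penalty only engages when the metric exceeds the limit, as in the record just above where 9.91 sits just under the 10.0 limit. A heavily hedged sketch of one plausible reading of the metric, as a measure of how far the channel covariance is from white; this is an illustrative proxy, not the exact formula in icefall's scaling.py:

```python
# Illustrative proxy for the Whitening "metric": 1.0 when feature channels are
# perfectly white (equal-variance, decorrelated), larger as the covariance
# spectrum becomes lopsided.
import torch

def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels) activations from one module
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    # mean squared eigenvalue over squared mean eigenvalue:
    # equals 1.0 iff all eigenvalues are equal, grows as energy concentrates.
    return float((eigs ** 2).mean() / eigs.mean() ** 2)

x = torch.randn(1000, 192)   # roughly white data
print(whitening_metric(x))   # close to 1 (about 1.2 here from finite sampling);
                             # 9.91 vs. limit=10.0 means the penalty is about to engage
```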
2023-11-29 12:50:37,750 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 595650
2023-11-29 12:50:43,845 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3971006.6666666665, ans=0.5
2023-11-29 12:51:04,936 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3971140.0, ans=0.125
2023-11-29 12:51:05,781 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 6500, loss[loss=0.08526, simple_loss=0.1108, pruned_loss=0.02274, audio_tagging_loss=0.007106, over 14953.00 frames. ], tot_loss[loss=0.06449, simple_loss=0.08775, pruned_loss=0.01168, audio_tagging_loss=0.008943, over 3049620.59 frames. ], batch size: 54, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 12:51:11,485 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3971140.0, ans=0.0
2023-11-29 12:51:16,165 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3971140.0, ans=0.1
2023-11-29 12:51:17,161 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3971206.6666666665, ans=0.125
2023-11-29 12:51:18,491 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-29 12:51:30,142 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.09 vs. limit=15.0
2023-11-29 12:51:39,724 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 595700
2023-11-29 12:51:53,743 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.34 vs. limit=15.0
2023-11-29 12:51:54,124 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.470e+01 9.181e+01 9.755e+01 1.057e+02 1.258e+02, threshold=1.951e+02, percent-clipped=0.0
2023-11-29 12:52:01,063 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3971406.6666666665, ans=0.125
2023-11-29 12:52:03,678 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3971406.6666666665, ans=0.0
2023-11-29 12:52:05,906 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3971406.6666666665, ans=0.1
2023-11-29 12:52:06,993 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3971473.3333333335, ans=0.0
2023-11-29 12:52:07,944 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 6550, loss[loss=0.06729, simple_loss=0.09291, pruned_loss=0.01246, audio_tagging_loss=0.008375, over 13763.00 frames. ], tot_loss[loss=0.06433, simple_loss=0.08796, pruned_loss=0.01163, audio_tagging_loss=0.008724, over 3046624.12 frames. ], batch size: 54, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 12:52:30,791 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.07 vs. limit=10.0
2023-11-29 12:52:41,597 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 595750
2023-11-29 12:52:41,887 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3971606.6666666665, ans=0.0
2023-11-29 12:52:55,429 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3971673.3333333335, ans=0.125
2023-11-29 12:52:57,630 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3971740.0, ans=0.125
2023-11-29 12:52:58,213 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.85 vs. limit=10.0
2023-11-29 12:53:09,518 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 6600, loss[loss=0.06948, simple_loss=0.09805, pruned_loss=0.01386, audio_tagging_loss=0.00659, over 15343.00 frames. ], tot_loss[loss=0.06447, simple_loss=0.08853, pruned_loss=0.01169, audio_tagging_loss=0.008521, over 3039205.60 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 12:53:21,381 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3971873.3333333335, ans=0.125
2023-11-29 12:53:27,150 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3971873.3333333335, ans=0.125
2023-11-29 12:53:38,162 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3971940.0, ans=0.125
2023-11-29 12:53:40,602 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3971940.0, ans=0.1
2023-11-29 12:53:42,836 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 595800
2023-11-29 12:53:43,063 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3971940.0, ans=0.0
2023-11-29 12:53:49,728 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3972006.6666666665, ans=0.1
2023-11-29 12:53:57,533 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.513e+01 8.899e+01 9.360e+01 1.006e+02 1.174e+02, threshold=1.872e+02, percent-clipped=0.0
2023-11-29 12:53:59,629 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.31 vs. limit=15.0
2023-11-29 12:54:03,260 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3972073.3333333335, ans=0.0
2023-11-29 12:54:05,787 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.98 vs. limit=6.0
2023-11-29 12:54:11,714 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 6650, loss[loss=0.06363, simple_loss=0.08816, pruned_loss=0.01221, audio_tagging_loss=0.007345, over 15441.00 frames. ], tot_loss[loss=0.06394, simple_loss=0.08793, pruned_loss=0.01152, audio_tagging_loss=0.008446, over 3035912.30 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 12:54:35,692 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3972273.3333333335, ans=0.125
2023-11-29 12:54:35,791 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3972273.3333333335, ans=0.125
2023-11-29 12:54:43,192 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.53 vs. limit=15.0
2023-11-29 12:54:45,025 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 595850
2023-11-29 12:54:48,410 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.18 vs. limit=15.0
2023-11-29 12:55:01,937 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.89 vs. limit=15.0
2023-11-29 12:55:13,852 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 6700, loss[loss=0.06312, simple_loss=0.08133, pruned_loss=0.01246, audio_tagging_loss=0.01, over 15873.00 frames. ], tot_loss[loss=0.0636, simple_loss=0.08769, pruned_loss=0.01135, audio_tagging_loss=0.008403, over 3041877.39 frames. ], batch size: 59, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 12:55:16,589 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3972473.3333333335, ans=0.2
2023-11-29 12:55:32,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3972540.0, ans=0.1
2023-11-29 12:55:44,279 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3972606.6666666665, ans=0.1
2023-11-29 12:55:46,625 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 595900
2023-11-29 12:55:46,872 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3972606.6666666665, ans=0.125
2023-11-29 12:55:48,106 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3972606.6666666665, ans=0.125
2023-11-29 12:55:54,004 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3972673.3333333335, ans=0.0
2023-11-29 12:55:59,458 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3972673.3333333335, ans=0.125
2023-11-29 12:56:01,558 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.562e+01 9.149e+01 9.695e+01 1.030e+02 1.289e+02, threshold=1.939e+02, percent-clipped=0.0
2023-11-29 12:56:15,575 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 6750, loss[loss=0.07238, simple_loss=0.09814, pruned_loss=0.01592, audio_tagging_loss=0.007387, over 14659.00 frames. ], tot_loss[loss=0.06386, simple_loss=0.08803, pruned_loss=0.01147, audio_tagging_loss=0.008374, over 3038273.01 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 12:56:15,988 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3972806.6666666665, ans=0.04949747468305833
2023-11-29 12:56:24,616 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3972806.6666666665, ans=0.0
2023-11-29 12:56:48,264 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3972940.0, ans=0.125
2023-11-29 12:56:49,892 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 595950
2023-11-29 12:57:17,436 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3973140.0, ans=0.125
2023-11-29 12:57:18,155 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 6800, loss[loss=0.05768, simple_loss=0.08124, pruned_loss=0.01041, audio_tagging_loss=0.006641, over 15193.00 frames. ], tot_loss[loss=0.06417, simple_loss=0.08866, pruned_loss=0.01155, audio_tagging_loss=0.008287, over 3041123.36 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 12:57:46,006 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3973273.3333333335, ans=0.125
2023-11-29 12:57:51,587 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 596000
2023-11-29 12:57:53,047 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-596000.pt
2023-11-29 12:57:58,118 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3973340.0, ans=0.125
2023-11-29 12:58:02,662 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=3973340.0, ans=6.0
2023-11-29 12:58:09,399 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.052e+01 8.975e+01 9.560e+01 1.019e+02 1.968e+02, threshold=1.912e+02, percent-clipped=1.0
2023-11-29 12:58:22,175 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 6850, loss[loss=0.07261, simple_loss=0.1006, pruned_loss=0.01546, audio_tagging_loss=0.006878, over 15225.00 frames. ], tot_loss[loss=0.06438, simple_loss=0.08896, pruned_loss=0.01161, audio_tagging_loss=0.008284, over 3035263.38 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 12:58:29,011 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3973473.3333333335, ans=0.125
2023-11-29 12:58:29,291 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.74 vs. limit=15.0
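The checkpoint.py:75 record a few lines above shows a batch-indexed checkpoint (checkpoint-596000.pt) written mid-epoch at a round batch index, alongside the epoch-numbered epoch-*.pt files. A hedged sketch of that kind of periodic batch-level saving; the interval and helper name are assumptions for illustration, not values read from this run's flags:

```python
# Hedged sketch of periodic batch-level checkpointing as seen above.
from pathlib import Path
import torch

def maybe_save_checkpoint(model: torch.nn.Module, exp_dir: Path,
                          batch_idx_train: int,
                          save_every_n: int = 4000) -> None:
    if batch_idx_train % save_every_n == 0:
        out = exp_dir / f"checkpoint-{batch_idx_train}.pt"
        torch.save({"model": model.state_dict(),
                    "batch_idx_train": batch_idx_train}, out)
```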
2023-11-29 12:58:56,502 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 596050
2023-11-29 12:58:59,150 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3973673.3333333335, ans=0.0
2023-11-29 12:59:06,239 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3973673.3333333335, ans=0.1
2023-11-29 12:59:09,712 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3973673.3333333335, ans=0.0
2023-11-29 12:59:23,014 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3973740.0, ans=0.5
2023-11-29 12:59:24,854 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 6900, loss[loss=0.106, simple_loss=0.151, pruned_loss=0.02428, audio_tagging_loss=0.006233, over 16749.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.08994, pruned_loss=0.01183, audio_tagging_loss=0.008239, over 3035587.09 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 12:59:25,254 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3973806.6666666665, ans=0.125
2023-11-29 12:59:26,318 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3973806.6666666665, ans=0.0
2023-11-29 12:59:32,220 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3973806.6666666665, ans=0.125
2023-11-29 12:59:58,070 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 596100
2023-11-29 13:00:13,496 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.455e+01 8.949e+01 9.796e+01 1.035e+02 1.230e+02, threshold=1.959e+02, percent-clipped=0.0
2023-11-29 13:00:13,584 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-29 13:00:18,420 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3974073.3333333335, ans=0.0
2023-11-29 13:00:25,750 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 6950, loss[loss=0.07887, simple_loss=0.1072, pruned_loss=0.01717, audio_tagging_loss=0.008099, over 16326.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.09074, pruned_loss=0.01188, audio_tagging_loss=0.008306, over 3039719.75 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:00:34,534 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.00 vs. limit=10.0
2023-11-29 13:00:36,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3974140.0, ans=0.125
2023-11-29 13:00:36,319 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3974140.0, ans=0.5
2023-11-29 13:00:44,562 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3974206.6666666665, ans=0.125
2023-11-29 13:00:50,475 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3974273.3333333335, ans=0.0
2023-11-29 13:00:55,872 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3974273.3333333335, ans=0.125
2023-11-29 13:00:59,260 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 596150
2023-11-29 13:01:00,575 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3974273.3333333335, ans=0.0
2023-11-29 13:01:19,407 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3974406.6666666665, ans=0.125
2023-11-29 13:01:27,343 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 7000, loss[loss=0.04889, simple_loss=0.07012, pruned_loss=0.006524, audio_tagging_loss=0.00731, over 14792.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.09023, pruned_loss=0.01178, audio_tagging_loss=0.008387, over 3036763.36 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:01:41,261 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.05 vs. limit=15.0
2023-11-29 13:01:44,106 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3974540.0, ans=0.125
2023-11-29 13:01:44,151 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3974540.0, ans=0.125
2023-11-29 13:02:01,311 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 596200
2023-11-29 13:02:16,536 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.531e+01 8.861e+01 9.690e+01 1.060e+02 1.703e+02, threshold=1.938e+02, percent-clipped=0.0
2023-11-29 13:02:21,723 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3974740.0, ans=0.125
2023-11-29 13:02:29,630 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 7050, loss[loss=0.07304, simple_loss=0.1035, pruned_loss=0.01238, audio_tagging_loss=0.008936, over 15082.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.09029, pruned_loss=0.01182, audio_tagging_loss=0.008485, over 3031228.65 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:02:32,241 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3974806.6666666665, ans=0.125
2023-11-29 13:03:00,657 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3974940.0, ans=0.125
2023-11-29 13:03:01,837 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3974940.0, ans=0.0
2023-11-29 13:03:02,712 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 596250
2023-11-29 13:03:06,455 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-29 13:03:11,528 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.87 vs. limit=10.0
2023-11-29 13:03:23,064 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3975073.3333333335, ans=0.0
2023-11-29 13:03:23,193 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-29 13:03:31,621 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 7100, loss[loss=0.06021, simple_loss=0.08637, pruned_loss=0.009175, audio_tagging_loss=0.007845, over 14304.00 frames. ], tot_loss[loss=0.06499, simple_loss=0.08959, pruned_loss=0.01167, audio_tagging_loss=0.008518, over 3036045.39 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:03:36,080 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3975140.0, ans=0.0
2023-11-29 13:04:05,092 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 596300
2023-11-29 13:04:07,651 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3975340.0, ans=0.0
2023-11-29 13:04:13,605 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3975340.0, ans=0.125
2023-11-29 13:04:20,799 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.508e+01 9.433e+01 1.002e+02 1.073e+02 1.406e+02, threshold=2.003e+02, percent-clipped=0.0
2023-11-29 13:04:32,946 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 7150, loss[loss=0.06082, simple_loss=0.08048, pruned_loss=0.01095, audio_tagging_loss=0.009628, over 15279.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.09, pruned_loss=0.01181, audio_tagging_loss=0.008539, over 3041777.39 frames. ], batch size: 59, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:04:33,911 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.03 vs. limit=12.0
2023-11-29 13:05:06,725 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 596350
2023-11-29 13:05:19,295 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3975673.3333333335, ans=0.125
2023-11-29 13:05:23,944 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3975740.0, ans=0.0
2023-11-29 13:05:34,670 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 7200, loss[loss=0.06343, simple_loss=0.09221, pruned_loss=0.007745, audio_tagging_loss=0.009582, over 15923.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08984, pruned_loss=0.01182, audio_tagging_loss=0.008578, over 3046810.32 frames. ], batch size: 61, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 13:06:06,259 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3975940.0, ans=0.2
2023-11-29 13:06:08,394 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 596400
2023-11-29 13:06:10,204 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3975940.0, ans=0.125
2023-11-29 13:06:11,842 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.66 vs. limit=15.0
2023-11-29 13:06:15,162 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.whiten.whitening_limit, batch_count=3976006.6666666665, ans=12.0
2023-11-29 13:06:24,310 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.018e+01 9.297e+01 9.851e+01 1.057e+02 1.501e+02, threshold=1.970e+02, percent-clipped=0.0
2023-11-29 13:06:34,883 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.25 vs. limit=12.0
2023-11-29 13:06:35,988 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3976140.0, ans=0.0
2023-11-29 13:06:36,957 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 7250, loss[loss=0.04741, simple_loss=0.0611, pruned_loss=0.008412, audio_tagging_loss=0.008444, over 14454.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.09022, pruned_loss=0.01187, audio_tagging_loss=0.008602, over 3042844.26 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:07:07,694 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3976273.3333333335, ans=0.1
2023-11-29 13:07:09,965 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 596450
2023-11-29 13:07:24,730 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.99 vs. limit=6.0
2023-11-29 13:07:38,443 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 7300, loss[loss=0.07011, simple_loss=0.09641, pruned_loss=0.01255, audio_tagging_loss=0.009355, over 15292.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.09023, pruned_loss=0.01184, audio_tagging_loss=0.008545, over 3043679.34 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:07:51,544 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3976540.0, ans=0.125
2023-11-29 13:07:56,412 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.94 vs. limit=22.5
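To rounding, every per-batch loss record in this log satisfies loss ~= 0.5 * simple_loss + pruned_loss + audio_tagging_loss; the batch 7250 tot_loss fields just above give 0.5 * 0.09022 + 0.01187 + 0.008602 ~= 0.06558. A sketch of that combination (the 0.5 weight is inferred from the logged numbers, not read out of the training script):

```python
# Combination implied by the logged loss fields (assumed weighting):
#   loss ~= 0.5 * simple_loss + pruned_loss + audio_tagging_loss
def combine_losses(simple_loss: float, pruned_loss: float,
                   audio_tagging_loss: float,
                   simple_loss_scale: float = 0.5) -> float:
    return simple_loss_scale * simple_loss + pruned_loss + audio_tagging_loss

# Batch 7250 tot_loss fields from the log:
print(combine_losses(0.09022, 0.01187, 0.008602))  # ~0.06558
```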
2023-11-29 13:08:05,174 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3976606.6666666665, ans=0.125
2023-11-29 13:08:11,961 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 596500
2023-11-29 13:08:19,527 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3976673.3333333335, ans=0.125
2023-11-29 13:08:24,391 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3976673.3333333335, ans=0.1
2023-11-29 13:08:28,805 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.807e+01 9.133e+01 9.755e+01 1.029e+02 1.413e+02, threshold=1.951e+02, percent-clipped=0.0
2023-11-29 13:08:39,166 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 7350, loss[loss=0.07125, simple_loss=0.1005, pruned_loss=0.01553, audio_tagging_loss=0.005488, over 14872.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.08947, pruned_loss=0.01186, audio_tagging_loss=0.008487, over 3044169.57 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:08:40,762 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3976806.6666666665, ans=0.1
2023-11-29 13:08:49,453 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3976806.6666666665, ans=0.125
2023-11-29 13:08:58,959 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3976873.3333333335, ans=0.125
2023-11-29 13:09:11,165 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3976940.0, ans=0.125
2023-11-29 13:09:12,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3976940.0, ans=0.2
2023-11-29 13:09:13,207 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 596550
2023-11-29 13:09:14,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3976940.0, ans=0.125
2023-11-29 13:09:21,427 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3977006.6666666665, ans=0.125
2023-11-29 13:09:40,652 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 7400, loss[loss=0.05977, simple_loss=0.07447, pruned_loss=0.01191, audio_tagging_loss=0.01063, over 14570.00 frames. ], tot_loss[loss=0.065, simple_loss=0.08962, pruned_loss=0.01177, audio_tagging_loss=0.008424, over 3038893.39 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:09:56,067 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=7.12 vs. limit=12.0
2023-11-29 13:09:59,790 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.26 vs. limit=15.0
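The optim.py records print five grad-norm summary points (min, 25%, median, 75%, max) plus a clipping threshold, and the numbers are consistent with threshold = Clipping_scale * median, e.g. 2.0 * 9.755e+01 ~= 1.951e+02 just above. A sketch under that assumption (not the actual optimizer code):

```python
import torch

# Assumed bookkeeping behind the "grad-norm quartiles ... threshold=..."
# records: summarize recent gradient norms and clip at scale * median.
def grad_norm_summary(grad_norms: torch.Tensor, clipping_scale: float = 2.0):
    quartiles = torch.quantile(
        grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * quartiles[2]  # 2.0 * median
    percent_clipped = 100.0 * (grad_norms > threshold).float().mean()
    return quartiles, threshold, percent_clipped

norms = torch.tensor([78.07, 91.33, 97.55, 102.9, 141.3])  # values from above
q, thr, pc = grad_norm_summary(norms)
print(thr.item())  # ~195.1, matching threshold=1.951e+02
```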
2023-11-29 13:10:04,950 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3977273.3333333335, ans=0.125
2023-11-29 13:10:14,009 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 596600
2023-11-29 13:10:31,697 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.135e+01 9.143e+01 9.729e+01 1.063e+02 1.656e+02, threshold=1.946e+02, percent-clipped=0.0
2023-11-29 13:10:43,575 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 7450, loss[loss=0.06545, simple_loss=0.08578, pruned_loss=0.0156, audio_tagging_loss=0.006955, over 13896.00 frames. ], tot_loss[loss=0.06468, simple_loss=0.08901, pruned_loss=0.01178, audio_tagging_loss=0.008385, over 3031814.67 frames. ], batch size: 53, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:10:56,670 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3977540.0, ans=0.09899494936611666
2023-11-29 13:11:01,341 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3977540.0, ans=0.0
2023-11-29 13:11:05,877 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.68 vs. limit=15.0
2023-11-29 13:11:07,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3977606.6666666665, ans=0.125
2023-11-29 13:11:16,266 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 596650
2023-11-29 13:11:19,854 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3977673.3333333335, ans=0.0
2023-11-29 13:11:35,182 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3977740.0, ans=0.125
2023-11-29 13:11:37,603 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3977740.0, ans=0.0
2023-11-29 13:11:44,399 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 7500, loss[loss=0.06949, simple_loss=0.1058, pruned_loss=0.01044, audio_tagging_loss=0.006139, over 15963.00 frames. ], tot_loss[loss=0.06471, simple_loss=0.08903, pruned_loss=0.01177, audio_tagging_loss=0.008419, over 3033969.88 frames. ], batch size: 60, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:12:18,364 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 596700
2023-11-29 13:12:25,013 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.22 vs. limit=10.0
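The Whitening records compare a per-module statistic against a limit; when the metric exceeds the limit, a corrective gradient nudges the activations back toward whiteness. One plausible formulation of such a metric is mean(lambda^2) / mean(lambda)^2 over the eigenvalues lambda of the feature covariance: it equals 1.0 for perfectly white features and grows with anisotropy. A sketch of that metric (assumed formulation, not copied from scaling.py):

```python
import torch

# One plausible "whitening metric": mean squared eigenvalue over squared
# mean eigenvalue of the feature covariance (assumed formulation).
# 1.0 means perfectly white; larger means more anisotropic.
def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels)
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]
    d = cov.shape[0]
    mean_eig = torch.diagonal(cov).mean()  # trace(cov) / d
    mean_sq_eig = (cov * cov).sum() / d    # trace(cov @ cov) / d
    return mean_sq_eig / (mean_eig ** 2)

x = torch.randn(2000, 384)         # near-white input
print(whitening_metric(x).item())  # ~1.2, well under a limit like 15.0
```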
2023-11-29 13:12:26,688 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3978006.6666666665, ans=0.0
2023-11-29 13:12:34,570 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.003e+01 9.233e+01 9.870e+01 1.058e+02 1.396e+02, threshold=1.974e+02, percent-clipped=0.0
2023-11-29 13:12:38,058 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3978073.3333333335, ans=0.125
2023-11-29 13:12:39,088 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3978073.3333333335, ans=0.0
2023-11-29 13:12:40,277 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3978073.3333333335, ans=0.0
2023-11-29 13:12:45,811 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 7550, loss[loss=0.07006, simple_loss=0.08868, pruned_loss=0.01324, audio_tagging_loss=0.01247, over 16430.00 frames. ], tot_loss[loss=0.06494, simple_loss=0.08957, pruned_loss=0.01178, audio_tagging_loss=0.008377, over 3044494.35 frames. ], batch size: 61, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:12:52,436 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3978140.0, ans=0.0
2023-11-29 13:13:05,122 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.01 vs. limit=6.0
2023-11-29 13:13:08,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3978206.6666666665, ans=0.0
2023-11-29 13:13:15,037 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3978273.3333333335, ans=0.0
2023-11-29 13:13:15,132 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3978273.3333333335, ans=0.125
2023-11-29 13:13:18,420 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 596750
2023-11-29 13:13:27,568 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3978340.0, ans=0.1
2023-11-29 13:13:32,158 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3978340.0, ans=0.125
2023-11-29 13:13:37,504 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3978406.6666666665, ans=0.125
2023-11-29 13:13:37,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3978406.6666666665, ans=0.0
2023-11-29 13:13:40,188 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.55 vs. limit=15.0
2023-11-29 13:13:48,029 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 7600, loss[loss=0.05975, simple_loss=0.08381, pruned_loss=0.01078, audio_tagging_loss=0.007067, over 14555.00 frames. ], tot_loss[loss=0.06448, simple_loss=0.08857, pruned_loss=0.01177, audio_tagging_loss=0.008417, over 3047563.30 frames. ], batch size: 53, lr: 1.35e-03, grad_scale: 32.0
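The grad_scale field alternating between 16.0 and 32.0 in the batch records (it switches to 32.0 at batch 7600 above) is characteristic of dynamic loss scaling under mixed precision: the scaler doubles the scale after a run of overflow-free steps and halves it when gradients overflow. A generic PyTorch sketch of that loop, not the training script itself (compute_loss is a hypothetical helper):

```python
import torch

# Generic dynamic loss scaling; the logged grad_scale of 16.0 / 32.0 is the
# scaler growing and backing off around this range.
scaler = torch.cuda.amp.GradScaler(
    init_scale=16.0, growth_factor=2.0, backoff_factor=0.5,
    growth_interval=2000)

# Inside the training loop (schematic):
#   with torch.cuda.amp.autocast():
#       loss = compute_loss(model, batch)   # hypothetical helper
#   scaler.scale(loss).backward()
#   scaler.step(optimizer)
#   scaler.update()  # grows or halves the scale -> the logged grad_scale
```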
2023-11-29 13:13:58,751 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3978540.0, ans=0.2
2023-11-29 13:14:19,870 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 596800
2023-11-29 13:14:27,829 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.50 vs. limit=15.0
2023-11-29 13:14:37,838 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.643e+01 8.734e+01 9.698e+01 1.076e+02 1.664e+02, threshold=1.940e+02, percent-clipped=0.0
2023-11-29 13:14:48,325 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 7650, loss[loss=0.06682, simple_loss=0.08521, pruned_loss=0.01208, audio_tagging_loss=0.01213, over 14554.00 frames. ], tot_loss[loss=0.06444, simple_loss=0.0884, pruned_loss=0.01178, audio_tagging_loss=0.008458, over 3041197.50 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 13:14:59,858 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3978873.3333333335, ans=0.125
2023-11-29 13:15:21,946 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 596850
2023-11-29 13:15:26,651 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3979006.6666666665, ans=0.125
2023-11-29 13:15:44,826 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3979073.3333333335, ans=0.125
2023-11-29 13:15:50,316 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 7700, loss[loss=0.07291, simple_loss=0.09605, pruned_loss=0.0144, audio_tagging_loss=0.01048, over 14894.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08953, pruned_loss=0.01207, audio_tagging_loss=0.008438, over 3041650.64 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 13:16:03,089 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.88 vs. limit=10.0
2023-11-29 13:16:21,142 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.12 vs. limit=15.0
2023-11-29 13:16:24,119 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 596900
2023-11-29 13:16:26,555 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3979340.0, ans=0.1
2023-11-29 13:16:30,856 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3979340.0, ans=0.2
2023-11-29 13:16:40,891 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.740e+01 9.160e+01 9.812e+01 1.042e+02 1.331e+02, threshold=1.962e+02, percent-clipped=0.0
2023-11-29 13:16:52,055 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 7750, loss[loss=0.07493, simple_loss=0.1128, pruned_loss=0.01085, audio_tagging_loss=0.007702, over 14975.00 frames. ], tot_loss[loss=0.06503, simple_loss=0.08924, pruned_loss=0.01192, audio_tagging_loss=0.00849, over 3035485.38 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 32.0
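Each train_asr.py:1235 record pairs a single-batch loss[...] with a tot_loss[...] whose frame count ("over 3.0e6 frames") keeps growing through the epoch, so tot_loss reads as a frame-weighted running average of the batches seen so far. A sketch of that bookkeeping (assumed; the actual tracker may window or reset periodically):

```python
# Frame-weighted running average, matching how tot_loss[...] accumulates a
# growing "over N frames" count (assumed bookkeeping, for illustration).
class RunningLoss:
    def __init__(self) -> None:
        self.weighted_sum = 0.0
        self.num_frames = 0.0

    def update(self, batch_loss: float, batch_frames: float) -> None:
        self.weighted_sum += batch_loss * batch_frames
        self.num_frames += batch_frames

    @property
    def average(self) -> float:
        return self.weighted_sum / max(self.num_frames, 1.0)

tracker = RunningLoss()
tracker.update(0.06682, 14554.0)  # batch 7650 loss[...] fields from the log
print(tracker.average)
```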
2023-11-29 13:17:25,958 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 596950
2023-11-29 13:17:46,071 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3979740.0, ans=0.1
2023-11-29 13:17:54,661 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 7800, loss[loss=0.06689, simple_loss=0.0869, pruned_loss=0.01076, audio_tagging_loss=0.01268, over 14563.00 frames. ], tot_loss[loss=0.06459, simple_loss=0.08842, pruned_loss=0.01181, audio_tagging_loss=0.008571, over 3038051.90 frames. ], batch size: 55, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 13:18:06,255 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-29 13:18:23,247 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3979940.0, ans=0.1
2023-11-29 13:18:27,686 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 597000
2023-11-29 13:18:30,433 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.92 vs. limit=15.0
2023-11-29 13:18:35,036 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3980006.6666666665, ans=0.0
2023-11-29 13:18:36,559 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.08 vs. limit=15.0
2023-11-29 13:18:46,276 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.796e+01 9.044e+01 9.654e+01 1.045e+02 1.431e+02, threshold=1.931e+02, percent-clipped=0.0
2023-11-29 13:18:47,765 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-29 13:18:57,699 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 7850, loss[loss=0.07197, simple_loss=0.1088, pruned_loss=0.01128, audio_tagging_loss=0.00628, over 15828.00 frames. ], tot_loss[loss=0.06463, simple_loss=0.08862, pruned_loss=0.01179, audio_tagging_loss=0.008528, over 3044237.78 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 13:19:31,706 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 597050
2023-11-29 13:19:31,971 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3980273.3333333335, ans=0.1
2023-11-29 13:19:33,133 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-29 13:19:59,734 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 7900, loss[loss=0.05042, simple_loss=0.06574, pruned_loss=0.006941, audio_tagging_loss=0.01061, over 15445.00 frames. ], tot_loss[loss=0.06441, simple_loss=0.08851, pruned_loss=0.01153, audio_tagging_loss=0.008628, over 3043633.62 frames. ], batch size: 60, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:20:03,697 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.33 vs. limit=10.0
2023-11-29 13:20:07,101 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3980473.3333333335, ans=0.2
2023-11-29 13:20:20,817 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.63 vs. limit=15.0
2023-11-29 13:20:30,712 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3980606.6666666665, ans=0.1
2023-11-29 13:20:32,916 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 597100
2023-11-29 13:20:44,512 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3980673.3333333335, ans=10.0
2023-11-29 13:20:46,941 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3980673.3333333335, ans=0.125
2023-11-29 13:20:52,317 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.586e+01 9.084e+01 9.995e+01 1.068e+02 1.283e+02, threshold=1.999e+02, percent-clipped=0.0
2023-11-29 13:21:01,666 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 7950, loss[loss=0.07226, simple_loss=0.1047, pruned_loss=0.01412, audio_tagging_loss=0.005797, over 15812.00 frames. ], tot_loss[loss=0.06479, simple_loss=0.08905, pruned_loss=0.01157, audio_tagging_loss=0.008691, over 3048939.82 frames. ], batch size: 55, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:21:20,119 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-29 13:21:35,901 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 597150
2023-11-29 13:21:44,888 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3981006.6666666665, ans=0.2
2023-11-29 13:21:55,361 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3981073.3333333335, ans=0.0
2023-11-29 13:22:04,441 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 8000, loss[loss=0.06377, simple_loss=0.0916, pruned_loss=0.01207, audio_tagging_loss=0.005891, over 15758.00 frames. ], tot_loss[loss=0.06399, simple_loss=0.08751, pruned_loss=0.01146, audio_tagging_loss=0.008774, over 3038767.17 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 13:22:21,224 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.35 vs. limit=15.0
2023-11-29 13:22:27,352 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3981206.6666666665, ans=0.125
2023-11-29 13:22:27,723 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.39 vs. limit=6.0
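The WARNING above shows a length filter at work: the AudioSet placeholder transcript tokenizes to 24 BPE tokens, but only 23 encoder frames survive subsampling (100 before, 23 after), and a transducer needs at least one output frame per token, so the cut is dropped. A sketch of that check (illustrative names, assumed to be the rule the message implies):

```python
# Sketch of the filter implied by the "Exclude cut" WARNING: a transducer
# requires T >= U, i.e. at least as many subsampled frames as tokens.
def keep_cut(frames_after_subsampling: int, num_tokens: int) -> bool:
    return frames_after_subsampling >= num_tokens

# Numbers from the excluded cut above: 100 frames -> 23 after subsampling,
# against 24 BPE tokens.
print(keep_cut(23, 24))  # False -> "Exclude cut ... from training."
```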
2023-11-29 13:22:28,552 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3981273.3333333335, ans=0.1
2023-11-29 13:22:37,443 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 597200
2023-11-29 13:22:57,942 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.949e+01 8.831e+01 9.452e+01 1.032e+02 1.267e+02, threshold=1.890e+02, percent-clipped=0.0
2023-11-29 13:23:01,778 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3981406.6666666665, ans=0.025
2023-11-29 13:23:06,846 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 8050, loss[loss=0.06469, simple_loss=0.08972, pruned_loss=0.01126, audio_tagging_loss=0.008576, over 15163.00 frames. ], tot_loss[loss=0.06485, simple_loss=0.08876, pruned_loss=0.01171, audio_tagging_loss=0.008761, over 3045818.90 frames. ], batch size: 55, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:23:40,608 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 597250
2023-11-29 13:23:42,530 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.17 vs. limit=15.0
2023-11-29 13:24:08,355 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 8100, loss[loss=0.07865, simple_loss=0.1179, pruned_loss=0.01299, audio_tagging_loss=0.006688, over 15700.00 frames. ], tot_loss[loss=0.06425, simple_loss=0.08791, pruned_loss=0.01155, audio_tagging_loss=0.008737, over 3045212.19 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:24:17,030 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. limit=6.0
2023-11-29 13:24:23,267 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2023-11-29 13:24:37,644 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-29 13:24:41,068 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 597300
2023-11-29 13:24:59,881 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.898e+01 9.024e+01 9.571e+01 1.071e+02 1.276e+02, threshold=1.914e+02, percent-clipped=0.0
2023-11-29 13:25:04,680 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3982073.3333333335, ans=0.1
2023-11-29 13:25:08,621 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 8150, loss[loss=0.05859, simple_loss=0.0659, pruned_loss=0.01417, audio_tagging_loss=0.01146, over 13874.00 frames. ], tot_loss[loss=0.06428, simple_loss=0.08823, pruned_loss=0.01158, audio_tagging_loss=0.008588, over 3043737.27 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:25:09,629 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3982140.0, ans=0.0
2023-11-29 13:25:31,177 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3982206.6666666665, ans=0.0
2023-11-29 13:25:41,943 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 597350
2023-11-29 13:25:48,209 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3982340.0, ans=0.2
2023-11-29 13:25:50,336 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3982340.0, ans=0.125
2023-11-29 13:25:52,660 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3982340.0, ans=0.125
2023-11-29 13:25:55,639 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3982340.0, ans=0.125
2023-11-29 13:25:59,716 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3982406.6666666665, ans=0.125
2023-11-29 13:26:02,137 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3982406.6666666665, ans=0.0
2023-11-29 13:26:10,390 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 8200, loss[loss=0.07465, simple_loss=0.1087, pruned_loss=0.0154, audio_tagging_loss=0.004911, over 15296.00 frames. ], tot_loss[loss=0.0642, simple_loss=0.08828, pruned_loss=0.01155, audio_tagging_loss=0.008517, over 3051734.26 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:26:13,967 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-29 13:26:40,632 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3982606.6666666665, ans=0.0
2023-11-29 13:26:43,628 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 597400
2023-11-29 13:26:52,900 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3982673.3333333335, ans=0.0
2023-11-29 13:26:59,222 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3982740.0, ans=0.125
2023-11-29 13:27:03,618 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.841e+01 9.247e+01 9.743e+01 1.041e+02 1.327e+02, threshold=1.949e+02, percent-clipped=0.0
2023-11-29 13:27:07,888 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=3982740.0, ans=0.05
2023-11-29 13:27:07,893 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3982740.0, ans=0.125
2023-11-29 13:27:12,556 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 8250, loss[loss=0.06184, simple_loss=0.08558, pruned_loss=0.009969, audio_tagging_loss=0.009083, over 15449.00 frames. ], tot_loss[loss=0.06456, simple_loss=0.08897, pruned_loss=0.01165, audio_tagging_loss=0.008426, over 3050494.95 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:27:14,105 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3982806.6666666665, ans=0.07
2023-11-29 13:27:18,656 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3982806.6666666665, ans=0.0
2023-11-29 13:27:45,912 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 597450
2023-11-29 13:27:47,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3982940.0, ans=0.0
2023-11-29 13:27:59,365 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3983006.6666666665, ans=0.2
2023-11-29 13:28:12,932 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 8300, loss[loss=0.08304, simple_loss=0.1179, pruned_loss=0.017, audio_tagging_loss=0.007098, over 15390.00 frames. ], tot_loss[loss=0.06473, simple_loss=0.08906, pruned_loss=0.01181, audio_tagging_loss=0.008393, over 3052094.27 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:28:23,117 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3983140.0, ans=0.125
2023-11-29 13:28:30,828 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3983206.6666666665, ans=0.0
2023-11-29 13:28:33,680 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3983206.6666666665, ans=0.125
2023-11-29 13:28:39,221 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-29 13:28:41,094 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.08 vs. limit=6.0
2023-11-29 13:28:46,620 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 597500
2023-11-29 13:28:54,406 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.24 vs. limit=15.0
2023-11-29 13:29:02,090 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3983406.6666666665, ans=0.04949747468305833
2023-11-29 13:29:05,738 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.020e+01 9.254e+01 9.916e+01 1.057e+02 1.310e+02, threshold=1.983e+02, percent-clipped=0.0
2023-11-29 13:29:07,616 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.25 vs. limit=15.0
2023-11-29 13:29:14,579 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 8350, loss[loss=0.06492, simple_loss=0.09011, pruned_loss=0.01034, audio_tagging_loss=0.009526, over 15083.00 frames. ], tot_loss[loss=0.06474, simple_loss=0.0888, pruned_loss=0.01182, audio_tagging_loss=0.008517, over 3043671.71 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:29:36,045 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.90 vs. limit=15.0
2023-11-29 13:29:47,137 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 597550
2023-11-29 13:29:47,346 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3983606.6666666665, ans=0.0
2023-11-29 13:30:10,319 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3983740.0, ans=0.1
2023-11-29 13:30:11,484 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3983740.0, ans=0.2
2023-11-29 13:30:14,059 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.99 vs. limit=15.0
2023-11-29 13:30:16,379 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 8400, loss[loss=0.06801, simple_loss=0.09595, pruned_loss=0.01214, audio_tagging_loss=0.007901, over 16126.00 frames. ], tot_loss[loss=0.06507, simple_loss=0.0892, pruned_loss=0.01199, audio_tagging_loss=0.008482, over 3047288.60 frames. ], batch size: 59, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 13:30:34,962 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.36 vs. limit=10.0
2023-11-29 13:30:49,609 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 597600
2023-11-29 13:31:05,886 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3984073.3333333335, ans=0.125
2023-11-29 13:31:10,343 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.822e+01 9.074e+01 9.912e+01 1.068e+02 1.273e+02, threshold=1.982e+02, percent-clipped=0.0
2023-11-29 13:31:14,524 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.21 vs. limit=15.0
2023-11-29 13:31:17,388 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 8450, loss[loss=0.0517, simple_loss=0.07173, pruned_loss=0.006448, audio_tagging_loss=0.009391, over 14190.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08971, pruned_loss=0.01201, audio_tagging_loss=0.008484, over 3050043.93 frames. ], batch size: 53, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:31:51,447 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 597650
2023-11-29 13:31:55,211 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3984340.0, ans=0.2
2023-11-29 13:31:56,453 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3984340.0, ans=0.125
2023-11-29 13:32:16,862 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3984406.6666666665, ans=0.0
2023-11-29 13:32:18,948 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 8500, loss[loss=0.04778, simple_loss=0.05791, pruned_loss=0.007656, audio_tagging_loss=0.01117, over 15298.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.09012, pruned_loss=0.01199, audio_tagging_loss=0.008469, over 3052127.17 frames. ], batch size: 59, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:32:32,416 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3984540.0, ans=0.125
2023-11-29 13:32:52,112 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-29 13:32:53,039 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 597700
2023-11-29 13:32:55,550 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3984673.3333333335, ans=0.125
2023-11-29 13:33:08,919 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3984740.0, ans=0.0
2023-11-29 13:33:13,722 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.159e+01 8.994e+01 9.750e+01 1.041e+02 1.321e+02, threshold=1.950e+02, percent-clipped=0.0
2023-11-29 13:33:21,349 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 8550, loss[loss=0.04833, simple_loss=0.0643, pruned_loss=0.006132, audio_tagging_loss=0.01005, over 14374.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.09074, pruned_loss=0.01222, audio_tagging_loss=0.008497, over 3044825.91 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:33:34,130 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3984873.3333333335, ans=0.0
2023-11-29 13:33:41,295 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3984873.3333333335, ans=0.125
2023-11-29 13:33:54,588 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 597750
2023-11-29 13:34:12,055 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.14 vs. limit=15.0
2023-11-29 13:34:22,917 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 8600, loss[loss=0.07879, simple_loss=0.1134, pruned_loss=0.0172, audio_tagging_loss=0.004866, over 15628.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.09037, pruned_loss=0.01214, audio_tagging_loss=0.008481, over 3047282.63 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:34:26,748 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-29 13:34:32,485 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.43 vs. limit=15.0
2023-11-29 13:34:37,941 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-29 13:34:41,585 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3985206.6666666665, ans=0.0
2023-11-29 13:34:41,634 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3985206.6666666665, ans=0.2
2023-11-29 13:34:42,719 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-29 13:34:45,037 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3985206.6666666665, ans=0.1
2023-11-29 13:34:55,861 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-29 13:34:57,486 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 597800
2023-11-29 13:35:10,695 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3985340.0, ans=0.125
2023-11-29 13:35:18,154 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.750e+01 8.848e+01 9.575e+01 1.022e+02 1.291e+02, threshold=1.915e+02, percent-clipped=0.0
2023-11-29 13:35:18,477 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3985406.6666666665, ans=0.125
2023-11-29 13:35:20,828 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-29 13:35:21,113 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.08 vs. limit=15.0
2023-11-29 13:35:23,332 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3985406.6666666665, ans=0.125
2023-11-29 13:35:25,290 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 8650, loss[loss=0.06335, simple_loss=0.08683, pruned_loss=0.01279, audio_tagging_loss=0.007151, over 14846.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08965, pruned_loss=0.01205, audio_tagging_loss=0.008508, over 3043609.04 frames. ], batch size: 55, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:35:26,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3985473.3333333335, ans=0.125
2023-11-29 13:35:40,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3985540.0, ans=0.0
2023-11-29 13:35:58,856 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 597850
2023-11-29 13:36:02,594 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3985673.3333333335, ans=0.0
2023-11-29 13:36:03,933 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3985673.3333333335, ans=0.125
2023-11-29 13:36:27,112 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 8700, loss[loss=0.0801, simple_loss=0.1163, pruned_loss=0.01495, audio_tagging_loss=0.007007, over 15575.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08987, pruned_loss=0.01218, audio_tagging_loss=0.008567, over 3040680.37 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:36:31,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3985806.6666666665, ans=0.125
2023-11-29 13:36:44,660 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.75 vs. limit=10.0
2023-11-29 13:36:59,911 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 597900
2023-11-29 13:37:02,695 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.94 vs. limit=15.0
2023-11-29 13:37:13,456 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3986006.6666666665, ans=0.1
2023-11-29 13:37:21,272 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.738e+01 9.317e+01 9.883e+01 1.072e+02 1.210e+02, threshold=1.977e+02, percent-clipped=0.0
2023-11-29 13:37:28,382 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 8750, loss[loss=0.058, simple_loss=0.07638, pruned_loss=0.01024, audio_tagging_loss=0.009572, over 14753.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.09002, pruned_loss=0.01217, audio_tagging_loss=0.008594, over 3040759.93 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:37:41,409 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.09 vs. limit=15.0
2023-11-29 13:37:45,545 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3986206.6666666665, ans=0.2
2023-11-29 13:37:58,993 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3986273.3333333335, ans=0.125
2023-11-29 13:38:01,096 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 597950
2023-11-29 13:38:16,479 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3986406.6666666665, ans=0.0
2023-11-29 13:38:29,589 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 8800, loss[loss=0.06502, simple_loss=0.09702, pruned_loss=0.01001, audio_tagging_loss=0.006497, over 15640.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.09132, pruned_loss=0.01228, audio_tagging_loss=0.008577, over 3043110.14 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 13:38:38,133 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3986473.3333333335, ans=0.0
2023-11-29 13:38:43,878 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.82 vs. limit=6.0
2023-11-29 13:38:51,247 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.06 vs. limit=15.0
2023-11-29 13:39:02,978 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 598000
2023-11-29 13:39:06,935 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3986673.3333333335, ans=0.0
2023-11-29 13:39:23,607 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.845e+01 9.465e+01 1.025e+02 1.121e+02 1.304e+02, threshold=2.051e+02, percent-clipped=0.0
2023-11-29 13:39:23,860 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3986740.0, ans=0.0
2023-11-29 13:39:29,029 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3986740.0, ans=0.125
2023-11-29 13:39:31,212 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 8850, loss[loss=0.05768, simple_loss=0.07526, pruned_loss=0.01195, audio_tagging_loss=0.008099, over 14395.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.09075, pruned_loss=0.01227, audio_tagging_loss=0.008528, over 3039504.71 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 13:39:31,495 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3986806.6666666665, ans=0.125
2023-11-29 13:39:41,118 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=3986806.6666666665, ans=0.02
2023-11-29 13:39:46,122 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-29 13:40:04,309 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 598050
2023-11-29 13:40:05,657 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3986940.0, ans=0.0
2023-11-29 13:40:21,714 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.49 vs. limit=22.5
2023-11-29 13:40:29,412 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.34 vs. limit=22.5
2023-11-29 13:40:32,891 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 8900, loss[loss=0.06235, simple_loss=0.08654, pruned_loss=0.009457, audio_tagging_loss=0.009623, over 15042.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.09087, pruned_loss=0.01224, audio_tagging_loss=0.00852, over 3036437.54 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 13:40:33,733 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.93 vs. limit=15.0
2023-11-29 13:40:36,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3987140.0, ans=0.04949747468305833
2023-11-29 13:40:48,969 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3987206.6666666665, ans=0.125
2023-11-29 13:40:49,884 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3987206.6666666665, ans=0.125
2023-11-29 13:40:53,595 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3987206.6666666665, ans=0.0
2023-11-29 13:40:55,967 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3987273.3333333335, ans=0.1
2023-11-29 13:41:03,522 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3987273.3333333335, ans=0.125
2023-11-29 13:41:05,591 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 598100
2023-11-29 13:41:26,653 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.226e+01 9.187e+01 9.849e+01 1.048e+02 1.202e+02, threshold=1.970e+02, percent-clipped=0.0
2023-11-29 13:41:28,437 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.89 vs. limit=12.0
2023-11-29 13:41:34,339 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 8950, loss[loss=0.05743, simple_loss=0.08593, pruned_loss=0.006834, audio_tagging_loss=0.007625, over 16238.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.09048, pruned_loss=0.01207, audio_tagging_loss=0.008349, over 3048048.61 frames. ], batch size: 60, lr: 1.34e-03, grad_scale: 32.0
2023-11-29 13:41:35,801 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3987473.3333333335, ans=0.125
2023-11-29 13:41:51,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3987540.0, ans=0.125
2023-11-29 13:41:58,968 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3987606.6666666665, ans=0.1
2023-11-29 13:42:05,942 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3987606.6666666665, ans=0.0
2023-11-29 13:42:07,478 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 598150
2023-11-29 13:42:19,903 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3987673.3333333335, ans=0.1
2023-11-29 13:42:25,939 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3987740.0, ans=0.1
2023-11-29 13:42:30,815 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3987740.0, ans=0.125
2023-11-29 13:42:35,646 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 9000, loss[loss=0.07236, simple_loss=0.0975, pruned_loss=0.01377, audio_tagging_loss=0.009843, over 15365.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.09049, pruned_loss=0.01203, audio_tagging_loss=0.008359, over 3043054.95 frames. ], batch size: 57, lr: 1.34e-03, grad_scale: 16.0
2023-11-29 13:42:35,649 INFO [train_asr.py:1258] (0/4) Computing validation loss
2023-11-29 13:42:54,477 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.2168, 4.6220, 5.2558, 4.8969], device='cuda:0')
2023-11-29 13:42:59,995 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.8570, 4.9671, 5.1365, 4.9444], device='cuda:0')
2023-11-29 13:43:14,448 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.4322, 3.8155, 3.0941, 3.8546], device='cuda:0')
2023-11-29 13:43:16,195 INFO [train_asr.py:1267] (0/4) Epoch 50, validation: loss=0.05899, simple_loss=0.05036, pruned_loss=0.005383, audio_tagging_loss=0.02843, over 4681554.00 frames.
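During the validation pass above, zipformer.py also prints the entropy of each attention head's weight distribution as a diagnostic: values near the log of the number of keys indicate diffuse attention, while small values indicate peaked attention. A sketch of that computation (assumed formulation of the attn_weights_entropy quantity):

```python
import torch

# Entropy of attention weights, one value per head, averaged over queries
# (assumed formulation of the attn_weights_entropy diagnostic above).
def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
    # attn: (num_heads, num_queries, num_keys), rows summing to 1
    ent = -(attn * (attn + 1e-20).log()).sum(dim=-1)
    return ent.mean(dim=-1)

attn = torch.softmax(torch.randn(4, 10, 200), dim=-1)
print(attn_weights_entropy(attn))  # approaches log(200) ~ 5.3 when diffuse
```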
2023-11-29 13:43:16,195 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB
2023-11-29 13:43:20,973 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3987806.6666666665, ans=0.0
2023-11-29 13:43:22,324 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3987806.6666666665, ans=0.1
2023-11-29 13:43:44,459 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3987940.0, ans=0.125
2023-11-29 13:43:46,783 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3987940.0, ans=0.125
2023-11-29 13:43:49,568 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 598200
2023-11-29 13:43:52,620 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.31 vs. limit=12.0
2023-11-29 13:44:05,260 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2.whitening_limit, batch_count=3988073.3333333335, ans=15.0
2023-11-29 13:44:12,250 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.193e+01 9.277e+01 9.847e+01 1.044e+02 1.354e+02, threshold=1.969e+02, percent-clipped=0.0
2023-11-29 13:44:16,382 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.12 vs. limit=22.5
2023-11-29 13:44:18,134 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 9050, loss[loss=0.08308, simple_loss=0.1168, pruned_loss=0.01861, audio_tagging_loss=0.006094, over 15343.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.09033, pruned_loss=0.01195, audio_tagging_loss=0.008377, over 3044820.03 frames. ], batch size: 56, lr: 1.34e-03, grad_scale: 16.0
2023-11-29 13:44:20,850 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3988140.0, ans=0.125
2023-11-29 13:44:26,874 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.12 vs. limit=15.0
2023-11-29 13:44:49,041 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3988273.3333333335, ans=0.125
2023-11-29 13:44:50,767 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 598250
2023-11-29 13:45:13,565 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3988406.6666666665, ans=0.2
2023-11-29 13:45:20,160 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 9100, loss[loss=0.05735, simple_loss=0.07518, pruned_loss=0.01237, audio_tagging_loss=0.007394, over 14096.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.09072, pruned_loss=0.01201, audio_tagging_loss=0.008332, over 3045073.20 frames. ], batch size: 56, lr: 1.34e-03, grad_scale: 16.0
2023-11-29 13:45:34,073 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3988540.0, ans=0.125
2023-11-29 13:45:51,008 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3988606.6666666665, ans=0.0
2023-11-29 13:45:54,250 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 598300
2023-11-29 13:46:02,778 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3988673.3333333335, ans=0.0
2023-11-29 13:46:06,851 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3988673.3333333335, ans=0.1
2023-11-29 13:46:16,762 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.793e+01 9.247e+01 9.830e+01 1.081e+02 1.321e+02, threshold=1.966e+02, percent-clipped=0.0
2023-11-29 13:46:22,638 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 9150, loss[loss=0.07323, simple_loss=0.1074, pruned_loss=0.01208, audio_tagging_loss=0.007466, over 15113.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.09081, pruned_loss=0.01203, audio_tagging_loss=0.008263, over 3051775.10 frames. ], batch size: 56, lr: 1.34e-03, grad_scale: 16.0
2023-11-29 13:46:45,459 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3988873.3333333335, ans=0.07
2023-11-29 13:46:46,708 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-29 13:46:56,394 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 598350
2023-11-29 13:47:11,065 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3989073.3333333335, ans=0.125
2023-11-29 13:47:25,270 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 9200, loss[loss=0.05594, simple_loss=0.07948, pruned_loss=0.007963, audio_tagging_loss=0.008234, over 16002.00 frames. ], tot_loss[loss=0.06507, simple_loss=0.09, pruned_loss=0.01183, audio_tagging_loss=0.008246, over 3058284.06 frames. ], batch size: 60, lr: 1.34e-03, grad_scale: 32.0
], batch size: 60, lr: 1.34e-03, grad_scale: 32.0 2023-11-29 13:47:28,944 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3989140.0, ans=0.2 2023-11-29 13:47:34,708 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 13:47:48,688 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3989273.3333333335, ans=0.125 2023-11-29 13:47:51,244 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3989273.3333333335, ans=0.04949747468305833 2023-11-29 13:47:57,977 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 598400 2023-11-29 13:47:58,298 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3989273.3333333335, ans=0.125 2023-11-29 13:48:00,790 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3989340.0, ans=0.0 2023-11-29 13:48:08,725 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3989340.0, ans=0.015 2023-11-29 13:48:14,820 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3989406.6666666665, ans=0.2 2023-11-29 13:48:21,342 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.113e+01 8.841e+01 9.563e+01 1.025e+02 1.500e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-29 13:48:23,382 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.60 vs. limit=15.0 2023-11-29 13:48:25,995 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 9250, loss[loss=0.07325, simple_loss=0.1028, pruned_loss=0.01538, audio_tagging_loss=0.00647, over 15989.00 frames. ], tot_loss[loss=0.06501, simple_loss=0.08986, pruned_loss=0.01184, audio_tagging_loss=0.008236, over 3062912.57 frames. ], batch size: 61, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 13:48:34,780 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3989473.3333333335, ans=0.125 2023-11-29 13:48:37,215 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3989473.3333333335, ans=0.09899494936611666 2023-11-29 13:49:00,797 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 598450 2023-11-29 13:49:13,896 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3989673.3333333335, ans=0.05 2023-11-29 13:49:14,243 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.25 vs. 
limit=15.0 2023-11-29 13:49:15,603 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3989740.0, ans=0.1 2023-11-29 13:49:24,640 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3989740.0, ans=0.125 2023-11-29 13:49:26,872 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3989740.0, ans=0.07 2023-11-29 13:49:28,984 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 9300, loss[loss=0.06247, simple_loss=0.07974, pruned_loss=0.01107, audio_tagging_loss=0.01154, over 15601.00 frames. ], tot_loss[loss=0.06431, simple_loss=0.08877, pruned_loss=0.01156, audio_tagging_loss=0.008376, over 3061336.43 frames. ], batch size: 60, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 13:49:36,419 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-29 13:49:37,500 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3989806.6666666665, ans=0.0 2023-11-29 13:49:45,964 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3989873.3333333335, ans=0.125 2023-11-29 13:49:54,212 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3989940.0, ans=0.125 2023-11-29 13:50:02,607 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 598500 2023-11-29 13:50:11,497 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3990006.6666666665, ans=0.125 2023-11-29 13:50:16,247 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3990006.6666666665, ans=0.125 2023-11-29 13:50:25,503 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.873e+01 9.186e+01 9.880e+01 1.044e+02 1.365e+02, threshold=1.976e+02, percent-clipped=0.0 2023-11-29 13:50:30,910 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 9350, loss[loss=0.04649, simple_loss=0.06654, pruned_loss=0.006418, audio_tagging_loss=0.0068, over 14619.00 frames. ], tot_loss[loss=0.0646, simple_loss=0.08923, pruned_loss=0.01161, audio_tagging_loss=0.008367, over 3057329.27 frames. ], batch size: 56, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 13:50:52,582 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-29 13:50:57,112 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.20 vs. limit=10.0 2023-11-29 13:51:04,926 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 598550 2023-11-29 13:51:33,510 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 9400, loss[loss=0.07, simple_loss=0.1005, pruned_loss=0.01123, audio_tagging_loss=0.008506, over 15409.00 frames. ], tot_loss[loss=0.06472, simple_loss=0.08928, pruned_loss=0.01163, audio_tagging_loss=0.008452, over 3054985.84 frames. 
], batch size: 57, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 13:51:33,967 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3990473.3333333335, ans=0.125 2023-11-29 13:52:06,965 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 598600 2023-11-29 13:52:09,750 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3990673.3333333335, ans=0.125 2023-11-29 13:52:22,462 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.66 vs. limit=15.0 2023-11-29 13:52:31,037 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.737e+01 9.140e+01 9.680e+01 1.042e+02 1.691e+02, threshold=1.936e+02, percent-clipped=0.0 2023-11-29 13:52:33,591 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3990740.0, ans=0.125 2023-11-29 13:52:35,724 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 9450, loss[loss=0.06512, simple_loss=0.08874, pruned_loss=0.009451, audio_tagging_loss=0.0113, over 15503.00 frames. ], tot_loss[loss=0.06447, simple_loss=0.08901, pruned_loss=0.01138, audio_tagging_loss=0.008578, over 3050192.93 frames. ], batch size: 58, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 13:52:36,942 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 13:53:05,414 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3990940.0, ans=0.0 2023-11-29 13:53:08,750 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 598650 2023-11-29 13:53:10,067 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3990940.0, ans=0.0 2023-11-29 13:53:37,331 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 9500, loss[loss=0.07817, simple_loss=0.1119, pruned_loss=0.01461, audio_tagging_loss=0.007609, over 15184.00 frames. ], tot_loss[loss=0.06506, simple_loss=0.08964, pruned_loss=0.01158, audio_tagging_loss=0.008654, over 3047356.80 frames. ], batch size: 56, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 13:53:37,632 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3991140.0, ans=0.0 2023-11-29 13:54:10,658 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 598700 2023-11-29 13:54:33,313 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.631e+01 9.161e+01 9.773e+01 1.049e+02 1.216e+02, threshold=1.955e+02, percent-clipped=0.0 2023-11-29 13:54:38,480 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 9550, loss[loss=0.07818, simple_loss=0.1118, pruned_loss=0.01559, audio_tagging_loss=0.006674, over 15955.00 frames. ], tot_loss[loss=0.06511, simple_loss=0.08964, pruned_loss=0.01159, audio_tagging_loss=0.008702, over 3048381.59 frames. 
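], batch size: 57, lr: 1.34e-03, grad_scale: 16.0

The per-batch entries above break the loss into its components, and the printed numbers are consistent with the total being a weighted sum: for batch 9550, 0.5 * 0.1118 + 0.01559 + 0.006674 = 0.0782, matching the logged loss=0.07818 to print rounding, so apparently loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss, with tot_loss the same quantity kept as a frame-weighted running average over the epoch (hence the growing "over N frames" count). A minimal sketch of that bookkeeping; the 0.5 weight is inferred from the logged values, and all names here are illustrative, not the actual train_asr.py code:

    # Hypothetical reconstruction of the logged loss accounting; the 0.5
    # weight on simple_loss is inferred from the printed numbers.
    def batch_loss(simple_loss: float, pruned_loss: float,
                   audio_tagging_loss: float,
                   simple_loss_weight: float = 0.5) -> float:
        return (simple_loss_weight * simple_loss
                + pruned_loss
                + audio_tagging_loss)

    class RunningLoss:
        # Frame-weighted running average behind the tot_loss[...] entries.
        def __init__(self) -> None:
            self.frames = 0.0
            self.weighted_sum = 0.0

        def update(self, loss_value: float, num_frames: float) -> None:
            self.frames += num_frames
            self.weighted_sum += loss_value * num_frames

        @property
        def tot_loss(self) -> float:
            return self.weighted_sum / self.frames

    # Reproduces batch 9550 above to within print rounding:
    assert abs(batch_loss(0.1118, 0.01559, 0.006674) - 0.07818) < 1e-4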
2023-11-29 13:55:12,048 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 598750 2023-11-29 13:55:33,086 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3991740.0, ans=0.125 2023-11-29 13:55:39,484 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3991806.6666666665, ans=0.95 2023-11-29 13:55:40,331 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 9600, loss[loss=0.05318, simple_loss=0.07506, pruned_loss=0.006307, audio_tagging_loss=0.00934, over 15441.00 frames. ], tot_loss[loss=0.06509, simple_loss=0.08931, pruned_loss=0.01168, audio_tagging_loss=0.008749, over 3040396.64 frames. ], batch size: 57, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 13:55:42,272 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.09 vs. limit=15.0 2023-11-29 13:55:47,572 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3991806.6666666665, ans=0.125 2023-11-29 13:56:03,555 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3991940.0, ans=0.0 2023-11-29 13:56:03,571 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3991940.0, ans=0.1 2023-11-29 13:56:12,522 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 598800 2023-11-29 13:56:37,809 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.152e+01 9.142e+01 9.552e+01 1.023e+02 1.315e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-29 13:56:38,218 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3992073.3333333335, ans=0.125 2023-11-29 13:56:41,296 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 9650, loss[loss=0.06406, simple_loss=0.09154, pruned_loss=0.00914, audio_tagging_loss=0.009153, over 14646.00 frames. ], tot_loss[loss=0.06531, simple_loss=0.08953, pruned_loss=0.01185, audio_tagging_loss=0.00869, over 3039252.84 frames. ], batch size: 58, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 13:56:42,726 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3992140.0, ans=10.0 2023-11-29 13:56:55,198 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3992206.6666666665, ans=0.125 2023-11-29 13:57:04,111 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.56 vs. limit=15.0 2023-11-29 13:57:15,239 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 598850 2023-11-29 13:57:18,109 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.16 vs. limit=12.0 2023-11-29 13:57:33,358 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3992406.6666666665, ans=0.0 2023-11-29 13:57:42,300 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 9700, loss[loss=0.07115, simple_loss=0.09323, pruned_loss=0.01391, audio_tagging_loss=0.01063, over 15671.00 frames.
], tot_loss[loss=0.06504, simple_loss=0.0891, pruned_loss=0.01189, audio_tagging_loss=0.008602, over 3036807.34 frames. ], batch size: 59, lr: 1.34e-03, grad_scale: 8.0 2023-11-29 13:57:56,848 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3992540.0, ans=0.0 2023-11-29 13:58:14,969 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=3992606.6666666665, ans=10.0 2023-11-29 13:58:15,973 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 598900 2023-11-29 13:58:22,776 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3992673.3333333335, ans=0.1 2023-11-29 13:58:24,236 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.42 vs. limit=22.5 2023-11-29 13:58:26,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3992673.3333333335, ans=0.2 2023-11-29 13:58:27,680 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.57 vs. limit=15.0 2023-11-29 13:58:31,969 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3992740.0, ans=0.125 2023-11-29 13:58:41,705 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.690e+01 9.286e+01 9.993e+01 1.057e+02 1.299e+02, threshold=1.999e+02, percent-clipped=0.0 2023-11-29 13:58:44,674 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 9750, loss[loss=0.05047, simple_loss=0.07112, pruned_loss=0.005364, audio_tagging_loss=0.009545, over 15276.00 frames. ], tot_loss[loss=0.06472, simple_loss=0.08894, pruned_loss=0.01177, audio_tagging_loss=0.008475, over 3033710.39 frames. ], batch size: 58, lr: 1.34e-03, grad_scale: 8.0 2023-11-29 13:58:44,795 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3992806.6666666665, ans=0.125 2023-11-29 13:59:06,418 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3992873.3333333335, ans=0.025 2023-11-29 13:59:16,867 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 598950 2023-11-29 13:59:29,501 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.69 vs. limit=15.0 2023-11-29 13:59:35,534 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3993073.3333333335, ans=0.125 2023-11-29 13:59:44,662 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 9800, loss[loss=0.06487, simple_loss=0.09755, pruned_loss=0.009636, audio_tagging_loss=0.006464, over 14724.00 frames. ], tot_loss[loss=0.0646, simple_loss=0.08926, pruned_loss=0.01164, audio_tagging_loss=0.008336, over 3038696.43 frames. ], batch size: 56, lr: 1.34e-03, grad_scale: 8.0 2023-11-29 13:59:48,728 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.77 vs. 
limit=6.0 2023-11-29 13:59:58,494 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3993206.6666666665, ans=0.0 2023-11-29 13:59:59,536 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3993206.6666666665, ans=0.2 2023-11-29 14:00:18,789 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 599000 2023-11-29 14:00:25,632 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.04 vs. limit=12.0 2023-11-29 14:00:34,882 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.63 vs. limit=22.5 2023-11-29 14:00:36,710 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 14:00:43,082 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 14:00:44,128 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.931e+01 9.016e+01 9.640e+01 1.052e+02 1.257e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-29 14:00:46,490 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 9850, loss[loss=0.07243, simple_loss=0.09322, pruned_loss=0.01777, audio_tagging_loss=0.008047, over 15433.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.09036, pruned_loss=0.01198, audio_tagging_loss=0.008214, over 3042354.56 frames. ], batch size: 55, lr: 1.34e-03, grad_scale: 8.0 2023-11-29 14:00:53,771 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3993473.3333333335, ans=0.0 2023-11-29 14:01:13,587 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.87 vs. limit=22.5 2023-11-29 14:01:14,752 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.30 vs. limit=15.0 2023-11-29 14:01:17,969 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3993606.6666666665, ans=0.0 2023-11-29 14:01:20,069 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 599050 2023-11-29 14:01:21,303 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3993606.6666666665, ans=0.125 2023-11-29 14:01:27,641 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3993673.3333333335, ans=0.0 2023-11-29 14:01:47,706 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 9900, loss[loss=0.0456, simple_loss=0.06094, pruned_loss=0.005063, audio_tagging_loss=0.01006, over 17259.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.09017, pruned_loss=0.0121, audio_tagging_loss=0.008316, over 3045905.10 frames. 
], batch size: 66, lr: 1.34e-03, grad_scale: 8.0 2023-11-29 14:01:47,913 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3993806.6666666665, ans=0.125 2023-11-29 14:01:59,682 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.34 vs. limit=15.0 2023-11-29 14:02:20,704 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 599100 2023-11-29 14:02:41,462 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3994073.3333333335, ans=0.125 2023-11-29 14:02:46,883 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.073e+01 9.233e+01 9.702e+01 1.026e+02 1.352e+02, threshold=1.940e+02, percent-clipped=0.0 2023-11-29 14:02:47,248 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3994073.3333333335, ans=0.125 2023-11-29 14:02:49,334 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 9950, loss[loss=0.06535, simple_loss=0.08391, pruned_loss=0.01225, audio_tagging_loss=0.01114, over 15501.00 frames. ], tot_loss[loss=0.06531, simple_loss=0.0901, pruned_loss=0.01198, audio_tagging_loss=0.008274, over 3048726.76 frames. ], batch size: 58, lr: 1.34e-03, grad_scale: 8.0 2023-11-29 14:03:06,672 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3994206.6666666665, ans=0.0 2023-11-29 14:03:13,572 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3994273.3333333335, ans=0.125 2023-11-29 14:03:22,216 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 599150 2023-11-29 14:03:22,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3994273.3333333335, ans=0.2 2023-11-29 14:03:41,351 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3994406.6666666665, ans=0.125 2023-11-29 14:03:51,250 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 10000, loss[loss=0.06621, simple_loss=0.09941, pruned_loss=0.009752, audio_tagging_loss=0.006754, over 15841.00 frames. ], tot_loss[loss=0.06458, simple_loss=0.08913, pruned_loss=0.01174, audio_tagging_loss=0.008272, over 3051751.50 frames. ], batch size: 56, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:03:51,692 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.98 vs. limit=15.0 2023-11-29 14:04:05,895 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=3994540.0, ans=10.0 2023-11-29 14:04:23,149 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.90 vs. 
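limit=10.0

The optim.py:476 lines summarise the gradient norms seen since the previous report: the five numbers read as min / 25% / median / 75% / max, and the printed threshold is consistently Clipping_scale times the median (2.0 * 9.877e+01 = 1.975e+02 a few entries below), while percent-clipped records how often the threshold actually bit; it is 0.0 almost everywhere here because the ~1.9e+02 threshold sits well above the typical ~1.0e+02 upper quartile. A self-contained sketch of that reporting, assuming the statistics are taken over a window of recent per-step gradient norms; this is an illustration, not the actual icefall optimizer code:

    # Illustrative reimplementation of the "grad-norm quartiles" report.
    # threshold = clipping_scale * median matches the logged numbers; the
    # rest is an assumption about the bookkeeping, not ScaledAdam itself.
    from statistics import quantiles

    def grad_norm_report(norms: list, clipping_scale: float = 2.0) -> None:
        q1, median, q3 = quantiles(norms, n=4)
        stats = (min(norms), q1, median, q3, max(norms))
        threshold = clipping_scale * median
        pct = 100.0 * sum(n > threshold for n in norms) / len(norms)
        print("Clipping_scale=%.1f, grad-norm quartiles %s, "
              "threshold=%.3e, percent-clipped=%.1f"
              % (clipping_scale,
                 " ".join("%.3e" % v for v in stats),
                 threshold, pct))

    # e.g. five hypothetical norms; prints threshold=1.970e+02:
    grad_norm_report([71.9, 92.8, 98.5, 104.4, 135.4])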
2023-11-29 14:04:24,896 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 599200 2023-11-29 14:04:32,381 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3994673.3333333335, ans=0.125 2023-11-29 14:04:47,545 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3994740.0, ans=0.09899494936611666 2023-11-29 14:04:50,883 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.059e+01 9.244e+01 9.877e+01 1.049e+02 1.463e+02, threshold=1.975e+02, percent-clipped=0.0 2023-11-29 14:04:53,627 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 10050, loss[loss=0.07317, simple_loss=0.1118, pruned_loss=0.01176, audio_tagging_loss=0.005495, over 15400.00 frames. ], tot_loss[loss=0.06462, simple_loss=0.08907, pruned_loss=0.01178, audio_tagging_loss=0.0083, over 3060262.83 frames. ], batch size: 56, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:04:55,044 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3994806.6666666665, ans=0.125 2023-11-29 14:04:59,517 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.65 vs. limit=10.0 2023-11-29 14:05:00,273 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3994806.6666666665, ans=0.125 2023-11-29 14:05:27,162 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 599250 2023-11-29 14:05:28,580 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3994940.0, ans=0.0 2023-11-29 14:05:29,663 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3995006.6666666665, ans=0.125 2023-11-29 14:05:47,508 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3995073.3333333335, ans=0.0 2023-11-29 14:05:56,130 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 10100, loss[loss=0.05633, simple_loss=0.07755, pruned_loss=0.009926, audio_tagging_loss=0.007626, over 15562.00 frames. ], tot_loss[loss=0.06511, simple_loss=0.08967, pruned_loss=0.01191, audio_tagging_loss=0.008361, over 3057627.70 frames.
], batch size: 58, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:06:02,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3995140.0, ans=0.2 2023-11-29 14:06:05,516 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3995140.0, ans=0.1 2023-11-29 14:06:12,288 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3995206.6666666665, ans=0.0 2023-11-29 14:06:14,555 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3995206.6666666665, ans=0.1 2023-11-29 14:06:16,997 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3995206.6666666665, ans=0.2 2023-11-29 14:06:29,090 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 599300 2023-11-29 14:06:33,630 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.24 vs. limit=15.0 2023-11-29 14:06:48,891 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 14:06:51,757 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.95 vs. limit=15.0 2023-11-29 14:06:54,615 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.637e+01 9.067e+01 9.808e+01 1.052e+02 1.257e+02, threshold=1.962e+02, percent-clipped=0.0 2023-11-29 14:06:57,722 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 10150, loss[loss=0.06772, simple_loss=0.09984, pruned_loss=0.01102, audio_tagging_loss=0.006786, over 15923.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.09039, pruned_loss=0.01194, audio_tagging_loss=0.008386, over 3059896.03 frames. ], batch size: 57, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:07:11,990 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3995540.0, ans=0.125 2023-11-29 14:07:28,284 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3995606.6666666665, ans=0.0 2023-11-29 14:07:29,318 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
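Number of tokens: 24

These WARNING lines show the dataloader-side sanity filter at work: the excluded AudioSet cuts are one-second (100-frame) clips whose dummy placeholder transcript has 24 BPE tokens, but after the front end's subsampling only 23 frames remain, and a transducer loss cannot align more tokens than frames, so the cut is dropped. A hypothetical version of such a check; the subsampling formula below is only an assumption chosen to reproduce the logged 100 -> 23, and the real rule lives in train_asr.py:

    # Hypothetical version of the cut filter behind these WARNINGs. The
    # subsampling formula is an assumption that matches the logged
    # 100 -> 23; the actual front end may compute it differently.
    def frames_after_subsampling(num_frames: int) -> int:
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # A cut is unusable when it has fewer frames after subsampling
        # than BPE tokens, exactly the situation reported above.
        return frames_after_subsampling(num_frames) >= num_tokens

    assert frames_after_subsampling(100) == 23
    assert not keep_cut(100, 24)   # the excluded one-second dummy cuts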
2023-11-29 14:07:31,284 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 599350 2023-11-29 14:07:32,575 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3995606.6666666665, ans=0.1 2023-11-29 14:07:45,408 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.18 vs. limit=15.0 2023-11-29 14:07:50,371 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.50 vs. limit=5.0 2023-11-29 14:07:58,623 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 10200, loss[loss=0.07582, simple_loss=0.1153, pruned_loss=0.01108, audio_tagging_loss=0.007105, over 16006.00 frames. ], tot_loss[loss=0.0652, simple_loss=0.08989, pruned_loss=0.01181, audio_tagging_loss=0.008448, over 3059119.96 frames. ], batch size: 57, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:08:11,942 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3995873.3333333335, ans=0.125 2023-11-29 14:08:21,975 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3995873.3333333335, ans=0.2 2023-11-29 14:08:25,152 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 14:08:32,997 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 599400 2023-11-29 14:08:33,627 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.21 vs. limit=10.0 2023-11-29 14:08:36,283 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.71 vs. limit=15.0 2023-11-29 14:08:57,050 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3996073.3333333335, ans=0.125 2023-11-29 14:08:59,159 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.884e+01 9.205e+01 9.737e+01 1.021e+02 2.393e+02, threshold=1.947e+02, percent-clipped=1.0 2023-11-29 14:09:01,503 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 10250, loss[loss=0.08024, simple_loss=0.1164, pruned_loss=0.01461, audio_tagging_loss=0.00742, over 14649.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.09027, pruned_loss=0.01196, audio_tagging_loss=0.008508, over 3061697.63 frames.
], batch size: 54, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:09:06,436 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3996140.0, ans=0.125 2023-11-29 14:09:33,788 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 599450 2023-11-29 14:09:35,963 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3996273.3333333335, ans=0.125 2023-11-29 14:09:53,727 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3996406.6666666665, ans=0.1 2023-11-29 14:09:58,612 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.51 vs. limit=22.5 2023-11-29 14:10:03,469 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 10300, loss[loss=0.05257, simple_loss=0.06817, pruned_loss=0.008722, audio_tagging_loss=0.009769, over 15078.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.08978, pruned_loss=0.0118, audio_tagging_loss=0.008526, over 3055843.06 frames. ], batch size: 56, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:10:15,868 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.45 vs. limit=6.0 2023-11-29 14:10:38,046 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 599500 2023-11-29 14:10:47,192 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.91 vs. limit=12.0 2023-11-29 14:11:03,602 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.634e+01 9.141e+01 9.728e+01 1.059e+02 1.776e+02, threshold=1.946e+02, percent-clipped=0.0 2023-11-29 14:11:06,026 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 10350, loss[loss=0.07051, simple_loss=0.0954, pruned_loss=0.01341, audio_tagging_loss=0.009405, over 14794.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08968, pruned_loss=0.01182, audio_tagging_loss=0.008667, over 3054637.65 frames. ], batch size: 55, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:11:06,234 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3996806.6666666665, ans=0.125 2023-11-29 14:11:39,072 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3996940.0, ans=0.125 2023-11-29 14:11:40,570 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 599550 2023-11-29 14:11:42,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3997006.6666666665, ans=0.125 2023-11-29 14:11:57,096 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3997073.3333333335, ans=0.0 2023-11-29 14:12:08,470 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 10400, loss[loss=0.04954, simple_loss=0.06991, pruned_loss=0.00494, audio_tagging_loss=0.009643, over 14769.00 frames. ], tot_loss[loss=0.06489, simple_loss=0.08907, pruned_loss=0.01167, audio_tagging_loss=0.008689, over 3045542.47 frames. 
], batch size: 57, lr: 1.34e-03, grad_scale: 32.0 2023-11-29 14:12:08,688 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3997140.0, ans=0.1 2023-11-29 14:12:25,173 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3997206.6666666665, ans=0.2 2023-11-29 14:12:42,236 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 599600 2023-11-29 14:13:07,592 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3997406.6666666665, ans=0.1 2023-11-29 14:13:08,426 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.725e+01 9.101e+01 9.809e+01 1.030e+02 1.340e+02, threshold=1.962e+02, percent-clipped=0.0 2023-11-29 14:13:10,938 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 10450, loss[loss=0.06007, simple_loss=0.07797, pruned_loss=0.01176, audio_tagging_loss=0.009327, over 14164.00 frames. ], tot_loss[loss=0.06434, simple_loss=0.08823, pruned_loss=0.01154, audio_tagging_loss=0.008688, over 3049447.96 frames. ], batch size: 54, lr: 1.34e-03, grad_scale: 32.0 2023-11-29 14:13:11,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3997473.3333333335, ans=0.0 2023-11-29 14:13:21,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3997473.3333333335, ans=0.0 2023-11-29 14:13:42,916 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.83 vs. limit=15.0 2023-11-29 14:13:44,775 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 599650 2023-11-29 14:13:47,531 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.52 vs. limit=15.0 2023-11-29 14:13:56,108 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.01 vs. limit=22.5 2023-11-29 14:13:59,049 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3997740.0, ans=0.2 2023-11-29 14:14:10,815 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3997740.0, ans=0.125 2023-11-29 14:14:12,925 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 10500, loss[loss=0.07596, simple_loss=0.1147, pruned_loss=0.01275, audio_tagging_loss=0.005837, over 16438.00 frames. ], tot_loss[loss=0.06418, simple_loss=0.08796, pruned_loss=0.01161, audio_tagging_loss=0.008593, over 3048746.85 frames. ], batch size: 58, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:14:14,698 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.21 vs. 
limit=15.0 2023-11-29 14:14:16,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3997806.6666666665, ans=0.5 2023-11-29 14:14:24,386 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3997873.3333333335, ans=0.125 2023-11-29 14:14:45,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3997940.0, ans=0.1 2023-11-29 14:14:45,983 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 599700 2023-11-29 14:14:53,783 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.84 vs. limit=12.0 2023-11-29 14:14:54,628 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3998006.6666666665, ans=0.125 2023-11-29 14:15:01,425 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3998073.3333333335, ans=0.125 2023-11-29 14:15:14,116 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.802e+01 9.178e+01 9.890e+01 1.072e+02 1.434e+02, threshold=1.978e+02, percent-clipped=0.0 2023-11-29 14:15:15,327 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 10550, loss[loss=0.05987, simple_loss=0.09221, pruned_loss=0.007358, audio_tagging_loss=0.006412, over 17041.00 frames. ], tot_loss[loss=0.06436, simple_loss=0.08843, pruned_loss=0.01163, audio_tagging_loss=0.008512, over 3048829.40 frames. ], batch size: 64, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:15:20,313 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3998140.0, ans=0.1 2023-11-29 14:15:43,883 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3998273.3333333335, ans=0.125 2023-11-29 14:15:48,343 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 599750 2023-11-29 14:16:01,722 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten.whitening_limit, batch_count=3998340.0, ans=22.5 2023-11-29 14:16:01,790 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.39 vs. limit=15.0 2023-11-29 14:16:06,239 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3998406.6666666665, ans=0.125 2023-11-29 14:16:16,388 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 10600, loss[loss=0.06115, simple_loss=0.08728, pruned_loss=0.01181, audio_tagging_loss=0.005705, over 16315.00 frames. ], tot_loss[loss=0.0642, simple_loss=0.08852, pruned_loss=0.01156, audio_tagging_loss=0.008386, over 3050585.81 frames. 
], batch size: 60, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:16:16,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3998473.3333333335, ans=0.1 2023-11-29 14:16:20,108 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3998473.3333333335, ans=0.2 2023-11-29 14:16:28,446 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3998540.0, ans=0.0 2023-11-29 14:16:32,956 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3998540.0, ans=0.0 2023-11-29 14:16:50,400 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 599800 2023-11-29 14:16:57,853 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3998673.3333333335, ans=0.125 2023-11-29 14:17:02,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3998673.3333333335, ans=0.0 2023-11-29 14:17:10,526 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3998740.0, ans=0.04949747468305833 2023-11-29 14:17:17,859 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.533e+01 8.938e+01 9.773e+01 1.042e+02 1.293e+02, threshold=1.955e+02, percent-clipped=0.0 2023-11-29 14:17:19,083 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 10650, loss[loss=0.05315, simple_loss=0.07335, pruned_loss=0.008867, audio_tagging_loss=0.007611, over 14374.00 frames. ], tot_loss[loss=0.06459, simple_loss=0.08882, pruned_loss=0.01174, audio_tagging_loss=0.008441, over 3046756.67 frames. ], batch size: 57, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:17:23,300 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.52 vs. limit=12.0 2023-11-29 14:17:51,888 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 599850 2023-11-29 14:17:59,095 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.12 vs. limit=22.5 2023-11-29 14:18:20,968 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 10700, loss[loss=0.07591, simple_loss=0.1146, pruned_loss=0.01347, audio_tagging_loss=0.005155, over 14619.00 frames. ], tot_loss[loss=0.06453, simple_loss=0.08894, pruned_loss=0.01173, audio_tagging_loss=0.00833, over 3053340.19 frames. ], batch size: 55, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:18:22,532 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3999140.0, ans=0.125 2023-11-29 14:18:28,183 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3999140.0, ans=0.0 2023-11-29 14:18:40,481 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.87 vs. 
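limit=15.0

Most of the scaling.py:213 traffic records ScheduledFloat values: dropout probabilities, skip rates, balancer probabilities and similar regularisation knobs whose current value (ans) is a function of batch_count, which is why one batch_count shows up under many different parameter names at once. A toy version of such a schedule, piecewise-linear in batch_count; the breakpoints are invented for the example, and the real ScheduledFloat in icefall's scaling.py is more elaborate:

    # Toy ScheduledFloat: a value defined by (batch_count, value)
    # breakpoints, linearly interpolated between them, clamped outside.
    class ScheduledFloat:
        def __init__(self, *points):
            self.points = sorted(points)   # (batch_count, value) pairs

        def value(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            if batch_count >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if x0 <= batch_count <= x1:
                    t = (batch_count - x0) / (x1 - x0)
                    return y0 + t * (y1 - y0)

    # e.g. a dropout decaying from 0.3 to 0.1 over the first 20000 batches
    # (breakpoints invented for illustration):
    dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
    assert abs(dropout_p.value(10000.0) - 0.2) < 1e-9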
2023-11-29 14:18:54,367 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 599900 2023-11-29 14:19:10,830 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3999406.6666666665, ans=0.125 2023-11-29 14:19:14,297 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3999406.6666666665, ans=0.0 2023-11-29 14:19:14,461 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3999406.6666666665, ans=0.125 2023-11-29 14:19:20,298 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3999406.6666666665, ans=0.0 2023-11-29 14:19:21,108 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.715e+01 9.059e+01 9.669e+01 1.032e+02 1.449e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-29 14:19:21,331 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3999473.3333333335, ans=0.125 2023-11-29 14:19:21,746 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.63 vs. limit=15.0 2023-11-29 14:19:22,379 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 10750, loss[loss=0.06713, simple_loss=0.09111, pruned_loss=0.01484, audio_tagging_loss=0.006734, over 15050.00 frames. ], tot_loss[loss=0.06439, simple_loss=0.08902, pruned_loss=0.01162, audio_tagging_loss=0.008258, over 3056309.86 frames. ], batch size: 56, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:19:27,371 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3999473.3333333335, ans=0.0 2023-11-29 14:19:43,792 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3999540.0, ans=0.2 2023-11-29 14:19:52,732 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3999606.6666666665, ans=0.2 2023-11-29 14:19:56,690 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 599950 2023-11-29 14:20:11,822 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3999740.0, ans=0.1 2023-11-29 14:20:23,103 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3999806.6666666665, ans=0.125 2023-11-29 14:20:23,927 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 10800, loss[loss=0.06397, simple_loss=0.0906, pruned_loss=0.01109, audio_tagging_loss=0.007573, over 14596.00 frames. ], tot_loss[loss=0.0644, simple_loss=0.08909, pruned_loss=0.01162, audio_tagging_loss=0.008237, over 3052535.95 frames.
], batch size: 55, lr: 1.34e-03, grad_scale: 32.0 2023-11-29 14:20:26,553 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3999806.6666666665, ans=0.0 2023-11-29 14:20:48,194 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3999940.0, ans=0.0 2023-11-29 14:20:57,317 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 600000 2023-11-29 14:20:57,530 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3999940.0, ans=0.125 2023-11-29 14:20:58,762 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-600000.pt 2023-11-29 14:21:10,593 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4000006.6666666665, ans=0.125 2023-11-29 14:21:11,036 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.30 vs. limit=6.0 2023-11-29 14:21:15,631 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.95 vs. limit=22.5 2023-11-29 14:21:28,902 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.529e+01 9.037e+01 9.839e+01 1.048e+02 1.354e+02, threshold=1.968e+02, percent-clipped=0.0 2023-11-29 14:21:28,942 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 10850, loss[loss=0.06496, simple_loss=0.09492, pruned_loss=0.00989, audio_tagging_loss=0.007606, over 15236.00 frames. ], tot_loss[loss=0.06428, simple_loss=0.08892, pruned_loss=0.0116, audio_tagging_loss=0.008219, over 3052755.45 frames. ], batch size: 55, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:21:53,691 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4000273.3333333335, ans=0.0 2023-11-29 14:22:01,514 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.46 vs. limit=10.0 2023-11-29 14:22:02,298 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 600050 2023-11-29 14:22:07,023 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4000340.0, ans=0.1 2023-11-29 14:22:10,086 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4000340.0, ans=0.0 2023-11-29 14:22:22,983 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4000406.6666666665, ans=0.2 2023-11-29 14:22:30,831 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 10900, loss[loss=0.06294, simple_loss=0.08404, pruned_loss=0.01033, audio_tagging_loss=0.01059, over 15050.00 frames. ], tot_loss[loss=0.06474, simple_loss=0.08932, pruned_loss=0.0118, audio_tagging_loss=0.008283, over 3047253.76 frames. ], batch size: 55, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:22:30,898 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 14:22:36,091 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.13 vs. limit=15.0 2023-11-29 14:22:43,860 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.22 vs. limit=15.0 2023-11-29 14:22:59,468 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4000606.6666666665, ans=0.125 2023-11-29 14:23:03,983 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 600100 2023-11-29 14:23:07,167 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4000673.3333333335, ans=0.0 2023-11-29 14:23:32,235 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.967e+01 9.295e+01 9.770e+01 1.037e+02 1.464e+02, threshold=1.954e+02, percent-clipped=0.0 2023-11-29 14:23:32,275 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 10950, loss[loss=0.07243, simple_loss=0.0904, pruned_loss=0.01689, audio_tagging_loss=0.01034, over 16377.00 frames. ], tot_loss[loss=0.06471, simple_loss=0.08905, pruned_loss=0.01185, audio_tagging_loss=0.008337, over 3042973.01 frames. ], batch size: 62, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:23:40,973 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4000806.6666666665, ans=0.2 2023-11-29 14:23:55,225 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4000873.3333333335, ans=0.125 2023-11-29 14:24:05,752 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 600150 2023-11-29 14:24:20,408 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4001073.3333333335, ans=0.0 2023-11-29 14:24:27,683 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4001073.3333333335, ans=0.1 2023-11-29 14:24:34,454 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 11000, loss[loss=0.05497, simple_loss=0.06827, pruned_loss=0.009481, audio_tagging_loss=0.01135, over 15848.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.08944, pruned_loss=0.01199, audio_tagging_loss=0.008372, over 3039739.07 frames. ], batch size: 59, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:24:48,138 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-29 14:25:07,587 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 600200 2023-11-29 14:25:21,436 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4001340.0, ans=0.1 2023-11-29 14:25:33,401 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4001406.6666666665, ans=0.125 2023-11-29 14:25:34,555 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4001406.6666666665, ans=0.0 2023-11-29 14:25:35,979 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.02 vs. limit=15.0 2023-11-29 14:25:36,586 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.589e+01 8.890e+01 9.536e+01 1.018e+02 1.298e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-29 14:25:36,617 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 11050, loss[loss=0.05695, simple_loss=0.07485, pruned_loss=0.007513, audio_tagging_loss=0.01201, over 15577.00 frames. ], tot_loss[loss=0.06456, simple_loss=0.08854, pruned_loss=0.01175, audio_tagging_loss=0.008536, over 3041753.33 frames. ], batch size: 60, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:25:42,775 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4001473.3333333335, ans=0.125 2023-11-29 14:25:45,073 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4001473.3333333335, ans=0.125 2023-11-29 14:25:48,714 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=4001540.0, ans=0.0 2023-11-29 14:25:48,728 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4001540.0, ans=0.0 2023-11-29 14:25:56,693 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.10 vs. limit=15.0 2023-11-29 14:25:57,548 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4001540.0, ans=0.2 2023-11-29 14:26:07,814 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4001606.6666666665, ans=0.0 2023-11-29 14:26:09,890 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 600250 2023-11-29 14:26:14,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4001673.3333333335, ans=0.09899494936611666 2023-11-29 14:26:16,924 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4001673.3333333335, ans=0.1 2023-11-29 14:26:38,318 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 11100, loss[loss=0.04847, simple_loss=0.05745, pruned_loss=0.008251, audio_tagging_loss=0.01149, over 14386.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08917, pruned_loss=0.01194, audio_tagging_loss=0.008644, over 3049206.40 frames. 
], batch size: 58, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:27:11,754 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 600300 2023-11-29 14:27:40,197 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.596e+01 9.265e+01 9.785e+01 1.064e+02 1.288e+02, threshold=1.957e+02, percent-clipped=0.0 2023-11-29 14:27:40,230 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 11150, loss[loss=0.05713, simple_loss=0.07917, pruned_loss=0.01058, audio_tagging_loss=0.006962, over 14864.00 frames. ], tot_loss[loss=0.06499, simple_loss=0.08919, pruned_loss=0.01176, audio_tagging_loss=0.008635, over 3051396.90 frames. ], batch size: 57, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:27:56,993 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4002206.6666666665, ans=0.0 2023-11-29 14:28:13,824 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 600350 2023-11-29 14:28:24,645 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4002340.0, ans=0.04949747468305833 2023-11-29 14:28:38,481 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4002406.6666666665, ans=0.0 2023-11-29 14:28:41,725 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 11200, loss[loss=0.05166, simple_loss=0.06356, pruned_loss=0.009158, audio_tagging_loss=0.01072, over 15369.00 frames. ], tot_loss[loss=0.0649, simple_loss=0.08877, pruned_loss=0.0118, audio_tagging_loss=0.00872, over 3050240.51 frames. ], batch size: 58, lr: 1.34e-03, grad_scale: 32.0 2023-11-29 14:28:43,180 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4002473.3333333335, ans=0.1 2023-11-29 14:28:47,730 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4002473.3333333335, ans=0.125 2023-11-29 14:28:48,962 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4002473.3333333335, ans=0.125 2023-11-29 14:29:14,809 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 600400 2023-11-29 14:29:21,227 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4002673.3333333335, ans=0.125 2023-11-29 14:29:34,377 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.38 vs. limit=12.0 2023-11-29 14:29:38,636 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4002740.0, ans=0.125 2023-11-29 14:29:43,052 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.595e+01 9.140e+01 9.616e+01 1.036e+02 1.306e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-29 14:29:43,084 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 11250, loss[loss=0.08099, simple_loss=0.1127, pruned_loss=0.0165, audio_tagging_loss=0.008162, over 14409.00 frames. ], tot_loss[loss=0.06459, simple_loss=0.08807, pruned_loss=0.01177, audio_tagging_loss=0.008785, over 3042379.71 frames. 
], batch size: 53, lr: 1.34e-03, grad_scale: 32.0 2023-11-29 14:29:48,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4002806.6666666665, ans=0.1 2023-11-29 14:29:56,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4002873.3333333335, ans=0.1 2023-11-29 14:30:09,717 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4002940.0, ans=0.125 2023-11-29 14:30:15,118 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4002940.0, ans=0.125 2023-11-29 14:30:17,315 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 600450 2023-11-29 14:30:22,381 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.38 vs. limit=15.0 2023-11-29 14:30:31,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=4003073.3333333335, ans=6.0 2023-11-29 14:30:39,692 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.36 vs. limit=15.0 2023-11-29 14:30:44,776 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 11300, loss[loss=0.04832, simple_loss=0.06431, pruned_loss=0.008041, audio_tagging_loss=0.008122, over 13988.00 frames. ], tot_loss[loss=0.06432, simple_loss=0.08793, pruned_loss=0.01176, audio_tagging_loss=0.008594, over 3044616.29 frames. ], batch size: 54, lr: 1.34e-03, grad_scale: 8.0 2023-11-29 14:30:56,828 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4003206.6666666665, ans=0.125 2023-11-29 14:31:18,248 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 600500 2023-11-29 14:31:39,782 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.74 vs. limit=22.5 2023-11-29 14:31:46,446 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.58 vs. limit=12.0 2023-11-29 14:31:46,836 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 11350, loss[loss=0.05492, simple_loss=0.07352, pruned_loss=0.006354, audio_tagging_loss=0.0118, over 16040.00 frames. ], tot_loss[loss=0.0644, simple_loss=0.08828, pruned_loss=0.0118, audio_tagging_loss=0.008462, over 3042616.39 frames. ], batch size: 62, lr: 1.34e-03, grad_scale: 8.0 2023-11-29 14:31:49,747 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.915e+01 9.251e+01 9.881e+01 1.050e+02 2.034e+02, threshold=1.976e+02, percent-clipped=1.0 2023-11-29 14:31:51,503 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.09 vs. limit=22.5 2023-11-29 14:32:08,623 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.02 vs. limit=15.0 2023-11-29 14:32:18,237 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.50 vs. 
limit=22.5 2023-11-29 14:32:18,758 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4003606.6666666665, ans=0.125 2023-11-29 14:32:19,814 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 600550 2023-11-29 14:32:26,315 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4003673.3333333335, ans=0.125 2023-11-29 14:32:26,342 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4003673.3333333335, ans=0.125 2023-11-29 14:32:38,135 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4003740.0, ans=0.0 2023-11-29 14:32:45,076 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4003740.0, ans=0.0 2023-11-29 14:32:48,395 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 11400, loss[loss=0.07723, simple_loss=0.102, pruned_loss=0.01689, audio_tagging_loss=0.009334, over 15474.00 frames. ], tot_loss[loss=0.06474, simple_loss=0.08899, pruned_loss=0.01181, audio_tagging_loss=0.008434, over 3047211.68 frames. ], batch size: 57, lr: 1.34e-03, grad_scale: 8.0 2023-11-29 14:33:11,007 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=4003873.3333333335, ans=0.5 2023-11-29 14:33:19,896 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4003940.0, ans=0.0 2023-11-29 14:33:22,108 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 600600 2023-11-29 14:33:25,305 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.13 vs. limit=15.0 2023-11-29 14:33:47,055 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.90 vs. limit=6.0 2023-11-29 14:33:49,937 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 11450, loss[loss=0.06314, simple_loss=0.08048, pruned_loss=0.01564, audio_tagging_loss=0.007265, over 14405.00 frames. ], tot_loss[loss=0.06457, simple_loss=0.08852, pruned_loss=0.01185, audio_tagging_loss=0.008457, over 3051068.55 frames. ], batch size: 55, lr: 1.34e-03, grad_scale: 8.0 2023-11-29 14:33:52,185 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.736e+01 9.290e+01 9.810e+01 1.057e+02 1.472e+02, threshold=1.962e+02, percent-clipped=0.0 2023-11-29 14:34:09,640 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.79 vs. limit=10.0 2023-11-29 14:34:24,154 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 600650 2023-11-29 14:34:29,737 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4004340.0, ans=0.125 2023-11-29 14:34:53,813 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 11500, loss[loss=0.0739, simple_loss=0.1042, pruned_loss=0.01533, audio_tagging_loss=0.006442, over 15094.00 frames. ], tot_loss[loss=0.06432, simple_loss=0.08807, pruned_loss=0.01184, audio_tagging_loss=0.008445, over 3049897.15 frames. 
], batch size: 56, lr: 1.34e-03, grad_scale: 8.0 2023-11-29 14:34:56,372 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4004473.3333333335, ans=0.0 2023-11-29 14:34:56,413 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4004473.3333333335, ans=0.0 2023-11-29 14:35:17,491 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4004606.6666666665, ans=0.1 2023-11-29 14:35:17,577 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4004606.6666666665, ans=0.125 2023-11-29 14:35:26,717 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 600700 2023-11-29 14:35:29,346 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4004673.3333333335, ans=0.125 2023-11-29 14:35:29,639 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.57 vs. limit=15.0 2023-11-29 14:35:55,433 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 11550, loss[loss=0.07351, simple_loss=0.1008, pruned_loss=0.0156, audio_tagging_loss=0.007498, over 15557.00 frames. ], tot_loss[loss=0.06458, simple_loss=0.08866, pruned_loss=0.01184, audio_tagging_loss=0.008406, over 3055131.62 frames. ], batch size: 57, lr: 1.34e-03, grad_scale: 8.0 2023-11-29 14:35:57,769 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.705e+01 9.007e+01 9.636e+01 1.040e+02 1.609e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-29 14:35:59,300 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4004806.6666666665, ans=0.0 2023-11-29 14:36:17,835 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4004873.3333333335, ans=0.035 2023-11-29 14:36:26,564 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.41 vs. limit=12.0 2023-11-29 14:36:28,721 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 600750 2023-11-29 14:36:34,787 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4005006.6666666665, ans=0.125 2023-11-29 14:36:34,933 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=4005006.6666666665, ans=0.95 2023-11-29 14:36:36,878 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 14:36:45,122 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.44 vs. 
limit=12.0 2023-11-29 14:36:52,648 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.11 vs. limit=15.0 2023-11-29 14:36:55,700 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4005140.0, ans=0.125 2023-11-29 14:36:56,771 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 11600, loss[loss=0.07161, simple_loss=0.1026, pruned_loss=0.01323, audio_tagging_loss=0.007081, over 15559.00 frames. ], tot_loss[loss=0.0652, simple_loss=0.08977, pruned_loss=0.01196, audio_tagging_loss=0.008356, over 3060800.23 frames. ], batch size: 59, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:37:14,350 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.58 vs. limit=6.0 2023-11-29 14:37:25,187 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4005273.3333333335, ans=0.125 2023-11-29 14:37:30,262 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 600800 2023-11-29 14:37:43,199 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4005340.0, ans=0.125 2023-11-29 14:37:50,365 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4005406.6666666665, ans=0.125 2023-11-29 14:37:58,597 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 11650, loss[loss=0.08343, simple_loss=0.1197, pruned_loss=0.01796, audio_tagging_loss=0.005629, over 15112.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08961, pruned_loss=0.01197, audio_tagging_loss=0.008374, over 3063978.02 frames. ], batch size: 54, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:37:58,905 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=4005473.3333333335, ans=0.5 2023-11-29 14:38:00,523 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.57 vs. limit=15.0 2023-11-29 14:38:00,916 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.952e+01 9.263e+01 9.866e+01 1.051e+02 2.462e+02, threshold=1.973e+02, percent-clipped=1.0 2023-11-29 14:38:04,265 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4005473.3333333335, ans=0.2 2023-11-29 14:38:06,477 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4005473.3333333335, ans=0.1 2023-11-29 14:38:07,985 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.44 vs. limit=15.0 2023-11-29 14:38:08,841 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4005473.3333333335, ans=0.1 2023-11-29 14:38:12,089 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4005540.0, ans=0.1 2023-11-29 14:38:17,909 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.46 vs. 
limit=15.0 2023-11-29 14:38:32,238 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 600850 2023-11-29 14:38:34,742 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4005673.3333333335, ans=0.125 2023-11-29 14:38:36,909 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4005673.3333333335, ans=0.125 2023-11-29 14:38:59,904 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 11700, loss[loss=0.07478, simple_loss=0.1067, pruned_loss=0.0127, audio_tagging_loss=0.008705, over 14734.00 frames. ], tot_loss[loss=0.06464, simple_loss=0.08889, pruned_loss=0.01179, audio_tagging_loss=0.008406, over 3058573.39 frames. ], batch size: 53, lr: 1.34e-03, grad_scale: 8.0 2023-11-29 14:39:01,896 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.75 vs. limit=6.0 2023-11-29 14:39:08,035 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4005806.6666666665, ans=0.125 2023-11-29 14:39:12,474 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4005873.3333333335, ans=0.2 2023-11-29 14:39:25,557 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4005940.0, ans=0.125 2023-11-29 14:39:33,602 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 600900 2023-11-29 14:39:34,125 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.24 vs. limit=10.0 2023-11-29 14:39:50,750 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4006073.3333333335, ans=0.1 2023-11-29 14:39:55,270 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.86 vs. limit=10.0 2023-11-29 14:40:00,330 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.15 vs. limit=6.0 2023-11-29 14:40:02,100 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 11750, loss[loss=0.05837, simple_loss=0.07841, pruned_loss=0.01091, audio_tagging_loss=0.008255, over 16930.00 frames. ], tot_loss[loss=0.06443, simple_loss=0.08843, pruned_loss=0.01174, audio_tagging_loss=0.008477, over 3054408.91 frames. ], batch size: 63, lr: 1.34e-03, grad_scale: 8.0 2023-11-29 14:40:04,741 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=4006140.0, ans=0.5 2023-11-29 14:40:05,482 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.465e+01 9.067e+01 9.619e+01 1.045e+02 1.766e+02, threshold=1.924e+02, percent-clipped=0.0 2023-11-29 14:40:19,185 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4006206.6666666665, ans=0.1 2023-11-29 14:40:23,484 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.32 vs. 
limit=12.0 2023-11-29 14:40:34,744 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 600950 2023-11-29 14:40:44,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=4006340.0, ans=0.025 2023-11-29 14:41:02,857 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 11800, loss[loss=0.06976, simple_loss=0.1011, pruned_loss=0.01291, audio_tagging_loss=0.006322, over 15103.00 frames. ], tot_loss[loss=0.06496, simple_loss=0.08928, pruned_loss=0.01184, audio_tagging_loss=0.008479, over 3056617.51 frames. ], batch size: 55, lr: 1.34e-03, grad_scale: 8.0 2023-11-29 14:41:04,305 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4006473.3333333335, ans=0.0 2023-11-29 14:41:13,411 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.22 vs. limit=15.0 2023-11-29 14:41:17,444 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4006540.0, ans=0.125 2023-11-29 14:41:29,709 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4006606.6666666665, ans=0.125 2023-11-29 14:41:33,745 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4006606.6666666665, ans=0.125 2023-11-29 14:41:35,904 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 601000 2023-11-29 14:41:40,163 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4006673.3333333335, ans=0.1 2023-11-29 14:42:02,117 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4006740.0, ans=0.0 2023-11-29 14:42:04,201 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 11850, loss[loss=0.06251, simple_loss=0.08293, pruned_loss=0.01133, audio_tagging_loss=0.009714, over 15445.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08963, pruned_loss=0.01196, audio_tagging_loss=0.008542, over 3053100.25 frames. ], batch size: 58, lr: 1.34e-03, grad_scale: 8.0 2023-11-29 14:42:07,762 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.808e+01 9.068e+01 9.599e+01 1.030e+02 1.301e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-29 14:42:17,937 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4006873.3333333335, ans=0.015 2023-11-29 14:42:21,152 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4006873.3333333335, ans=0.0 2023-11-29 14:42:32,169 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4006940.0, ans=0.0 2023-11-29 14:42:37,814 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 601050 2023-11-29 14:43:05,830 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 11900, loss[loss=0.06459, simple_loss=0.09359, pruned_loss=0.01037, audio_tagging_loss=0.007418, over 14559.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.0896, pruned_loss=0.01198, audio_tagging_loss=0.008694, over 3051829.69 frames. 
], batch size: 54, lr: 1.34e-03, grad_scale: 8.0 2023-11-29 14:43:13,957 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4007140.0, ans=0.125 2023-11-29 14:43:20,185 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4007206.6666666665, ans=0.125 2023-11-29 14:43:20,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4007206.6666666665, ans=0.125 2023-11-29 14:43:38,801 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.29 vs. limit=22.5 2023-11-29 14:43:39,235 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 601100 2023-11-29 14:43:43,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4007340.0, ans=0.125 2023-11-29 14:43:47,694 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4007340.0, ans=0.1 2023-11-29 14:43:53,842 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4007406.6666666665, ans=0.1 2023-11-29 14:43:56,887 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4007406.6666666665, ans=0.1 2023-11-29 14:44:07,725 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 11950, loss[loss=0.06084, simple_loss=0.08328, pruned_loss=0.009993, audio_tagging_loss=0.009212, over 14813.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.08949, pruned_loss=0.01195, audio_tagging_loss=0.008634, over 3051918.74 frames. ], batch size: 56, lr: 1.34e-03, grad_scale: 8.0 2023-11-29 14:44:11,318 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.065e+01 8.998e+01 9.725e+01 1.040e+02 1.716e+02, threshold=1.945e+02, percent-clipped=0.0 2023-11-29 14:44:20,435 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4007540.0, ans=0.1 2023-11-29 14:44:21,546 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4007540.0, ans=0.2 2023-11-29 14:44:35,567 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4007606.6666666665, ans=0.2 2023-11-29 14:44:37,428 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 14:44:38,952 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.62 vs. limit=22.5 2023-11-29 14:44:40,731 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 601150 2023-11-29 14:44:49,416 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4007673.3333333335, ans=0.1 2023-11-29 14:45:02,237 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4007740.0, ans=0.0 2023-11-29 14:45:07,592 INFO [train_asr.py:1235] (0/4) Epoch 50, batch 12000, loss[loss=0.08252, simple_loss=0.1164, pruned_loss=0.01832, audio_tagging_loss=0.005976, over 16279.00 frames. 
], tot_loss[loss=0.0658, simple_loss=0.09032, pruned_loss=0.01199, audio_tagging_loss=0.008644, over 3054211.68 frames. ], batch size: 59, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:45:07,595 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-29 14:45:37,286 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.2934, 4.0332, 3.7610, 3.2252], device='cuda:0') 2023-11-29 14:45:47,786 INFO [train_asr.py:1267] (0/4) Epoch 50, validation: loss=0.05813, simple_loss=0.05044, pruned_loss=0.005399, audio_tagging_loss=0.02752, over 4681554.00 frames. 2023-11-29 14:45:47,787 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-29 14:46:17,056 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/epoch-50.pt 2023-11-29 14:46:34,489 INFO [train_asr.py:1235] (0/4) Epoch 51, batch 0, loss[loss=0.06946, simple_loss=0.08239, pruned_loss=0.007005, audio_tagging_loss=0.02126, over 14633.00 frames. ], tot_loss[loss=0.06946, simple_loss=0.08239, pruned_loss=0.007005, audio_tagging_loss=0.02126, over 14633.00 frames. ], batch size: 55, lr: 1.33e-03, grad_scale: 32.0 2023-11-29 14:46:34,499 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-29 14:46:56,215 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.4775, 3.9033, 4.3648, 3.4908], device='cuda:0') 2023-11-29 14:47:11,094 INFO [train_asr.py:1267] (0/4) Epoch 51, validation: loss=0.05803, simple_loss=0.05046, pruned_loss=0.005398, audio_tagging_loss=0.02741, over 4681554.00 frames. 2023-11-29 14:47:11,095 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-29 14:47:13,532 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 601200 2023-11-29 14:47:45,161 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.844e+01 9.507e+01 9.981e+01 1.081e+02 1.521e+02, threshold=1.996e+02, percent-clipped=0.0 2023-11-29 14:47:53,607 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.31 vs. limit=15.0 2023-11-29 14:47:58,835 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4008173.3333333335, ans=0.125 2023-11-29 14:48:13,505 INFO [train_asr.py:1235] (0/4) Epoch 51, batch 50, loss[loss=0.07962, simple_loss=0.1064, pruned_loss=0.01462, audio_tagging_loss=0.01178, over 14956.00 frames. ], tot_loss[loss=0.07128, simple_loss=0.0852, pruned_loss=0.01158, audio_tagging_loss=0.0171, over 687100.60 frames. ], batch size: 55, lr: 1.33e-03, grad_scale: 32.0 2023-11-29 14:48:16,020 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 601250 2023-11-29 14:48:18,490 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=4008306.6666666665, ans=0.5 2023-11-29 14:49:07,274 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4008573.3333333335, ans=0.125 2023-11-29 14:49:15,451 INFO [train_asr.py:1235] (0/4) Epoch 51, batch 100, loss[loss=0.06952, simple_loss=0.09066, pruned_loss=0.01011, audio_tagging_loss=0.01409, over 15204.00 frames. 
], tot_loss[loss=0.0717, simple_loss=0.08814, pruned_loss=0.01164, audio_tagging_loss=0.01599, over 1210259.95 frames. ], batch size: 57, lr: 1.33e-03, grad_scale: 16.0 2023-11-29 14:49:16,826 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4008640.0, ans=0.1 2023-11-29 14:49:17,854 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 601300 2023-11-29 14:49:45,734 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4008773.3333333335, ans=0.2 2023-11-29 14:49:51,223 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.732e+01 9.896e+01 1.042e+02 1.115e+02 1.364e+02, threshold=2.085e+02, percent-clipped=0.0 2023-11-29 14:49:53,223 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.99 vs. limit=15.0 2023-11-29 14:49:55,073 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4008840.0, ans=0.125 2023-11-29 14:49:59,856 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4008840.0, ans=0.125 2023-11-29 14:50:00,924 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4008840.0, ans=0.1 2023-11-29 14:50:04,390 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4008906.6666666665, ans=0.125 2023-11-29 14:50:17,048 INFO [train_asr.py:1235] (0/4) Epoch 51, batch 150, loss[loss=0.08179, simple_loss=0.1118, pruned_loss=0.01367, audio_tagging_loss=0.0122, over 15623.00 frames. ], tot_loss[loss=0.07037, simple_loss=0.08908, pruned_loss=0.01156, audio_tagging_loss=0.01427, over 1619187.96 frames. ], batch size: 55, lr: 1.33e-03, grad_scale: 16.0 2023-11-29 14:50:19,570 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 601350 2023-11-29 14:50:28,997 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.96 vs. limit=10.0 2023-11-29 14:51:11,458 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4009240.0, ans=0.0 2023-11-29 14:51:16,311 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.93 vs. limit=12.0 2023-11-29 14:51:17,250 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4009240.0, ans=0.125 2023-11-29 14:51:19,946 INFO [train_asr.py:1235] (0/4) Epoch 51, batch 200, loss[loss=0.06302, simple_loss=0.08456, pruned_loss=0.01125, audio_tagging_loss=0.009493, over 15722.00 frames. ], tot_loss[loss=0.06825, simple_loss=0.08833, pruned_loss=0.01148, audio_tagging_loss=0.01261, over 1940270.35 frames. 
], batch size: 60, lr: 1.33e-03, grad_scale: 16.0 2023-11-29 14:51:22,372 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 601400 2023-11-29 14:51:27,658 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4009306.6666666665, ans=0.2 2023-11-29 14:51:55,781 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.803e+01 9.139e+01 9.906e+01 1.061e+02 1.460e+02, threshold=1.981e+02, percent-clipped=0.0 2023-11-29 14:52:02,633 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4009506.6666666665, ans=0.1 2023-11-29 14:52:21,865 INFO [train_asr.py:1235] (0/4) Epoch 51, batch 250, loss[loss=0.06445, simple_loss=0.09104, pruned_loss=0.01295, audio_tagging_loss=0.005982, over 14957.00 frames. ], tot_loss[loss=0.06722, simple_loss=0.0886, pruned_loss=0.01157, audio_tagging_loss=0.01135, over 2181056.20 frames. ], batch size: 56, lr: 1.33e-03, grad_scale: 16.0 2023-11-29 14:52:24,318 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 601450 2023-11-29 14:52:42,297 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.95 vs. limit=15.0 2023-11-29 14:52:45,605 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=4009773.3333333335, ans=0.025 2023-11-29 14:52:48,475 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4009773.3333333335, ans=0.125 2023-11-29 14:52:48,671 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=4009773.3333333335, ans=0.025 2023-11-29 14:53:02,129 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4009840.0, ans=0.0 2023-11-29 14:53:23,652 INFO [train_asr.py:1235] (0/4) Epoch 51, batch 300, loss[loss=0.06312, simple_loss=0.08886, pruned_loss=0.01106, audio_tagging_loss=0.007632, over 15196.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.08903, pruned_loss=0.01155, audio_tagging_loss=0.01044, over 2372492.47 frames. ], batch size: 57, lr: 1.33e-03, grad_scale: 16.0 2023-11-29 14:53:26,717 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 601500 2023-11-29 14:53:40,411 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4010040.0, ans=0.1 2023-11-29 14:53:40,428 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4010040.0, ans=0.125 2023-11-29 14:53:56,167 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4010106.6666666665, ans=0.0 2023-11-29 14:53:59,443 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.022e+01 9.446e+01 1.010e+02 1.084e+02 1.415e+02, threshold=2.020e+02, percent-clipped=0.0 2023-11-29 14:54:26,222 INFO [train_asr.py:1235] (0/4) Epoch 51, batch 350, loss[loss=0.06551, simple_loss=0.08383, pruned_loss=0.013, audio_tagging_loss=0.01059, over 15007.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.08981, pruned_loss=0.01176, audio_tagging_loss=0.009883, over 2523865.37 frames. 
], batch size: 57, lr: 1.33e-03, grad_scale: 16.0 2023-11-29 14:54:29,256 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 601550 2023-11-29 14:54:43,674 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4010373.3333333335, ans=0.125 2023-11-29 14:54:49,405 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4010440.0, ans=0.125 2023-11-29 14:54:49,561 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4010440.0, ans=0.125 2023-11-29 14:55:02,325 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4010506.6666666665, ans=0.1 2023-11-29 14:55:22,465 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 14:55:27,913 INFO [train_asr.py:1235] (0/4) Epoch 51, batch 400, loss[loss=0.04515, simple_loss=0.05301, pruned_loss=0.009289, audio_tagging_loss=0.009358, over 15008.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08868, pruned_loss=0.01163, audio_tagging_loss=0.009613, over 2633577.51 frames. ], batch size: 58, lr: 1.33e-03, grad_scale: 32.0 2023-11-29 14:55:30,298 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 601600 2023-11-29 14:55:32,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4010640.0, ans=0.125 2023-11-29 14:55:35,444 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4010640.0, ans=0.2 2023-11-29 14:55:53,235 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4010773.3333333335, ans=0.125 2023-11-29 14:56:04,482 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.022e+01 9.094e+01 9.565e+01 1.047e+02 1.359e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-29 14:56:23,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4010906.6666666665, ans=0.0 2023-11-29 14:56:24,206 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.28 vs. limit=22.5 2023-11-29 14:56:29,840 INFO [train_asr.py:1235] (0/4) Epoch 51, batch 450, loss[loss=0.05755, simple_loss=0.08343, pruned_loss=0.008205, audio_tagging_loss=0.007633, over 15554.00 frames. ], tot_loss[loss=0.06466, simple_loss=0.08766, pruned_loss=0.01143, audio_tagging_loss=0.009397, over 2726518.02 frames. 
], batch size: 57, lr: 1.33e-03, grad_scale: 32.0 2023-11-29 14:56:32,896 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 601650 2023-11-29 14:56:33,002 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=4010973.3333333335, ans=0.95 2023-11-29 14:56:40,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4010973.3333333335, ans=0.0 2023-11-29 14:56:47,319 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4011040.0, ans=0.125 2023-11-29 14:57:00,976 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.83 vs. limit=15.0 2023-11-29 14:57:10,528 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.20 vs. limit=22.5 2023-11-29 14:57:18,084 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4011240.0, ans=0.0 2023-11-29 14:57:20,108 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.17 vs. limit=15.0 2023-11-29 14:57:31,220 INFO [train_asr.py:1235] (0/4) Epoch 51, batch 500, loss[loss=0.07619, simple_loss=0.1155, pruned_loss=0.01119, audio_tagging_loss=0.007228, over 16099.00 frames. ], tot_loss[loss=0.06477, simple_loss=0.08815, pruned_loss=0.01145, audio_tagging_loss=0.009253, over 2793728.06 frames. ], batch size: 56, lr: 1.33e-03, grad_scale: 32.0 2023-11-29 14:57:31,562 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4011306.6666666665, ans=0.125 2023-11-29 14:57:33,688 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 601700 2023-11-29 14:57:45,211 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=4011373.3333333335, ans=0.0 2023-11-29 14:57:59,233 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4011440.0, ans=0.125 2023-11-29 14:58:01,217 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4011440.0, ans=0.0 2023-11-29 14:58:07,338 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.832e+01 9.018e+01 9.718e+01 1.038e+02 1.323e+02, threshold=1.944e+02, percent-clipped=0.0 2023-11-29 14:58:07,628 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=4011506.6666666665, ans=10.0 2023-11-29 14:58:11,181 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4011506.6666666665, ans=0.04949747468305833 2023-11-29 14:58:11,451 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.95 vs. 
limit=15.0 2023-11-29 14:58:21,169 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4011573.3333333335, ans=0.125 2023-11-29 14:58:26,875 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4011573.3333333335, ans=0.125 2023-11-29 14:58:32,351 INFO [train_asr.py:1235] (0/4) Epoch 51, batch 550, loss[loss=0.0715, simple_loss=0.1029, pruned_loss=0.01176, audio_tagging_loss=0.008306, over 15626.00 frames. ], tot_loss[loss=0.06481, simple_loss=0.08836, pruned_loss=0.01154, audio_tagging_loss=0.009089, over 2845024.59 frames. ], batch size: 61, lr: 1.33e-03, grad_scale: 32.0 2023-11-29 14:58:34,964 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 601750 2023-11-29 14:58:43,203 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.29 vs. limit=5.0 2023-11-29 14:58:43,819 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4011706.6666666665, ans=0.2 2023-11-29 14:58:46,204 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4011706.6666666665, ans=0.125 2023-11-29 14:58:51,885 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4011706.6666666665, ans=0.0 2023-11-29 14:58:51,905 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4011706.6666666665, ans=0.0 2023-11-29 14:59:04,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4011773.3333333335, ans=0.0 2023-11-29 14:59:04,422 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4011773.3333333335, ans=0.125 2023-11-29 14:59:13,491 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4011840.0, ans=0.2 2023-11-29 14:59:32,366 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4011906.6666666665, ans=0.0 2023-11-29 14:59:35,624 INFO [train_asr.py:1235] (0/4) Epoch 51, batch 600, loss[loss=0.04096, simple_loss=0.05269, pruned_loss=0.005871, audio_tagging_loss=0.008747, over 14884.00 frames. ], tot_loss[loss=0.0643, simple_loss=0.08744, pruned_loss=0.01146, audio_tagging_loss=0.009114, over 2894254.24 frames. ], batch size: 59, lr: 1.33e-03, grad_scale: 32.0 2023-11-29 14:59:36,268 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.59 vs. 
limit=15.0 2023-11-29 14:59:38,707 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 601800 2023-11-29 14:59:43,207 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4011973.3333333335, ans=0.125 2023-11-29 14:59:44,133 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=4011973.3333333335, ans=0.05 2023-11-29 14:59:53,507 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4012040.0, ans=0.1 2023-11-29 15:00:12,015 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.862e+01 9.089e+01 9.742e+01 1.033e+02 1.328e+02, threshold=1.948e+02, percent-clipped=0.0 2023-11-29 15:00:14,669 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4012173.3333333335, ans=0.0 2023-11-29 15:00:21,926 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4012173.3333333335, ans=0.125 2023-11-29 15:00:37,380 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4012306.6666666665, ans=0.1 2023-11-29 15:00:38,276 INFO [train_asr.py:1235] (0/4) Epoch 51, batch 650, loss[loss=0.06929, simple_loss=0.08562, pruned_loss=0.01264, audio_tagging_loss=0.01384, over 15175.00 frames. ], tot_loss[loss=0.0648, simple_loss=0.08854, pruned_loss=0.01153, audio_tagging_loss=0.009004, over 2927694.24 frames. ], batch size: 56, lr: 1.33e-03, grad_scale: 32.0 2023-11-29 15:00:40,198 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.50 vs. limit=15.0 2023-11-29 15:00:40,755 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 601850 2023-11-29 15:00:48,470 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4012306.6666666665, ans=0.125 2023-11-29 15:00:50,830 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4012373.3333333335, ans=0.125 2023-11-29 15:01:07,990 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4012440.0, ans=0.125 2023-11-29 15:01:17,422 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4012506.6666666665, ans=0.125 2023-11-29 15:01:39,804 INFO [train_asr.py:1235] (0/4) Epoch 51, batch 700, loss[loss=0.08975, simple_loss=0.1344, pruned_loss=0.01691, audio_tagging_loss=0.005651, over 15987.00 frames. ], tot_loss[loss=0.06463, simple_loss=0.08862, pruned_loss=0.01146, audio_tagging_loss=0.008859, over 2954149.43 frames. ], batch size: 55, lr: 1.33e-03, grad_scale: 32.0 2023-11-29 15:01:42,863 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 601900 2023-11-29 15:01:46,930 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.21 vs. limit=15.0 2023-11-29 15:01:51,446 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.79 vs. 
limit=15.0 2023-11-29 15:01:55,041 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.83 vs. limit=15.0 2023-11-29 15:02:15,543 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.121e+01 8.992e+01 9.586e+01 1.033e+02 1.329e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-29 15:02:35,846 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4012906.6666666665, ans=0.2 2023-11-29 15:02:41,777 INFO [train_asr.py:1235] (0/4) Epoch 51, batch 750, loss[loss=0.07465, simple_loss=0.09695, pruned_loss=0.01699, audio_tagging_loss=0.009185, over 14692.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.08927, pruned_loss=0.01173, audio_tagging_loss=0.008806, over 2980666.50 frames. ], batch size: 56, lr: 1.33e-03, grad_scale: 16.0 2023-11-29 15:02:43,210 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4012973.3333333335, ans=0.0 2023-11-29 15:02:44,217 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 601950 2023-11-29 15:02:51,264 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4012973.3333333335, ans=0.2 2023-11-29 15:03:02,096 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=4013040.0, ans=10.0 2023-11-29 15:03:25,619 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4013173.3333333335, ans=0.125 2023-11-29 15:03:43,353 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4013306.6666666665, ans=0.0 2023-11-29 15:03:44,111 INFO [train_asr.py:1235] (0/4) Epoch 51, batch 800, loss[loss=0.05361, simple_loss=0.06927, pruned_loss=0.009889, audio_tagging_loss=0.009085, over 14996.00 frames. ], tot_loss[loss=0.06531, simple_loss=0.08947, pruned_loss=0.01178, audio_tagging_loss=0.008794, over 2990789.88 frames. ], batch size: 57, lr: 1.33e-03, grad_scale: 32.0 2023-11-29 15:03:46,531 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 602000 2023-11-29 15:03:49,676 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.49 vs. limit=15.0 2023-11-29 15:04:03,063 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.69 vs. limit=15.0 2023-11-29 15:04:09,115 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4013440.0, ans=0.0 2023-11-29 15:04:21,464 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.185e+01 9.281e+01 9.942e+01 1.059e+02 1.465e+02, threshold=1.988e+02, percent-clipped=0.0 2023-11-29 15:04:35,549 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.69 vs. limit=15.0 2023-11-29 15:04:46,307 INFO [train_asr.py:1235] (0/4) Epoch 51, batch 850, loss[loss=0.06336, simple_loss=0.08975, pruned_loss=0.01076, audio_tagging_loss=0.007719, over 14428.00 frames. ], tot_loss[loss=0.06484, simple_loss=0.08877, pruned_loss=0.01167, audio_tagging_loss=0.008785, over 3007026.91 frames. 
], batch size: 56, lr: 1.33e-03, grad_scale: 32.0 2023-11-29 15:04:48,734 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 602050 2023-11-29 15:04:53,640 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4013640.0, ans=0.1 2023-11-29 15:04:54,942 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.47 vs. limit=15.0 2023-11-29 15:05:08,300 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.81 vs. limit=15.0 2023-11-29 15:05:21,328 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4013773.3333333335, ans=0.0 2023-11-29 15:05:27,329 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4013840.0, ans=0.125 2023-11-29 15:05:47,458 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4013973.3333333335, ans=0.0 2023-11-29 15:05:48,319 INFO [train_asr.py:1235] (0/4) Epoch 51, batch 900, loss[loss=0.06612, simple_loss=0.07844, pruned_loss=0.01602, audio_tagging_loss=0.01088, over 15607.00 frames. ], tot_loss[loss=0.06469, simple_loss=0.08851, pruned_loss=0.01157, audio_tagging_loss=0.00886, over 3015693.29 frames. ], batch size: 60, lr: 1.33e-03, grad_scale: 16.0 2023-11-29 15:05:50,805 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 602100 2023-11-29 15:05:50,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4013973.3333333335, ans=0.0 2023-11-29 15:05:59,550 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.14 vs. limit=15.0 2023-11-29 15:06:26,395 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.140e+01 9.010e+01 9.603e+01 1.027e+02 1.199e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-29 15:06:45,595 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.84 vs. limit=15.0 2023-11-29 15:06:51,425 INFO [train_asr.py:1235] (0/4) Epoch 51, batch 950, loss[loss=0.06732, simple_loss=0.09513, pruned_loss=0.009497, audio_tagging_loss=0.01026, over 14945.00 frames. ], tot_loss[loss=0.06509, simple_loss=0.08916, pruned_loss=0.01174, audio_tagging_loss=0.008773, over 3027065.08 frames. 
], batch size: 57, lr: 1.33e-03, grad_scale: 16.0 2023-11-29 15:06:53,969 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 602150 2023-11-29 15:07:05,161 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4014373.3333333335, ans=0.125 2023-11-29 15:07:06,570 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 15:07:29,552 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4014506.6666666665, ans=0.125 2023-11-29 15:07:38,004 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4014506.6666666665, ans=0.2 2023-11-29 15:07:47,782 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4014573.3333333335, ans=0.2 2023-11-29 15:07:53,992 INFO [train_asr.py:1235] (0/4) Epoch 51, batch 1000, loss[loss=0.05875, simple_loss=0.07865, pruned_loss=0.01204, audio_tagging_loss=0.007387, over 14778.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.08898, pruned_loss=0.01185, audio_tagging_loss=0.008607, over 3020292.05 frames. ], batch size: 55, lr: 1.33e-03, grad_scale: 16.0 2023-11-29 15:07:56,593 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 602200 2023-11-29 15:08:24,045 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 15:08:24,734 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.41 vs. limit=22.5 2023-11-29 15:08:32,763 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4014840.0, ans=0.1 2023-11-29 15:08:33,537 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.974e+01 9.222e+01 9.984e+01 1.098e+02 2.505e+02, threshold=1.997e+02, percent-clipped=1.0 2023-11-29 15:08:33,881 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4014840.0, ans=0.125 2023-11-29 15:08:56,585 INFO [train_asr.py:1235] (0/4) Epoch 51, batch 1050, loss[loss=0.05365, simple_loss=0.0772, pruned_loss=0.007519, audio_tagging_loss=0.007531, over 15438.00 frames. ], tot_loss[loss=0.06416, simple_loss=0.08804, pruned_loss=0.01163, audio_tagging_loss=0.008506, over 3021364.92 frames. 
], batch size: 60, lr: 1.33e-03, grad_scale: 16.0 2023-11-29 15:08:56,859 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4014973.3333333335, ans=0.125 2023-11-29 15:08:57,829 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4014973.3333333335, ans=0.125 2023-11-29 15:08:59,783 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 602250 2023-11-29 15:09:02,500 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4014973.3333333335, ans=0.125 2023-11-29 15:09:02,529 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4014973.3333333335, ans=0.0 2023-11-29 15:09:31,706 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4015106.6666666665, ans=0.2 2023-11-29 15:09:33,988 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4015106.6666666665, ans=0.125 2023-11-29 15:09:35,707 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4015106.6666666665, ans=0.125 2023-11-29 15:09:44,681 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4015173.3333333335, ans=0.2 2023-11-29 15:09:50,706 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/bad-model-0.pt
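The WARNING entries that exclude cuts (unbalanced/h6R5rMXN6pY, NeYOsnhOi4k and 5Y6u9AlD9S0 above) all follow one rule: each one-second clip keeps only 23 encoder frames after subsampling, while its placeholder transcript tokenizes to 24 pieces, and a transducer cannot emit more tokens than it has encoder frames. A minimal sketch of that filter, assuming it mirrors exactly what the warning reports (the helper name is illustrative):

    def should_exclude(num_frames_after_subsampling: int, tokens: list) -> bool:
        # The (pruned) transducer loss needs at least one encoder frame per
        # output token, so a 23-frame cut with a 24-token transcript can
        # never be aligned and is dropped before batching.
        return num_frames_after_subsampling < len(tokens)

    # The logged case: 100 raw frames -> 23 after subsampling, 24 tokens.
    print(should_exclude(23, ["tok"] * 24))  # True -> the cut is excluded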
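To the precision printed, every loss[...] and tot_loss[...] record in this section satisfies loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss; e.g. for epoch 51, batch 0: 0.5 * 0.08239 + 0.007005 + 0.02126 = 0.06946. The weights below are inferred by fitting those logged numbers, not read from the recipe's code:

    def combined_loss(simple_loss: float, pruned_loss: float,
                      audio_tagging_loss: float) -> float:
        # Weights inferred from the printed values; the recipe may assemble
        # this sum elsewhere (e.g. with a warm-up-dependent simple-loss term).
        return 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss

    # Epoch 51, batch 0 from the log above:
    print(round(combined_loss(0.08239, 0.007005, 0.02126), 5))  # 0.06946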
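tot_loss[...] is a frame-weighted running average rather than a per-batch value: the "over N frames" totals carry fractional counts (687100.60 frames after 50 batches of epoch 51) and plateau near 3.0e6 once roughly 200 batches of ~15k frames are in the window, which points to an exponentially decayed sum. A sketch consistent with those numbers (the 1/200 decay is inferred from the plateau, not read from the code):

    class DecayingLossTracker:
        # Frame-weighted sums that decay by (1 - 1/200) each batch; with
        # ~15k frames per batch the frame total settles near 200 * 15k
        # ~= 3.0e6, matching the "over 3.0e6 frames" readings above.
        def __init__(self, decay: float = 1.0 - 1.0 / 200):
            self.decay = decay
            self.frames = 0.0
            self.sums = {}

        def update(self, frames: float, per_frame_losses: dict) -> None:
            self.frames = self.frames * self.decay + frames
            for name, value in per_frame_losses.items():
                self.sums[name] = (self.sums.get(name, 0.0) * self.decay
                                   + value * frames)

        def report(self) -> dict:
            # What gets printed as tot_loss[..., over <frames> frames.]
            return {name: s / self.frames for name, s in self.sums.items()}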
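The optim.py lines report the distribution of recent gradient norms (min, quartiles, max) together with a clipping threshold; in every entry above the threshold equals Clipping_scale times the logged median (e.g. 2.0 * 9.770e+01 = 1.954e+02), and percent-clipped says how often a norm exceeded it. A hedged reconstruction of that bookkeeping (the window size and exact quantile set are assumptions):

    import torch
    from collections import deque

    class QuartileGradClipper:
        def __init__(self, clipping_scale: float = 2.0, window: int = 1000):
            self.clipping_scale = clipping_scale
            self.history = deque(maxlen=window)  # recent global grad norms

        def __call__(self, parameters) -> torch.Tensor:
            params = [p for p in parameters if p.grad is not None]
            norm = torch.cat([p.grad.flatten() for p in params]).norm()
            self.history.append(float(norm))
            qs = torch.quantile(torch.tensor(list(self.history)),
                                torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
            threshold = self.clipping_scale * qs[2]  # 2.0 x median, as logged
            if norm > threshold:
                for p in params:
                    p.grad.mul_(threshold / norm)
            return threshold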
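grad_scale hops between 8.0, 16.0 and 32.0 across the entries above, the signature of dynamic fp16 loss scaling: the scaler multiplies the scale after a run of overflow-free steps and backs off when an inf/nan gradient appears. The constructor values below are illustrative, not the recipe's settings:

    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=16.0,      # a mid-range starting point like the logged values
        growth_factor=2.0,    # 16.0 -> 32.0 after growth_interval clean steps
        backoff_factor=0.5,   # 32.0 -> 16.0 -> 8.0 on overflowing steps
        growth_interval=2000,
    )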
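The many scaling.py:213 lines print module hyperparameters (skip rates, balancer probs, dropout p) that are scheduled on batch_count; by batch_count ~ 4.0e6 the skip rates all read ans=0.0 while dropout reads ans=0.1. A minimal sketch of such a schedule, assuming piecewise-linear interpolation between knots (the class name matches the log; the knot values are made up):

    class ScheduledFloat:
        def __init__(self, *knots):
            # knots: (batch_count, value) pairs; constant outside the ends.
            self.knots = sorted(knots)

        def __call__(self, batch_count: float) -> float:
            ks = self.knots
            if batch_count <= ks[0][0]:
                return ks[0][1]
            if batch_count >= ks[-1][0]:
                return ks[-1][1]
            for (x0, y0), (x1, y1) in zip(ks, ks[1:]):
                if x0 <= batch_count <= x1:
                    return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # A skip rate annealed to zero early in training reads ans=0.0 by
    # batch_count ~ 4.0e6, like the ff2_skip_rate entries above:
    ff2_skip_rate = ScheduledFloat((0.0, 0.1), (20000.0, 0.0))
    print(ff2_skip_rate(4000673.3333333335))  # -> 0.0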
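The scaling.py:1022 lines compare a per-module whitening metric against a limit (e.g. metric=13.13 vs. limit=15.0); the constraint only intervenes once activations become too "non-white", which is why most logged values sit below their limits. One plausible form of the statistic, normalized so a covariance proportional to the identity scores exactly 1.0 (an assumption, not the project's verbatim code):

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        # x: (..., num_channels); the num_groups=1 case. Grouped variants
        # (e.g. the whiten_keys entries with num_groups=4) would split the
        # channel dim into groups and average the per-group metrics.
        x = x.reshape(-1, x.shape[-1]).float()
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.t() @ x) / x.shape[0]          # (C, C) channel covariance
        num_channels = cov.shape[0]
        mean_diag = cov.diagonal().mean()
        # Equals 1.0 when cov is a multiple of the identity ("white");
        # grows as variance concentrates in a few directions.
        return (cov ** 2).sum() / (mean_diag ** 2 * num_channels)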
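At each validation pass, zipformer.py:1877 prints a small tensor per self_attn_weights module, e.g. tensor([3.2934, 4.0332, 3.7610, 3.2252]) — four values, presumably one per attention head. A plausible form of that diagnostic, assuming it is the mean Shannon entropy of each head's attention rows (a head stuck on one position would read near 0; uniform attention over S positions reads ln S, and the logged 3.3-4.5 range corresponds to a few dozen effective positions):

    import torch

    def attn_weights_entropy(attn_weights: torch.Tensor) -> torch.Tensor:
        # attn_weights: (num_heads, tgt_len, src_len); each row sums to 1.
        ent = -(attn_weights * (attn_weights + 1e-20).log()).sum(dim=-1)
        return ent.mean(dim=-1)  # one scalar per head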
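The section ends with a checkpoint written to bad-model-0.pt instead of a numbered epoch-*.pt file; in icefall-style recipes that filename is typically produced by a failure path that snapshots the weights for post-mortem debugging before re-raising the error. A sketch of the pattern, with all names illustrative rather than the recipe's exact code:

    import torch

    def run_batch(model, batch, optimizer, exp_dir: str, rank: int = 0):
        try:
            loss = model(batch)        # stand-in for the real forward pass
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
        except Exception:
            # Dump the current weights (bad-model-<rank>.pt, as in the final
            # log line) so the failing state can be inspected offline.
            torch.save(model.state_dict(), f"{exp_dir}/bad-model-{rank}.pt")
            raise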