2023-11-19 16:28:29,696 INFO [train_asr.py:1330] (3/4) Training started
2023-11-19 16:28:29,696 INFO [train_asr.py:1340] (3/4) Device: cuda:3
2023-11-19 16:28:29,702 INFO [train_asr.py:1352] (3/4) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '2b2ac14b326d61d79d04e53fbd69b1ff6d630411', 'k2-git-date': 'Thu Aug 24 05:58:26 2023', 'lhotse-version': '1.16.0', 'torch-version': '2.0.1+cu117', 'torch-cuda-available': True, 'torch-cuda-version': '11.7', 'python-version': '3.1', 'icefall-git-branch': 'multi_KD', 'icefall-git-sha1': 'ae3d64ff-dirty', 'icefall-git-date': 'Sun Nov 19 00:54:09 2023', 'icefall-path': '/star-xy/softwares/icefall_development/icefall_multi_KD', 'k2-path': '/star-xy/softwares/k2_development/k2/k2/python/k2/__init__.py', 'lhotse-path': '/star-xy/softwares/anaconda3/envs/multi_KD/lib/python3.10/site-packages/lhotse/__init__.py', 'hostname': 'de-74279-k2-train-6-0423201309-7c68fd68fb-qfn6b', 'IP address': '10.177.58.19'}, 'world_size': 4, 'master_port': 13490, 'tensorboard': True, 'num_epochs': 40, 'start_epoch': 10, 'start_batch': 0, 'exp_dir': PosixPath('multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0'), 'bpe_model': 'data/lang_bpe_500/bpe.model', 'base_lr': 0.045, 'lr_batches': 7500, 'lr_epochs': 3.5, 'ref_duration': 600, 'context_size': 2, 'prune_range': 5, 'lm_scale': 0.25, 'am_scale': 0.0, 'simple_loss_scale': 0.5, 'ctc_loss_scale': 0.2, 'audio_tagging_loss_scale': 1.0, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 4000, 'keep_last_k': 30, 'average_period': 200, 'use_fp16': True, 'do_finetune': False, 'init_modules': None, 'freeze_modules': None, 'finetune_ckpt': None, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'causal': False, 'chunk_size': '16,32,64,-1', 'left_context_frames': '64,128,256,-1', 'use_transducer': True, 'use_ctc': False, 'do_audio_tagging': True, 'use_encoder_projection': False, 'encoder_projection_dim': -1, 'freeze_encoder': False, 'freezing_encoder_layer_index': '-1', 'freeze_encoder_steps': -1, 'encoder_lr_scale': 1.0, 'full_libri': True, 'mini_libri': False, 'use_vox2': False, 'use_libriheavy': False, 'libriheavy_subset': 'small', 'use_audioset': True, 'audioset_subset': 'unbalanced', 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 1000, 'bucketing_sampler': False, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'enable_audioset': False, 'use_musan_separately': False, 'input_strategy': 'PrecomputedFeatures', 'drop_features': False, 'return_audio': False, 'use_beats': True, 'use_ecapa': True, 'use_whisper': True, 'whisper_mvq': False, 'beats_ckpt': 'data/models/BEATs/BEATs_iter3_plus_AS2M_finetuned_on_AS2M_cpt2.pt', 'whisper_version': 'small.en', 'blank_id': 0, 'vocab_size': 500}
2023-11-19 16:28:29,702 INFO [train_asr.py:1361] (3/4) About to create model
2023-11-19 16:28:30,778 INFO [train_asr.py:1365] (3/4) Number of model parameters: 65819362
2023-11-19 16:28:30,780 INFO [checkpoint.py:112] (3/4) Loading checkpoint from multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/epoch-9.pt
2023-11-19 16:28:34,206 INFO [checkpoint.py:112] (3/4) Loading checkpoint from multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/epoch-9.pt
2023-11-19 16:28:36,506 INFO [train_asr.py:1396] (3/4) Setting the lr scale of parameters in encoder and encoder_embed to 1.0
2023-11-19 16:28:40,529 INFO [train_asr.py:1405] (3/4) Using DDP
2023-11-19 16:28:40,972 INFO [train_asr.py:1428] (3/4) Loading optimizer state dict
2023-11-19 16:28:41,782 INFO [train_asr.py:1436] (3/4) Loading scheduler state dict
2023-11-19 16:28:41,785 INFO [train_asr.py:1458] (3/4) Getting audioset cuts
2023-11-19 16:28:41,785 INFO [kd_datamodule.py:796] (3/4) About to get the audioset cuts.
2023-11-19 16:28:41,796 INFO [train_asr.py:1464] (3/4) Using mux to combine Librispeech with audioset
2023-11-19 16:28:41,797 INFO [train_asr.py:1474] (3/4) CutSet(len=2748469) [underlying data type: ]
2023-11-19 16:28:57,398 INFO [kd_datamodule.py:396] (3/4) Enable MUSAN
2023-11-19 16:28:57,398 INFO [kd_datamodule.py:397] (3/4) About to get Musan cuts
2023-11-19 16:29:01,014 INFO [kd_datamodule.py:427] (3/4) Enable SpecAugment
2023-11-19 16:29:01,014 INFO [kd_datamodule.py:428] (3/4) Time warp factor: 80
2023-11-19 16:29:01,014 INFO [kd_datamodule.py:438] (3/4) Num frame mask: 10
2023-11-19 16:29:01,015 INFO [kd_datamodule.py:451] (3/4) About to create train dataset
2023-11-19 16:29:01,016 INFO [kd_datamodule.py:487] (3/4) Using SimpleCutSampler
2023-11-19 16:29:01,016 INFO [kd_datamodule.py:495] (3/4) About to create train dataloader
2023-11-19 16:29:01,045 INFO [kd_datamodule.py:814] (3/4) About to get the audioset eval cuts.
2023-11-19 16:29:01,065 INFO [train_asr.py:1538] (3/4) CutSet(len=20681) [underlying data type: ]
2023-11-19 16:29:01,157 INFO [kd_datamodule.py:529] (3/4) About to create dev dataset
2023-11-19 16:29:01,955 INFO [kd_datamodule.py:550] (3/4) About to create dev dataloader
2023-11-19 16:29:01,955 INFO [train_asr.py:1552] (3/4) Loading grad scaler state dict
2023-11-19 16:29:40,854 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 0, loss[loss=0.1005, simple_loss=0.1233, pruned_loss=0.01874, audio_tagging_loss=0.02007, over 15003.00 frames. ], tot_loss[loss=0.1005, simple_loss=0.1233, pruned_loss=0.01874, audio_tagging_loss=0.02007, over 15003.00 frames. ], batch size: 55, lr: 7.12e-03, grad_scale: 32.0
2023-11-19 16:29:40,854 INFO [train_asr.py:1285] (3/4) Computing validation loss
2023-11-19 16:30:14,380 INFO [zipformer.py:1873] (3/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.4382, 3.6859, 2.3667, 3.6991], device='cuda:3')
2023-11-19 16:30:18,309 INFO [train_asr.py:1294] (3/4) Epoch 10, validation: loss=0.06458, simple_loss=0.05578, pruned_loss=0.006608, audio_tagging_loss=0.03008, over 4681554.00 frames.
2023-11-19 16:30:18,310 INFO [train_asr.py:1295] (3/4) Maximum memory allocated so far is 25886MB
2023-11-19 16:30:20,592 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.10 vs. limit=15.0
2023-11-19 16:30:21,280 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=721400.0, ans=0.035
2023-11-19 16:30:21,920 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.42 vs. limit=10.0
2023-11-19 16:30:23,363 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.75 vs. limit=15.0
2023-11-19 16:30:27,523 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.852e+01 8.461e+01 9.125e+01 9.697e+01 1.516e+02, threshold=1.825e+02, percent-clipped=0.0
2023-11-19 16:30:44,763 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=721466.6666666666, ans=0.0
2023-11-19 16:30:44,947 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=721466.6666666666, ans=0.0
2023-11-19 16:31:10,003 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 108250
2023-11-19 16:31:15,365 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=721666.6666666666, ans=0.125
2023-11-19 16:31:19,569 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.25 vs. limit=15.0
2023-11-19 16:31:20,053 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=721666.6666666666, ans=0.1
2023-11-19 16:31:26,670 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 50, loss[loss=0.1026, simple_loss=0.1224, pruned_loss=0.02293, audio_tagging_loss=0.01848, over 14609.00 frames. ], tot_loss[loss=0.09587, simple_loss=0.1058, pruned_loss=0.02319, audio_tagging_loss=0.01978, over 687631.10 frames. ], batch size: 54, lr: 7.12e-03, grad_scale: 32.0
2023-11-19 16:31:33,676 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.54 vs. limit=22.5
2023-11-19 16:31:38,259 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=721800.0, ans=0.04949747468305833
2023-11-19 16:31:42,795 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=12.11 vs. limit=15.0
2023-11-19 16:31:53,515 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.69 vs. limit=22.5
2023-11-19 16:31:54,400 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=721866.6666666666, ans=0.1
2023-11-19 16:32:16,165 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 108300
2023-11-19 16:32:31,692 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 100, loss[loss=0.09191, simple_loss=0.1021, pruned_loss=0.02279, audio_tagging_loss=0.01804, over 16282.00 frames. ], tot_loss[loss=0.09543, simple_loss=0.1062, pruned_loss=0.02317, audio_tagging_loss=0.01915, over 1210313.42 frames. ], batch size: 61, lr: 7.12e-03, grad_scale: 32.0
2023-11-19 16:32:40,250 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.545e+01 8.782e+01 9.608e+01 1.042e+02 1.365e+02, threshold=1.922e+02, percent-clipped=0.0
2023-11-19 16:32:47,001 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.05 vs. limit=22.5
2023-11-19 16:32:50,681 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=722133.3333333334, ans=0.0
2023-11-19 16:33:06,911 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=722200.0, ans=0.2
2023-11-19 16:33:20,397 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 108350
2023-11-19 16:33:35,198 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 150, loss[loss=0.05714, simple_loss=0.06247, pruned_loss=0.01092, audio_tagging_loss=0.01499, over 14894.00 frames. ], tot_loss[loss=0.09325, simple_loss=0.1065, pruned_loss=0.02296, audio_tagging_loss=0.01702, over 1614885.88 frames. ], batch size: 57, lr: 7.12e-03, grad_scale: 32.0
2023-11-19 16:33:48,055 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=722466.6666666666, ans=0.035
2023-11-19 16:33:48,556 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.65 vs. limit=15.0
2023-11-19 16:34:09,909 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=722533.3333333334, ans=0.2
2023-11-19 16:34:11,107 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=722533.3333333334, ans=0.0
2023-11-19 16:34:22,307 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=722600.0, ans=0.125
2023-11-19 16:34:24,492 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 108400
2023-11-19 16:34:27,938 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=722666.6666666666, ans=0.07
2023-11-19 16:34:38,971 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=722733.3333333334, ans=0.0
2023-11-19 16:34:39,229 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=722733.3333333334, ans=0.04949747468305833
2023-11-19 16:34:40,689 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 200, loss[loss=0.07522, simple_loss=0.09691, pruned_loss=0.01776, audio_tagging_loss=0.009003, over 15133.00 frames. ], tot_loss[loss=0.09139, simple_loss=0.1066, pruned_loss=0.02313, audio_tagging_loss=0.01495, over 1936731.76 frames. ], batch size: 58, lr: 7.12e-03, grad_scale: 16.0
2023-11-19 16:34:45,446 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=722733.3333333334, ans=0.0
2023-11-19 16:34:51,739 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.543e+01 8.378e+01 9.256e+01 1.031e+02 1.304e+02, threshold=1.851e+02, percent-clipped=0.0
2023-11-19 16:34:53,695 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.67 vs. limit=15.0
2023-11-19 16:35:04,241 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=722800.0, ans=0.125
2023-11-19 16:35:06,016 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.81 vs. limit=6.0
2023-11-19 16:35:08,072 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=722866.6666666666, ans=0.125
2023-11-19 16:35:20,552 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=722933.3333333334, ans=0.0
2023-11-19 16:35:22,929 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=722933.3333333334, ans=0.0
2023-11-19 16:35:23,288 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.41 vs. limit=22.5
2023-11-19 16:35:24,173 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=722933.3333333334, ans=0.125
2023-11-19 16:35:25,532 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=722933.3333333334, ans=0.125
2023-11-19 16:35:29,481 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 108450
2023-11-19 16:35:32,061 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.96 vs. limit=22.5
2023-11-19 16:35:36,322 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-19 16:35:36,469 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=723000.0, ans=0.125
2023-11-19 16:35:45,238 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 250, loss[loss=0.08208, simple_loss=0.1051, pruned_loss=0.02104, audio_tagging_loss=0.008517, over 14289.00 frames. ], tot_loss[loss=0.09106, simple_loss=0.1079, pruned_loss=0.02352, audio_tagging_loss=0.01358, over 2185732.80 frames. ], batch size: 52, lr: 7.12e-03, grad_scale: 16.0
2023-11-19 16:35:47,989 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=723066.6666666666, ans=0.125
2023-11-19 16:36:08,978 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=723200.0, ans=15.0
2023-11-19 16:36:09,923 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=723200.0, ans=0.0
2023-11-19 16:36:18,049 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=723200.0, ans=10.0
2023-11-19 16:36:33,696 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 108500
2023-11-19 16:36:35,165 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=723333.3333333334, ans=0.0
2023-11-19 16:36:48,246 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 300, loss[loss=0.08197, simple_loss=0.1031, pruned_loss=0.01908, audio_tagging_loss=0.01132, over 15115.00 frames. ], tot_loss[loss=0.09066, simple_loss=0.1085, pruned_loss=0.02386, audio_tagging_loss=0.01257, over 2381579.43 frames. ], batch size: 56, lr: 7.11e-03, grad_scale: 16.0
2023-11-19 16:36:58,050 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.707e+01 8.557e+01 9.217e+01 9.967e+01 1.431e+02, threshold=1.843e+02, percent-clipped=0.0
2023-11-19 16:37:04,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=723466.6666666666, ans=0.0
2023-11-19 16:37:23,873 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=723533.3333333334, ans=0.0
2023-11-19 16:37:29,884 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=723600.0, ans=0.125
2023-11-19 16:37:30,284 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.52 vs. limit=15.0
2023-11-19 16:37:32,370 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=723600.0, ans=0.125
2023-11-19 16:37:37,034 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 108550
2023-11-19 16:37:49,729 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=723666.6666666666, ans=0.0
2023-11-19 16:37:51,878 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 350, loss[loss=0.08821, simple_loss=0.1023, pruned_loss=0.02464, audio_tagging_loss=0.0124, over 15184.00 frames. ], tot_loss[loss=0.08966, simple_loss=0.1077, pruned_loss=0.0238, audio_tagging_loss=0.01202, over 2528123.91 frames. ], batch size: 56, lr: 7.11e-03, grad_scale: 16.0
2023-11-19 16:38:00,833 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=723733.3333333334, ans=0.125
2023-11-19 16:38:29,416 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=723933.3333333334, ans=0.125
2023-11-19 16:38:37,137 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.73 vs. limit=15.0
2023-11-19 16:38:40,114 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 108600
2023-11-19 16:38:48,957 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=724000.0, ans=0.125
2023-11-19 16:38:55,830 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 400, loss[loss=0.09721, simple_loss=0.1241, pruned_loss=0.02267, audio_tagging_loss=0.01247, over 15557.00 frames. ], tot_loss[loss=0.08856, simple_loss=0.1069, pruned_loss=0.02346, audio_tagging_loss=0.01167, over 2644147.36 frames. ], batch size: 56, lr: 7.11e-03, grad_scale: 32.0
2023-11-19 16:39:02,758 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=724066.6666666666, ans=0.125
2023-11-19 16:39:02,881 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=724066.6666666666, ans=0.125
2023-11-19 16:39:04,208 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=724066.6666666666, ans=0.2
2023-11-19 16:39:06,210 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.939e+01 8.916e+01 9.621e+01 1.044e+02 1.431e+02, threshold=1.924e+02, percent-clipped=0.0
2023-11-19 16:39:11,554 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=724133.3333333334, ans=0.09899494936611666
2023-11-19 16:39:37,822 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=724266.6666666666, ans=0.0
2023-11-19 16:39:40,817 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.08 vs. limit=6.0
2023-11-19 16:39:44,962 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 108650
2023-11-19 16:39:51,755 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=724333.3333333334, ans=0.0
2023-11-19 16:39:59,923 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 450, loss[loss=0.09617, simple_loss=0.1182, pruned_loss=0.02444, audio_tagging_loss=0.01262, over 16404.00 frames. ], tot_loss[loss=0.08824, simple_loss=0.1069, pruned_loss=0.02346, audio_tagging_loss=0.01131, over 2732608.95 frames. ], batch size: 60, lr: 7.11e-03, grad_scale: 32.0
2023-11-19 16:40:13,807 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.86 vs. limit=15.0
2023-11-19 16:40:24,781 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=724533.3333333334, ans=0.125
2023-11-19 16:40:30,708 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=724533.3333333334, ans=0.125
2023-11-19 16:40:31,901 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=724533.3333333334, ans=0.1
2023-11-19 16:40:33,050 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=724533.3333333334, ans=0.125
2023-11-19 16:40:48,040 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 108700
2023-11-19 16:40:56,635 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-19 16:40:57,084 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=724666.6666666666, ans=15.0
2023-11-19 16:41:02,582 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 500, loss[loss=0.07448, simple_loss=0.08747, pruned_loss=0.02173, audio_tagging_loss=0.009004, over 15656.00 frames. ], tot_loss[loss=0.08781, simple_loss=0.1062, pruned_loss=0.02355, audio_tagging_loss=0.01116, over 2802703.93 frames. ], batch size: 60, lr: 7.11e-03, grad_scale: 32.0
2023-11-19 16:41:13,947 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.793e+01 8.347e+01 9.313e+01 1.051e+02 1.429e+02, threshold=1.863e+02, percent-clipped=0.0
2023-11-19 16:41:15,661 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=724800.0, ans=0.1
2023-11-19 16:41:16,799 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=724800.0, ans=0.125
2023-11-19 16:41:19,271 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=724800.0, ans=0.1
2023-11-19 16:41:23,876 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=724800.0, ans=0.125
2023-11-19 16:41:34,513 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=724866.6666666666, ans=0.125
2023-11-19 16:41:36,941 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=724866.6666666666, ans=0.09899494936611666
2023-11-19 16:41:44,341 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=724933.3333333334, ans=0.0
2023-11-19 16:41:51,533 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 108750
2023-11-19 16:41:53,106 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=725000.0, ans=0.125
2023-11-19 16:42:04,081 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=725000.0, ans=0.125
2023-11-19 16:42:06,537 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=725066.6666666666, ans=0.1
2023-11-19 16:42:07,364 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 550, loss[loss=0.04262, simple_loss=0.04503, pruned_loss=0.008655, audio_tagging_loss=0.01145, over 15013.00 frames. ], tot_loss[loss=0.08692, simple_loss=0.105, pruned_loss=0.02327, audio_tagging_loss=0.01117, over 2852549.05 frames. ], batch size: 58, lr: 7.11e-03, grad_scale: 32.0
2023-11-19 16:42:07,707 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=725066.6666666666, ans=0.0
2023-11-19 16:42:08,915 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=725066.6666666666, ans=0.125
2023-11-19 16:42:16,752 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.62 vs. limit=15.0
2023-11-19 16:42:18,073 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=725066.6666666666, ans=0.0
2023-11-19 16:42:25,725 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-19 16:42:41,780 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=725200.0, ans=0.0
2023-11-19 16:42:44,320 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=725266.6666666666, ans=0.1
2023-11-19 16:42:56,333 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 108800
2023-11-19 16:43:12,320 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 600, loss[loss=0.07765, simple_loss=0.08523, pruned_loss=0.023, audio_tagging_loss=0.01204, over 15430.00 frames. ], tot_loss[loss=0.08697, simple_loss=0.1054, pruned_loss=0.02331, audio_tagging_loss=0.01094, over 2885773.37 frames. ], batch size: 60, lr: 7.10e-03, grad_scale: 32.0
2023-11-19 16:43:16,760 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.97 vs. limit=10.0
2023-11-19 16:43:21,248 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=725400.0, ans=0.125
2023-11-19 16:43:21,974 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.124e+01 8.186e+01 8.803e+01 9.595e+01 1.577e+02, threshold=1.761e+02, percent-clipped=0.0
2023-11-19 16:43:27,516 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.40 vs. limit=15.0
2023-11-19 16:43:51,730 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=725600.0, ans=0.125
2023-11-19 16:43:56,989 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=725600.0, ans=0.1
2023-11-19 16:43:57,009 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=725600.0, ans=0.125
2023-11-19 16:44:01,073 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 108850
2023-11-19 16:44:14,862 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=725733.3333333334, ans=0.125
2023-11-19 16:44:15,720 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 650, loss[loss=0.1179, simple_loss=0.1496, pruned_loss=0.03353, audio_tagging_loss=0.009549, over 15384.00 frames. ], tot_loss[loss=0.08682, simple_loss=0.1054, pruned_loss=0.02323, audio_tagging_loss=0.01089, over 2922455.37 frames. ], batch size: 54, lr: 7.10e-03, grad_scale: 16.0
2023-11-19 16:44:40,807 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.51 vs. limit=22.5
2023-11-19 16:44:41,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=725866.6666666666, ans=0.0
2023-11-19 16:44:54,517 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=725933.3333333334, ans=0.0
2023-11-19 16:45:00,921 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.78 vs. limit=15.0
2023-11-19 16:45:04,120 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 108900
2023-11-19 16:45:20,004 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 700, loss[loss=0.06154, simple_loss=0.07069, pruned_loss=0.01474, audio_tagging_loss=0.01145, over 14968.00 frames. ], tot_loss[loss=0.08662, simple_loss=0.1054, pruned_loss=0.02324, audio_tagging_loss=0.0107, over 2947604.58 frames. ], batch size: 58, lr: 7.10e-03, grad_scale: 16.0
2023-11-19 16:45:28,625 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=726066.6666666666, ans=0.125
2023-11-19 16:45:30,780 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.186e+01 8.106e+01 8.886e+01 9.595e+01 1.122e+02, threshold=1.777e+02, percent-clipped=0.0
2023-11-19 16:45:31,128 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=726133.3333333334, ans=0.125
2023-11-19 16:45:59,498 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=726266.6666666666, ans=0.125
2023-11-19 16:46:07,921 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 108950
2023-11-19 16:46:18,338 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.62 vs. limit=12.0
2023-11-19 16:46:22,482 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 750, loss[loss=0.09496, simple_loss=0.114, pruned_loss=0.02596, audio_tagging_loss=0.01203, over 15348.00 frames. ], tot_loss[loss=0.08741, simple_loss=0.1065, pruned_loss=0.02349, audio_tagging_loss=0.01068, over 2979187.36 frames. ], batch size: 57, lr: 7.10e-03, grad_scale: 16.0
2023-11-19 16:46:22,830 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=726400.0, ans=0.1
2023-11-19 16:46:25,205 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=726400.0, ans=0.125
2023-11-19 16:46:25,600 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.43 vs. limit=22.5
2023-11-19 16:46:52,734 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 16:46:58,697 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=726533.3333333334, ans=0.125
2023-11-19 16:47:01,340 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=726600.0, ans=0.125
2023-11-19 16:47:11,197 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 109000
2023-11-19 16:47:27,214 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 800, loss[loss=0.08433, simple_loss=0.1062, pruned_loss=0.02281, audio_tagging_loss=0.0084, over 15562.00 frames. ], tot_loss[loss=0.08709, simple_loss=0.1064, pruned_loss=0.02324, audio_tagging_loss=0.01063, over 2996381.26 frames. ], batch size: 56, lr: 7.10e-03, grad_scale: 32.0
2023-11-19 16:47:32,235 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=726733.3333333334, ans=0.0
2023-11-19 16:47:38,640 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.715e+01 8.461e+01 9.150e+01 1.030e+02 1.294e+02, threshold=1.830e+02, percent-clipped=0.0
2023-11-19 16:47:59,607 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.85 vs. limit=15.0
2023-11-19 16:48:00,563 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=726866.6666666666, ans=0.0
2023-11-19 16:48:14,729 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=726933.3333333334, ans=0.0
2023-11-19 16:48:15,669 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 109050
2023-11-19 16:48:19,528 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=727000.0, ans=0.0
2023-11-19 16:48:31,302 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 850, loss[loss=0.06806, simple_loss=0.07976, pruned_loss=0.01632, audio_tagging_loss=0.01185, over 17012.00 frames. ], tot_loss[loss=0.08659, simple_loss=0.1053, pruned_loss=0.02318, audio_tagging_loss=0.01073, over 3003848.54 frames. ], batch size: 64, lr: 7.10e-03, grad_scale: 32.0
2023-11-19 16:48:32,658 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=727066.6666666666, ans=0.0
2023-11-19 16:48:52,253 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=727133.3333333334, ans=0.0
2023-11-19 16:49:10,121 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=727266.6666666666, ans=0.0
2023-11-19 16:49:19,640 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 109100
2023-11-19 16:49:23,469 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=727333.3333333334, ans=0.1
2023-11-19 16:49:28,392 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=727333.3333333334, ans=0.025
2023-11-19 16:49:34,194 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 900, loss[loss=0.06778, simple_loss=0.08327, pruned_loss=0.01759, audio_tagging_loss=0.00856, over 14456.00 frames. ], tot_loss[loss=0.08693, simple_loss=0.1056, pruned_loss=0.02335, audio_tagging_loss=0.01077, over 3014996.09 frames. ], batch size: 56, lr: 7.09e-03, grad_scale: 32.0
2023-11-19 16:49:45,196 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.987e+01 8.269e+01 9.055e+01 9.679e+01 1.261e+02, threshold=1.811e+02, percent-clipped=0.0
2023-11-19 16:49:53,948 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.23 vs. limit=10.0
2023-11-19 16:50:13,180 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=727600.0, ans=0.0
2023-11-19 16:50:22,801 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 109150
2023-11-19 16:50:33,838 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=727666.6666666666, ans=0.2
2023-11-19 16:50:36,266 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=727733.3333333334, ans=0.125
2023-11-19 16:50:37,255 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 950, loss[loss=0.09524, simple_loss=0.1266, pruned_loss=0.02482, audio_tagging_loss=0.007099, over 14711.00 frames. ], tot_loss[loss=0.08752, simple_loss=0.1066, pruned_loss=0.02369, audio_tagging_loss=0.01051, over 3031360.97 frames. ], batch size: 55, lr: 7.09e-03, grad_scale: 32.0
2023-11-19 16:51:25,981 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 109200
2023-11-19 16:51:42,995 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 1000, loss[loss=0.09164, simple_loss=0.1037, pruned_loss=0.02873, audio_tagging_loss=0.01104, over 16431.00 frames. ], tot_loss[loss=0.08738, simple_loss=0.1069, pruned_loss=0.02358, audio_tagging_loss=0.01035, over 3038426.73 frames. ], batch size: 65, lr: 7.09e-03, grad_scale: 32.0
2023-11-19 16:51:52,835 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=728066.6666666666, ans=0.2
2023-11-19 16:51:53,751 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.418e+01 8.048e+01 8.889e+01 9.743e+01 1.398e+02, threshold=1.778e+02, percent-clipped=0.0
2023-11-19 16:51:54,235 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=728133.3333333334, ans=0.125
2023-11-19 16:52:08,477 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 16:52:09,891 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=728200.0, ans=0.0
2023-11-19 16:52:12,452 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=728200.0, ans=0.125
2023-11-19 16:52:18,003 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.31 vs. limit=15.0
2023-11-19 16:52:25,490 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=728266.6666666666, ans=0.0
2023-11-19 16:52:31,680 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 109250
2023-11-19 16:52:35,537 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=728333.3333333334, ans=0.0
2023-11-19 16:52:46,318 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 1050, loss[loss=0.08844, simple_loss=0.1036, pruned_loss=0.02721, audio_tagging_loss=0.009448, over 15732.00 frames. ], tot_loss[loss=0.08706, simple_loss=0.1063, pruned_loss=0.02351, audio_tagging_loss=0.01042, over 3042326.62 frames. ], batch size: 60, lr: 7.09e-03, grad_scale: 32.0
2023-11-19 16:52:46,607 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=728400.0, ans=0.125
2023-11-19 16:52:56,812 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=728400.0, ans=0.1
2023-11-19 16:53:02,985 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=728466.6666666666, ans=0.1
2023-11-19 16:53:23,261 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=728533.3333333334, ans=0.1
2023-11-19 16:53:27,538 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.65 vs. limit=22.5
2023-11-19 16:53:35,472 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 109300
2023-11-19 16:53:49,977 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 1100, loss[loss=0.07163, simple_loss=0.08579, pruned_loss=0.01731, audio_tagging_loss=0.01143, over 14534.00 frames. ], tot_loss[loss=0.08695, simple_loss=0.1062, pruned_loss=0.02357, audio_tagging_loss=0.0103, over 3039214.84 frames. ], batch size: 55, lr: 7.09e-03, grad_scale: 32.0
2023-11-19 16:53:52,569 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 16:54:01,858 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.245e+01 8.411e+01 9.070e+01 1.020e+02 1.382e+02, threshold=1.814e+02, percent-clipped=0.0
2023-11-19 16:54:05,800 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=728800.0, ans=0.125
2023-11-19 16:54:24,434 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=728866.6666666666, ans=0.125
2023-11-19 16:54:28,293 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.51 vs. limit=22.5
2023-11-19 16:54:35,336 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=728933.3333333334, ans=0.1
2023-11-19 16:54:38,775 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 109350
2023-11-19 16:54:46,348 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=729000.0, ans=0.0
2023-11-19 16:54:51,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=729000.0, ans=0.1
2023-11-19 16:54:54,105 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 1150, loss[loss=0.128, simple_loss=0.162, pruned_loss=0.03974, audio_tagging_loss=0.007303, over 15670.00 frames. ], tot_loss[loss=0.08706, simple_loss=0.1062, pruned_loss=0.02358, audio_tagging_loss=0.01038, over 3036492.77 frames. ], batch size: 57, lr: 7.09e-03, grad_scale: 32.0
2023-11-19 16:55:10,821 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.01 vs. limit=12.0
2023-11-19 16:55:35,984 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=729266.6666666666, ans=0.1
2023-11-19 16:55:42,484 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 109400
2023-11-19 16:55:58,770 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 1200, loss[loss=0.08242, simple_loss=0.1038, pruned_loss=0.01844, audio_tagging_loss=0.01207, over 14844.00 frames. ], tot_loss[loss=0.08602, simple_loss=0.1046, pruned_loss=0.02319, audio_tagging_loss=0.01051, over 3038298.33 frames. ], batch size: 54, lr: 7.08e-03, grad_scale: 32.0
2023-11-19 16:56:06,883 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.66 vs. limit=15.0
2023-11-19 16:56:09,729 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.219e+01 8.170e+01 9.038e+01 9.712e+01 1.366e+02, threshold=1.808e+02, percent-clipped=0.0
2023-11-19 16:56:19,296 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.29 vs. limit=15.0
2023-11-19 16:56:35,636 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.88 vs. limit=15.0
2023-11-19 16:56:45,804 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.60 vs. limit=15.0
2023-11-19 16:56:47,492 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 109450
2023-11-19 16:56:57,273 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=729666.6666666666, ans=0.1
2023-11-19 16:57:02,022 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 1250, loss[loss=0.07218, simple_loss=0.07942, pruned_loss=0.02126, audio_tagging_loss=0.01121, over 14295.00 frames. ], tot_loss[loss=0.08595, simple_loss=0.1045, pruned_loss=0.0232, audio_tagging_loss=0.01048, over 3032414.69 frames. ], batch size: 56, lr: 7.08e-03, grad_scale: 32.0
2023-11-19 16:57:14,518 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=729800.0, ans=0.125
2023-11-19 16:57:18,733 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.77 vs. limit=22.5
2023-11-19 16:57:20,374 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=729800.0, ans=0.0
2023-11-19 16:57:31,991 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.28 vs. limit=5.0
2023-11-19 16:57:37,615 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=729866.6666666666, ans=0.0
2023-11-19 16:57:51,020 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 109500
2023-11-19 16:57:54,712 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=730000.0, ans=0.05
2023-11-19 16:57:57,441 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 16:58:05,841 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 1300, loss[loss=0.08546, simple_loss=0.1114, pruned_loss=0.01794, audio_tagging_loss=0.01183, over 14538.00 frames. ], tot_loss[loss=0.08558, simple_loss=0.1042, pruned_loss=0.02306, audio_tagging_loss=0.01041, over 3033188.57 frames. ], batch size: 52, lr: 7.08e-03, grad_scale: 32.0
2023-11-19 16:58:18,234 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.587e+01 8.086e+01 8.673e+01 9.719e+01 1.253e+02, threshold=1.735e+02, percent-clipped=0.0
2023-11-19 16:58:27,006 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=730133.3333333334, ans=0.0
2023-11-19 16:58:54,047 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 109550
2023-11-19 16:59:07,399 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=730333.3333333334, ans=0.0
2023-11-19 16:59:10,758 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 1350, loss[loss=0.08399, simple_loss=0.09988, pruned_loss=0.02193, audio_tagging_loss=0.01212, over 16044.00 frames. ], tot_loss[loss=0.08492, simple_loss=0.1032, pruned_loss=0.02282, audio_tagging_loss=0.01049, over 3034235.19 frames. ], batch size: 58, lr: 7.08e-03, grad_scale: 32.0
2023-11-19 16:59:14,755 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=730400.0, ans=0.0
2023-11-19 16:59:20,952 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=730400.0, ans=0.125
2023-11-19 16:59:56,677 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 16:59:59,111 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 109600
2023-11-19 17:00:04,133 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.59 vs. limit=15.0
2023-11-19 17:00:14,623 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 1400, loss[loss=0.1239, simple_loss=0.1522, pruned_loss=0.03887, audio_tagging_loss=0.008885, over 16701.00 frames. ], tot_loss[loss=0.08587, simple_loss=0.1048, pruned_loss=0.02305, audio_tagging_loss=0.01043, over 3042826.27 frames. ], batch size: 58, lr: 7.08e-03, grad_scale: 32.0
2023-11-19 17:00:23,646 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=730733.3333333334, ans=0.125
2023-11-19 17:00:25,870 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.607e+01 8.437e+01 9.173e+01 9.925e+01 1.308e+02, threshold=1.835e+02, percent-clipped=0.0
2023-11-19 17:00:29,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=730800.0, ans=0.2
2023-11-19 17:00:51,297 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=730866.6666666666, ans=0.0
2023-11-19 17:01:03,357 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 109650
2023-11-19 17:01:18,054 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 1450, loss[loss=0.09346, simple_loss=0.124, pruned_loss=0.02473, audio_tagging_loss=0.006735, over 16669.00 frames. ], tot_loss[loss=0.08642, simple_loss=0.1053, pruned_loss=0.02324, audio_tagging_loss=0.01052, over 3045428.04 frames. ], batch size: 61, lr: 7.08e-03, grad_scale: 16.0
2023-11-19 17:01:23,491 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.81 vs. limit=15.0
2023-11-19 17:01:25,053 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=731066.6666666666, ans=0.125
2023-11-19 17:01:39,737 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.25 vs. limit=15.0
2023-11-19 17:02:04,081 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=731266.6666666666, ans=0.125
2023-11-19 17:02:04,088 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=731266.6666666666, ans=0.1
2023-11-19 17:02:05,737 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.02 vs. limit=22.5
2023-11-19 17:02:06,404 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 109700
2023-11-19 17:02:08,387 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.43 vs. limit=15.0
2023-11-19 17:02:22,239 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 1500, loss[loss=0.09774, simple_loss=0.1232, pruned_loss=0.0281, audio_tagging_loss=0.008057, over 15400.00 frames. ], tot_loss[loss=0.08753, simple_loss=0.1066, pruned_loss=0.02374, audio_tagging_loss=0.0105, over 3047380.03 frames. ], batch size: 58, lr: 7.08e-03, grad_scale: 16.0
2023-11-19 17:02:33,346 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.95 vs. limit=12.0
2023-11-19 17:02:35,059 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.251e+01 8.288e+01 9.153e+01 9.955e+01 1.243e+02, threshold=1.831e+02, percent-clipped=0.0
2023-11-19 17:02:35,400 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=5.133e-03
2023-11-19 17:02:37,221 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.05 vs. limit=22.5
2023-11-19 17:02:47,523 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=731533.3333333334, ans=0.1
2023-11-19 17:02:48,750 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=731533.3333333334, ans=0.125
2023-11-19 17:02:58,683 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=731600.0, ans=0.0
2023-11-19 17:03:11,056 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 109750
2023-11-19 17:03:15,573 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=731666.6666666666, ans=0.125
2023-11-19 17:03:21,528 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=731666.6666666666, ans=0.125
2023-11-19 17:03:26,124 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 1550, loss[loss=0.08488, simple_loss=0.1024, pruned_loss=0.02078, audio_tagging_loss=0.01289, over 14378.00 frames. ], tot_loss[loss=0.08774, simple_loss=0.1066, pruned_loss=0.02386, audio_tagging_loss=0.01057, over 3039571.96 frames. ], batch size: 55, lr: 7.07e-03, grad_scale: 16.0
2023-11-19 17:03:44,197 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.49 vs. limit=15.0
2023-11-19 17:03:48,357 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=731800.0, ans=0.125
2023-11-19 17:03:50,714 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.03 vs. limit=15.0
2023-11-19 17:03:54,313 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=731866.6666666666, ans=0.125
2023-11-19 17:04:00,581 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=731866.6666666666, ans=0.125
2023-11-19 17:04:07,833 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.85 vs. limit=15.0
2023-11-19 17:04:14,541 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 109800
2023-11-19 17:04:29,597 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 1600, loss[loss=0.0886, simple_loss=0.1021, pruned_loss=0.026, audio_tagging_loss=0.01155, over 15589.00 frames. ], tot_loss[loss=0.08805, simple_loss=0.1069, pruned_loss=0.02403, audio_tagging_loss=0.01058, over 3044878.06 frames. ], batch size: 59, lr: 7.07e-03, grad_scale: 32.0
2023-11-19 17:04:42,924 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.880e+01 8.694e+01 9.571e+01 1.026e+02 1.392e+02, threshold=1.914e+02, percent-clipped=0.0
2023-11-19 17:04:47,571 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.63 vs. limit=15.0
2023-11-19 17:04:55,355 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.00 vs. limit=15.0
limit=15.0 2023-11-19 17:05:03,560 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=732200.0, ans=0.0 2023-11-19 17:05:07,222 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=732266.6666666666, ans=0.0 2023-11-19 17:05:16,052 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=732266.6666666666, ans=0.1 2023-11-19 17:05:18,453 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 109850 2023-11-19 17:05:34,212 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 1650, loss[loss=0.08413, simple_loss=0.1011, pruned_loss=0.02207, audio_tagging_loss=0.0115, over 14922.00 frames. ], tot_loss[loss=0.08708, simple_loss=0.1056, pruned_loss=0.02364, audio_tagging_loss=0.01063, over 3038145.84 frames. ], batch size: 57, lr: 7.07e-03, grad_scale: 32.0 2023-11-19 17:05:39,509 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=732400.0, ans=0.2 2023-11-19 17:05:46,512 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.82 vs. limit=10.0 2023-11-19 17:05:58,550 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=732533.3333333334, ans=0.1 2023-11-19 17:06:14,853 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.60 vs. limit=15.0 2023-11-19 17:06:16,952 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=732600.0, ans=0.025 2023-11-19 17:06:22,819 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 109900 2023-11-19 17:06:38,158 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 1700, loss[loss=0.05583, simple_loss=0.07007, pruned_loss=0.01198, audio_tagging_loss=0.00882, over 14602.00 frames. ], tot_loss[loss=0.08671, simple_loss=0.1052, pruned_loss=0.0234, audio_tagging_loss=0.01073, over 3041299.30 frames. ], batch size: 56, lr: 7.07e-03, grad_scale: 32.0 2023-11-19 17:06:50,474 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.181e+01 8.219e+01 8.857e+01 9.747e+01 1.189e+02, threshold=1.771e+02, percent-clipped=0.0 2023-11-19 17:06:53,341 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=732800.0, ans=0.0 2023-11-19 17:06:55,722 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=732800.0, ans=0.1 2023-11-19 17:07:05,271 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=732866.6666666666, ans=0.125 2023-11-19 17:07:27,188 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 109950 2023-11-19 17:07:41,736 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 1750, loss[loss=0.1007, simple_loss=0.1244, pruned_loss=0.02822, audio_tagging_loss=0.01031, over 16362.00 frames. ], tot_loss[loss=0.08735, simple_loss=0.1065, pruned_loss=0.0236, audio_tagging_loss=0.01051, over 3036136.74 frames. 
], batch size: 60, lr: 7.07e-03, grad_scale: 32.0 2023-11-19 17:07:41,968 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=733066.6666666666, ans=0.125 2023-11-19 17:07:43,379 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=733066.6666666666, ans=0.125 2023-11-19 17:07:50,829 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=733066.6666666666, ans=0.0 2023-11-19 17:07:58,147 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=733133.3333333334, ans=0.125 2023-11-19 17:08:00,322 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=733133.3333333334, ans=0.125 2023-11-19 17:08:10,327 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=733200.0, ans=0.125 2023-11-19 17:08:22,030 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=733266.6666666666, ans=0.125 2023-11-19 17:08:30,321 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 110000 2023-11-19 17:08:33,622 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=733333.3333333334, ans=0.125 2023-11-19 17:08:37,033 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=733333.3333333334, ans=0.2 2023-11-19 17:08:46,824 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 1800, loss[loss=0.1104, simple_loss=0.1319, pruned_loss=0.03546, audio_tagging_loss=0.00897, over 14458.00 frames. ], tot_loss[loss=0.08663, simple_loss=0.1056, pruned_loss=0.02334, audio_tagging_loss=0.01047, over 3036738.09 frames. ], batch size: 54, lr: 7.07e-03, grad_scale: 32.0 2023-11-19 17:09:00,177 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.669e+01 8.372e+01 9.088e+01 1.009e+02 1.305e+02, threshold=1.818e+02, percent-clipped=0.0 2023-11-19 17:09:00,488 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=733466.6666666666, ans=0.1 2023-11-19 17:09:08,680 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=733466.6666666666, ans=0.125 2023-11-19 17:09:25,963 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=733600.0, ans=0.0 2023-11-19 17:09:35,474 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 110050 2023-11-19 17:09:50,022 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 1850, loss[loss=0.09246, simple_loss=0.113, pruned_loss=0.02431, audio_tagging_loss=0.01168, over 15147.00 frames. ], tot_loss[loss=0.0875, simple_loss=0.1067, pruned_loss=0.02379, audio_tagging_loss=0.01034, over 3043071.74 frames. 
], batch size: 54, lr: 7.06e-03, grad_scale: 16.0 2023-11-19 17:09:59,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=733733.3333333334, ans=0.125 2023-11-19 17:10:11,959 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=733800.0, ans=0.5 2023-11-19 17:10:20,013 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 17:10:38,798 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 110100 2023-11-19 17:10:48,271 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=734000.0, ans=0.125 2023-11-19 17:10:51,454 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.91 vs. limit=12.0 2023-11-19 17:10:54,512 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 1900, loss[loss=0.08264, simple_loss=0.09958, pruned_loss=0.02357, audio_tagging_loss=0.009282, over 16268.00 frames. ], tot_loss[loss=0.08654, simple_loss=0.1057, pruned_loss=0.02335, audio_tagging_loss=0.01035, over 3047488.89 frames. ], batch size: 60, lr: 7.06e-03, grad_scale: 16.0 2023-11-19 17:11:07,168 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=734133.3333333334, ans=0.125 2023-11-19 17:11:08,887 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.512e+01 8.528e+01 8.978e+01 9.700e+01 1.316e+02, threshold=1.796e+02, percent-clipped=0.0 2023-11-19 17:11:15,769 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=734133.3333333334, ans=0.125 2023-11-19 17:11:29,443 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=734200.0, ans=0.2 2023-11-19 17:11:43,538 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 110150 2023-11-19 17:11:44,315 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.10 vs. limit=15.0 2023-11-19 17:11:52,416 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=734333.3333333334, ans=0.0 2023-11-19 17:11:54,670 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=734333.3333333334, ans=0.1 2023-11-19 17:11:59,397 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 1950, loss[loss=0.06033, simple_loss=0.06687, pruned_loss=0.0154, audio_tagging_loss=0.0115, over 14701.00 frames. ], tot_loss[loss=0.08575, simple_loss=0.1048, pruned_loss=0.02303, audio_tagging_loss=0.01032, over 3045319.41 frames. 
], batch size: 56, lr: 7.06e-03, grad_scale: 16.0 2023-11-19 17:12:02,233 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=734400.0, ans=0.125 2023-11-19 17:12:08,354 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 17:12:11,082 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=734466.6666666666, ans=0.2 2023-11-19 17:12:15,891 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=734466.6666666666, ans=0.0 2023-11-19 17:12:32,498 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=734533.3333333334, ans=0.125 2023-11-19 17:12:48,359 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 110200 2023-11-19 17:13:03,587 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 2000, loss[loss=0.08152, simple_loss=0.1009, pruned_loss=0.02101, audio_tagging_loss=0.01005, over 14966.00 frames. ], tot_loss[loss=0.08562, simple_loss=0.1043, pruned_loss=0.02306, audio_tagging_loss=0.01039, over 3039808.65 frames. ], batch size: 55, lr: 7.06e-03, grad_scale: 32.0 2023-11-19 17:13:17,909 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.098e+01 8.336e+01 8.914e+01 9.433e+01 1.309e+02, threshold=1.783e+02, percent-clipped=0.0 2023-11-19 17:13:21,901 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=734800.0, ans=0.125 2023-11-19 17:13:31,005 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=734866.6666666666, ans=0.1 2023-11-19 17:13:37,306 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.32 vs. limit=15.0 2023-11-19 17:13:43,147 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=734933.3333333334, ans=0.125 2023-11-19 17:13:52,702 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 110250 2023-11-19 17:13:55,599 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.67 vs. limit=15.0 2023-11-19 17:14:01,498 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=735000.0, ans=0.0 2023-11-19 17:14:03,186 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.53 vs. limit=22.5 2023-11-19 17:14:08,088 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 2050, loss[loss=0.08781, simple_loss=0.1025, pruned_loss=0.0243, audio_tagging_loss=0.01223, over 14671.00 frames. ], tot_loss[loss=0.08532, simple_loss=0.104, pruned_loss=0.0229, audio_tagging_loss=0.01042, over 3039408.52 frames. ], batch size: 56, lr: 7.06e-03, grad_scale: 32.0 2023-11-19 17:14:14,726 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.31 vs. 
limit=15.0 2023-11-19 17:14:18,218 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=735066.6666666666, ans=0.125 2023-11-19 17:14:27,912 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=735133.3333333334, ans=0.2 2023-11-19 17:14:36,395 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=735200.0, ans=0.125 2023-11-19 17:14:51,480 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.95 vs. limit=22.5 2023-11-19 17:14:55,486 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 110300 2023-11-19 17:15:01,360 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=735333.3333333334, ans=0.0 2023-11-19 17:15:10,849 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 2100, loss[loss=0.09242, simple_loss=0.1099, pruned_loss=0.02634, audio_tagging_loss=0.0111, over 16144.00 frames. ], tot_loss[loss=0.08535, simple_loss=0.1041, pruned_loss=0.02286, audio_tagging_loss=0.01042, over 3039605.43 frames. ], batch size: 59, lr: 7.06e-03, grad_scale: 32.0 2023-11-19 17:15:20,811 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=735400.0, ans=0.125 2023-11-19 17:15:23,332 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=735466.6666666666, ans=0.125 2023-11-19 17:15:25,438 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.080e+01 8.159e+01 8.890e+01 9.967e+01 1.434e+02, threshold=1.778e+02, percent-clipped=0.0 2023-11-19 17:15:45,454 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=735533.3333333334, ans=0.1 2023-11-19 17:15:51,070 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 17:15:59,773 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 110350 2023-11-19 17:16:06,091 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=735666.6666666666, ans=0.09899494936611666 2023-11-19 17:16:15,551 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 2150, loss[loss=0.07903, simple_loss=0.09026, pruned_loss=0.02131, audio_tagging_loss=0.0126, over 14808.00 frames. ], tot_loss[loss=0.08552, simple_loss=0.1042, pruned_loss=0.0229, audio_tagging_loss=0.01051, over 3035305.59 frames. ], batch size: 55, lr: 7.05e-03, grad_scale: 32.0 2023-11-19 17:16:33,190 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=735800.0, ans=0.07 2023-11-19 17:16:55,027 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-19 17:17:01,907 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.26 vs. limit=15.0 2023-11-19 17:17:04,973 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 110400 2023-11-19 17:17:20,310 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 2200, loss[loss=0.122, simple_loss=0.1495, pruned_loss=0.03877, audio_tagging_loss=0.008486, over 14938.00 frames. ], tot_loss[loss=0.08627, simple_loss=0.1052, pruned_loss=0.02318, audio_tagging_loss=0.0105, over 3036754.07 frames. ], batch size: 55, lr: 7.05e-03, grad_scale: 16.0 2023-11-19 17:17:35,746 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.097e+01 8.594e+01 9.409e+01 1.055e+02 1.451e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-19 17:17:47,931 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=736200.0, ans=0.07 2023-11-19 17:17:51,697 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=736200.0, ans=0.0 2023-11-19 17:18:03,969 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=736266.6666666666, ans=0.125 2023-11-19 17:18:04,091 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=736266.6666666666, ans=0.0 2023-11-19 17:18:10,092 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 110450 2023-11-19 17:18:21,913 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=736333.3333333334, ans=0.125 2023-11-19 17:18:23,174 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=736333.3333333334, ans=0.125 2023-11-19 17:18:25,406 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 2250, loss[loss=0.0622, simple_loss=0.07457, pruned_loss=0.01175, audio_tagging_loss=0.01316, over 14843.00 frames. ], tot_loss[loss=0.08585, simple_loss=0.1048, pruned_loss=0.02294, audio_tagging_loss=0.01052, over 3027674.62 frames. 
], batch size: 55, lr: 7.05e-03, grad_scale: 16.0 2023-11-19 17:18:25,706 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=736400.0, ans=0.125 2023-11-19 17:18:28,768 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=736400.0, ans=0.125 2023-11-19 17:18:44,797 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=736466.6666666666, ans=0.125 2023-11-19 17:18:56,134 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=736533.3333333334, ans=0.2 2023-11-19 17:19:09,804 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 17:19:15,053 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 110500 2023-11-19 17:19:16,458 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=736666.6666666666, ans=0.125 2023-11-19 17:19:26,974 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=736666.6666666666, ans=0.2 2023-11-19 17:19:31,623 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 2300, loss[loss=0.08996, simple_loss=0.1075, pruned_loss=0.02614, audio_tagging_loss=0.01005, over 15782.00 frames. ], tot_loss[loss=0.08614, simple_loss=0.1049, pruned_loss=0.02314, audio_tagging_loss=0.01053, over 3034757.39 frames. ], batch size: 60, lr: 7.05e-03, grad_scale: 8.0 2023-11-19 17:19:39,004 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=736733.3333333334, ans=0.125 2023-11-19 17:19:39,112 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 17:19:40,358 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=736733.3333333334, ans=0.2 2023-11-19 17:19:40,423 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=736733.3333333334, ans=0.1 2023-11-19 17:19:41,760 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=736733.3333333334, ans=0.0 2023-11-19 17:19:47,355 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.852e+01 8.474e+01 9.296e+01 1.022e+02 1.350e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-19 17:19:48,990 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=736800.0, ans=0.125 2023-11-19 17:20:13,143 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.85 vs. limit=15.0 2023-11-19 17:20:21,220 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 110550 2023-11-19 17:20:28,701 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 17:20:36,090 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 2350, loss[loss=0.09396, simple_loss=0.1267, pruned_loss=0.02041, audio_tagging_loss=0.01018, over 15804.00 frames. ], tot_loss[loss=0.08617, simple_loss=0.1048, pruned_loss=0.02316, audio_tagging_loss=0.01063, over 3035501.67 frames. ], batch size: 57, lr: 7.05e-03, grad_scale: 8.0 2023-11-19 17:20:43,670 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=737066.6666666666, ans=0.1 2023-11-19 17:20:49,852 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=737133.3333333334, ans=0.125 2023-11-19 17:20:54,168 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=737133.3333333334, ans=0.0 2023-11-19 17:21:05,822 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=737200.0, ans=0.125 2023-11-19 17:21:25,376 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 110600 2023-11-19 17:21:37,197 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=737333.3333333334, ans=0.0 2023-11-19 17:21:40,463 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 2400, loss[loss=0.07818, simple_loss=0.09446, pruned_loss=0.01819, audio_tagging_loss=0.01276, over 15195.00 frames. ], tot_loss[loss=0.08629, simple_loss=0.105, pruned_loss=0.02313, audio_tagging_loss=0.01066, over 3044982.90 frames. ], batch size: 56, lr: 7.05e-03, grad_scale: 16.0 2023-11-19 17:21:41,968 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=737400.0, ans=0.0 2023-11-19 17:21:44,222 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.20 vs. limit=15.0 2023-11-19 17:21:58,853 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.978e+01 8.550e+01 9.088e+01 1.010e+02 1.299e+02, threshold=1.818e+02, percent-clipped=0.0 2023-11-19 17:22:02,833 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=737466.6666666666, ans=0.125 2023-11-19 17:22:13,012 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.83 vs. 
limit=22.5 2023-11-19 17:22:15,031 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=737533.3333333334, ans=0.125 2023-11-19 17:22:22,479 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=737600.0, ans=0.2 2023-11-19 17:22:27,177 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=737600.0, ans=0.125 2023-11-19 17:22:29,513 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 110650 2023-11-19 17:22:46,869 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 2450, loss[loss=0.08058, simple_loss=0.0873, pruned_loss=0.02011, audio_tagging_loss=0.01683, over 16229.00 frames. ], tot_loss[loss=0.08582, simple_loss=0.1043, pruned_loss=0.02291, audio_tagging_loss=0.01075, over 3048352.35 frames. ], batch size: 63, lr: 7.04e-03, grad_scale: 16.0 2023-11-19 17:22:58,351 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.81 vs. limit=15.0 2023-11-19 17:23:05,066 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=737800.0, ans=0.1 2023-11-19 17:23:05,070 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=737800.0, ans=0.1 2023-11-19 17:23:09,519 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.92 vs. limit=15.0 2023-11-19 17:23:35,181 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 110700 2023-11-19 17:23:49,615 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 2500, loss[loss=0.07075, simple_loss=0.07671, pruned_loss=0.01816, audio_tagging_loss=0.01424, over 15036.00 frames. ], tot_loss[loss=0.08605, simple_loss=0.1049, pruned_loss=0.02293, audio_tagging_loss=0.01068, over 3053664.83 frames. ], batch size: 59, lr: 7.04e-03, grad_scale: 16.0 2023-11-19 17:23:52,296 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=738066.6666666666, ans=0.0 2023-11-19 17:23:52,444 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff3.min_abs, batch_count=738066.6666666666, ans=0.2 2023-11-19 17:24:05,694 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.741e+01 8.346e+01 8.795e+01 9.751e+01 1.396e+02, threshold=1.759e+02, percent-clipped=0.0 2023-11-19 17:24:10,152 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.84 vs. 
limit=10.0 2023-11-19 17:24:18,793 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 17:24:35,124 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=738266.6666666666, ans=0.0 2023-11-19 17:24:38,678 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 110750 2023-11-19 17:24:41,237 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=738333.3333333334, ans=0.125 2023-11-19 17:24:53,075 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 2550, loss[loss=0.0925, simple_loss=0.11, pruned_loss=0.02784, audio_tagging_loss=0.009645, over 14401.00 frames. ], tot_loss[loss=0.08567, simple_loss=0.1044, pruned_loss=0.02291, audio_tagging_loss=0.01055, over 3048865.16 frames. ], batch size: 57, lr: 7.04e-03, grad_scale: 16.0 2023-11-19 17:25:06,848 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=738466.6666666666, ans=0.1 2023-11-19 17:25:08,233 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=738466.6666666666, ans=0.1 2023-11-19 17:25:42,391 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 110800 2023-11-19 17:25:46,936 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=738666.6666666666, ans=0.125 2023-11-19 17:26:00,181 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 2600, loss[loss=0.08795, simple_loss=0.1013, pruned_loss=0.02477, audio_tagging_loss=0.01254, over 15723.00 frames. ], tot_loss[loss=0.08552, simple_loss=0.1042, pruned_loss=0.02295, audio_tagging_loss=0.01046, over 3048714.09 frames. ], batch size: 59, lr: 7.04e-03, grad_scale: 16.0 2023-11-19 17:26:16,158 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.092e+01 8.270e+01 8.898e+01 9.575e+01 2.029e+02, threshold=1.780e+02, percent-clipped=1.0 2023-11-19 17:26:23,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=738866.6666666666, ans=0.125 2023-11-19 17:26:26,456 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=738866.6666666666, ans=0.0 2023-11-19 17:26:28,045 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.45 vs. limit=22.5 2023-11-19 17:26:35,212 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=738866.6666666666, ans=0.0 2023-11-19 17:26:37,662 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=738933.3333333334, ans=0.0 2023-11-19 17:26:46,922 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=738933.3333333334, ans=0.125 2023-11-19 17:26:49,144 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 110850 2023-11-19 17:27:03,855 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 2650, loss[loss=0.1227, simple_loss=0.1391, pruned_loss=0.04312, audio_tagging_loss=0.01002, over 15627.00 frames. 
], tot_loss[loss=0.08596, simple_loss=0.1047, pruned_loss=0.0232, audio_tagging_loss=0.01039, over 3043799.62 frames. ], batch size: 57, lr: 7.04e-03, grad_scale: 16.0 2023-11-19 17:27:15,188 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=739133.3333333334, ans=0.0 2023-11-19 17:27:19,191 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.75 vs. limit=15.0 2023-11-19 17:27:20,311 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.81 vs. limit=15.0 2023-11-19 17:27:25,006 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=739133.3333333334, ans=10.0 2023-11-19 17:27:32,932 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=739200.0, ans=0.125 2023-11-19 17:27:51,886 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=739266.6666666666, ans=0.125 2023-11-19 17:27:53,020 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 110900 2023-11-19 17:27:53,120 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=739266.6666666666, ans=0.125 2023-11-19 17:28:07,638 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 2700, loss[loss=0.07857, simple_loss=0.0985, pruned_loss=0.02015, audio_tagging_loss=0.009167, over 15359.00 frames. ], tot_loss[loss=0.08526, simple_loss=0.1039, pruned_loss=0.02295, audio_tagging_loss=0.01035, over 3045624.39 frames. ], batch size: 56, lr: 7.04e-03, grad_scale: 16.0 2023-11-19 17:28:11,497 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=739400.0, ans=0.125 2023-11-19 17:28:13,294 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.91 vs. limit=15.0 2023-11-19 17:28:25,558 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.580e+01 8.552e+01 9.403e+01 1.042e+02 1.397e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-19 17:28:31,377 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=739466.6666666666, ans=0.125 2023-11-19 17:28:57,049 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 110950 2023-11-19 17:29:01,019 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=739666.6666666666, ans=0.2 2023-11-19 17:29:06,181 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.61 vs. limit=22.5 2023-11-19 17:29:12,987 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 2750, loss[loss=0.08736, simple_loss=0.101, pruned_loss=0.0237, audio_tagging_loss=0.01313, over 15470.00 frames. ], tot_loss[loss=0.08503, simple_loss=0.1034, pruned_loss=0.02281, audio_tagging_loss=0.0105, over 3045953.64 frames. 
], batch size: 58, lr: 7.04e-03, grad_scale: 16.0 2023-11-19 17:29:18,803 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=739733.3333333334, ans=0.1 2023-11-19 17:29:43,119 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=739866.6666666666, ans=0.2 2023-11-19 17:29:44,786 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.24 vs. limit=15.0 2023-11-19 17:30:01,030 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 111000 2023-11-19 17:30:01,540 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.56 vs. limit=10.0 2023-11-19 17:30:06,995 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 17:30:11,386 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.14 vs. limit=15.0 2023-11-19 17:30:16,725 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 2800, loss[loss=0.1279, simple_loss=0.1629, pruned_loss=0.03848, audio_tagging_loss=0.007949, over 15794.00 frames. ], tot_loss[loss=0.0852, simple_loss=0.104, pruned_loss=0.02277, audio_tagging_loss=0.01044, over 3044924.18 frames. ], batch size: 56, lr: 7.03e-03, grad_scale: 32.0 2023-11-19 17:30:32,828 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.580e+01 8.267e+01 8.759e+01 9.728e+01 1.191e+02, threshold=1.752e+02, percent-clipped=0.0 2023-11-19 17:30:36,125 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=740133.3333333334, ans=15.0 2023-11-19 17:30:40,872 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=740200.0, ans=15.0 2023-11-19 17:30:47,977 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=740200.0, ans=0.125 2023-11-19 17:30:54,384 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=740266.6666666666, ans=0.125 2023-11-19 17:31:05,957 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 111050 2023-11-19 17:31:08,975 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.94 vs. limit=10.0 2023-11-19 17:31:11,517 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.54 vs. limit=15.0 2023-11-19 17:31:20,585 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 2850, loss[loss=0.07066, simple_loss=0.09298, pruned_loss=0.01481, audio_tagging_loss=0.009364, over 14077.00 frames. 
], tot_loss[loss=0.08475, simple_loss=0.1035, pruned_loss=0.02263, audio_tagging_loss=0.01038, over 3045647.40 frames. ], batch size: 54, lr: 7.03e-03, grad_scale: 32.0 2023-11-19 17:31:22,085 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=740400.0, ans=0.125 2023-11-19 17:31:56,119 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=740533.3333333334, ans=0.125 2023-11-19 17:32:03,556 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=740600.0, ans=0.125 2023-11-19 17:32:09,395 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 111100 2023-11-19 17:32:10,837 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=740666.6666666666, ans=0.125 2023-11-19 17:32:19,475 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=740666.6666666666, ans=0.125 2023-11-19 17:32:20,564 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=740666.6666666666, ans=0.1 2023-11-19 17:32:25,214 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 2900, loss[loss=0.09439, simple_loss=0.1124, pruned_loss=0.02908, audio_tagging_loss=0.00911, over 14245.00 frames. ], tot_loss[loss=0.0845, simple_loss=0.103, pruned_loss=0.02253, audio_tagging_loss=0.01048, over 3039711.48 frames. ], batch size: 58, lr: 7.03e-03, grad_scale: 16.0 2023-11-19 17:32:32,529 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.82 vs. limit=8.0 2023-11-19 17:32:36,036 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.78 vs. limit=15.0 2023-11-19 17:32:41,225 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=740800.0, ans=0.0 2023-11-19 17:32:43,309 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.731e+01 8.402e+01 9.299e+01 9.982e+01 1.196e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-19 17:32:43,613 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=740800.0, ans=0.2 2023-11-19 17:32:47,231 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=740800.0, ans=0.1 2023-11-19 17:33:01,998 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=740933.3333333334, ans=0.2 2023-11-19 17:33:08,420 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=740933.3333333334, ans=0.1 2023-11-19 17:33:10,234 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.42 vs. 
limit=22.5 2023-11-19 17:33:11,667 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=740933.3333333334, ans=0.125 2023-11-19 17:33:14,548 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 111150 2023-11-19 17:33:23,124 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=741000.0, ans=0.0 2023-11-19 17:33:28,970 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 2950, loss[loss=0.0758, simple_loss=0.09224, pruned_loss=0.01828, audio_tagging_loss=0.0114, over 15724.00 frames. ], tot_loss[loss=0.08572, simple_loss=0.105, pruned_loss=0.02288, audio_tagging_loss=0.01034, over 3042525.58 frames. ], batch size: 58, lr: 7.03e-03, grad_scale: 16.0 2023-11-19 17:33:31,586 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.06 vs. limit=6.0 2023-11-19 17:33:49,769 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 17:34:01,807 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=741200.0, ans=0.125 2023-11-19 17:34:17,243 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 111200 2023-11-19 17:34:25,929 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=741333.3333333334, ans=0.125 2023-11-19 17:34:33,024 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 3000, loss[loss=0.08612, simple_loss=0.09604, pruned_loss=0.02613, audio_tagging_loss=0.01197, over 14937.00 frames. ], tot_loss[loss=0.0865, simple_loss=0.1059, pruned_loss=0.02316, audio_tagging_loss=0.0104, over 3045407.00 frames. ], batch size: 55, lr: 7.03e-03, grad_scale: 16.0 2023-11-19 17:34:33,024 INFO [train_asr.py:1285] (3/4) Computing validation loss 2023-11-19 17:35:14,015 INFO [train_asr.py:1294] (3/4) Epoch 10, validation: loss=0.06437, simple_loss=0.0554, pruned_loss=0.006444, audio_tagging_loss=0.03022, over 4681554.00 frames. 2023-11-19 17:35:14,015 INFO [train_asr.py:1295] (3/4) Maximum memory allocated so far is 25886MB 2023-11-19 17:35:23,008 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=741400.0, ans=0.125 2023-11-19 17:35:31,924 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.687e+01 8.390e+01 9.154e+01 1.009e+02 1.642e+02, threshold=1.831e+02, percent-clipped=0.0 2023-11-19 17:35:53,227 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=741600.0, ans=0.125 2023-11-19 17:35:56,862 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=741600.0, ans=0.125 2023-11-19 17:35:57,118 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=741600.0, ans=0.0 2023-11-19 17:36:03,177 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 111250 2023-11-19 17:36:17,865 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 3050, loss[loss=0.08983, simple_loss=0.1104, pruned_loss=0.02401, audio_tagging_loss=0.01064, over 14633.00 frames. ], tot_loss[loss=0.08711, simple_loss=0.1065, pruned_loss=0.02341, audio_tagging_loss=0.01046, over 3046712.21 frames. 
], batch size: 55, lr: 7.03e-03, grad_scale: 16.0 2023-11-19 17:36:27,274 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=741733.3333333334, ans=0.125 2023-11-19 17:36:36,448 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.80 vs. limit=15.0 2023-11-19 17:36:48,777 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=741866.6666666666, ans=0.1 2023-11-19 17:36:55,105 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 17:36:56,577 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=741933.3333333334, ans=0.125 2023-11-19 17:37:06,352 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 111300 2023-11-19 17:37:10,186 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=742000.0, ans=0.125 2023-11-19 17:37:21,854 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 3100, loss[loss=0.06589, simple_loss=0.07549, pruned_loss=0.01443, audio_tagging_loss=0.01371, over 15650.00 frames. ], tot_loss[loss=0.08794, simple_loss=0.1074, pruned_loss=0.02378, audio_tagging_loss=0.01045, over 3051731.18 frames. ], batch size: 58, lr: 7.02e-03, grad_scale: 16.0 2023-11-19 17:37:22,276 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=742066.6666666666, ans=0.0 2023-11-19 17:37:27,536 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.29 vs. limit=10.0 2023-11-19 17:37:40,859 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.012e+01 8.266e+01 9.120e+01 9.877e+01 1.232e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-19 17:37:43,542 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=742133.3333333334, ans=0.125 2023-11-19 17:37:51,130 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=742200.0, ans=0.0 2023-11-19 17:37:54,694 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=742200.0, ans=0.2 2023-11-19 17:37:57,351 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=742200.0, ans=0.125 2023-11-19 17:38:00,137 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.85 vs. 
limit=15.0 2023-11-19 17:38:11,332 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 111350 2023-11-19 17:38:16,430 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff3.min_abs, batch_count=742333.3333333334, ans=0.2 2023-11-19 17:38:21,447 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=742333.3333333334, ans=0.125 2023-11-19 17:38:25,867 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.64 vs. limit=10.0 2023-11-19 17:38:26,721 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 3150, loss[loss=0.1024, simple_loss=0.1332, pruned_loss=0.02708, audio_tagging_loss=0.008673, over 15896.00 frames. ], tot_loss[loss=0.08864, simple_loss=0.1084, pruned_loss=0.02402, audio_tagging_loss=0.01043, over 3053986.10 frames. ], batch size: 59, lr: 7.02e-03, grad_scale: 16.0 2023-11-19 17:38:30,995 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.42 vs. limit=15.0 2023-11-19 17:39:05,424 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=742600.0, ans=0.95 2023-11-19 17:39:13,996 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=742600.0, ans=0.2 2023-11-19 17:39:15,719 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 111400 2023-11-19 17:39:17,886 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=742666.6666666666, ans=0.125 2023-11-19 17:39:32,258 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 3200, loss[loss=0.09219, simple_loss=0.1171, pruned_loss=0.0227, audio_tagging_loss=0.01093, over 14822.00 frames. ], tot_loss[loss=0.08812, simple_loss=0.1077, pruned_loss=0.02367, audio_tagging_loss=0.01059, over 3061731.09 frames. ], batch size: 54, lr: 7.02e-03, grad_scale: 32.0 2023-11-19 17:39:45,972 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.57 vs. limit=10.0 2023-11-19 17:39:50,365 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.737e+01 8.277e+01 9.297e+01 1.012e+02 1.250e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-19 17:39:54,432 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=742800.0, ans=0.0 2023-11-19 17:40:15,147 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.76 vs. limit=10.0 2023-11-19 17:40:17,201 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=742933.3333333334, ans=0.125 2023-11-19 17:40:22,044 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 111450 2023-11-19 17:40:27,062 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=743000.0, ans=0.125 2023-11-19 17:40:37,266 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 3250, loss[loss=0.07282, simple_loss=0.09485, pruned_loss=0.01734, audio_tagging_loss=0.008048, over 16059.00 frames. 
], tot_loss[loss=0.08704, simple_loss=0.1063, pruned_loss=0.02317, audio_tagging_loss=0.01072, over 3061411.58 frames. ], batch size: 62, lr: 7.02e-03, grad_scale: 32.0 2023-11-19 17:41:25,806 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 111500 2023-11-19 17:41:36,222 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.51 vs. limit=12.0 2023-11-19 17:41:40,560 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 3300, loss[loss=0.07067, simple_loss=0.08508, pruned_loss=0.01641, audio_tagging_loss=0.01173, over 16244.00 frames. ], tot_loss[loss=0.08745, simple_loss=0.1069, pruned_loss=0.02339, audio_tagging_loss=0.01063, over 3064828.06 frames. ], batch size: 62, lr: 7.02e-03, grad_scale: 16.0 2023-11-19 17:42:00,067 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.519e+01 8.050e+01 8.952e+01 9.862e+01 1.284e+02, threshold=1.790e+02, percent-clipped=0.0 2023-11-19 17:42:06,578 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=743533.3333333334, ans=0.125 2023-11-19 17:42:06,590 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=743533.3333333334, ans=0.0 2023-11-19 17:42:18,247 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=743600.0, ans=0.125 2023-11-19 17:42:20,950 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=743600.0, ans=0.1 2023-11-19 17:42:27,226 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=743600.0, ans=0.1 2023-11-19 17:42:29,501 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 111550 2023-11-19 17:42:40,771 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=743666.6666666666, ans=0.035 2023-11-19 17:42:45,508 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 3350, loss[loss=0.07708, simple_loss=0.1006, pruned_loss=0.01878, audio_tagging_loss=0.008007, over 16384.00 frames. ], tot_loss[loss=0.08696, simple_loss=0.1068, pruned_loss=0.02317, audio_tagging_loss=0.0104, over 3069067.49 frames. ], batch size: 60, lr: 7.02e-03, grad_scale: 16.0 2023-11-19 17:43:02,076 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=743800.0, ans=0.0 2023-11-19 17:43:04,609 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=743800.0, ans=0.0 2023-11-19 17:43:10,081 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=743866.6666666666, ans=0.125 2023-11-19 17:43:10,728 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.64 vs. 
limit=22.5 2023-11-19 17:43:26,054 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=743933.3333333334, ans=0.125 2023-11-19 17:43:34,589 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 111600 2023-11-19 17:43:46,726 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=744000.0, ans=0.125 2023-11-19 17:43:50,272 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 3400, loss[loss=0.06754, simple_loss=0.07667, pruned_loss=0.01933, audio_tagging_loss=0.009869, over 14318.00 frames. ], tot_loss[loss=0.08666, simple_loss=0.1063, pruned_loss=0.02321, audio_tagging_loss=0.01028, over 3063363.73 frames. ], batch size: 55, lr: 7.01e-03, grad_scale: 16.0 2023-11-19 17:44:08,452 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=744133.3333333334, ans=0.5 2023-11-19 17:44:09,432 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.227e+01 8.409e+01 9.014e+01 1.006e+02 1.399e+02, threshold=1.803e+02, percent-clipped=0.0 2023-11-19 17:44:18,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=744200.0, ans=0.025 2023-11-19 17:44:22,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=744200.0, ans=0.1 2023-11-19 17:44:24,612 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 17:44:24,744 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=744200.0, ans=0.0 2023-11-19 17:44:34,435 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=744266.6666666666, ans=0.5 2023-11-19 17:44:39,204 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 111650 2023-11-19 17:44:43,633 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.27 vs. limit=12.0 2023-11-19 17:44:54,679 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 3450, loss[loss=0.08113, simple_loss=0.1032, pruned_loss=0.01979, audio_tagging_loss=0.009718, over 14985.00 frames. ], tot_loss[loss=0.08649, simple_loss=0.1061, pruned_loss=0.02314, audio_tagging_loss=0.01029, over 3053988.85 frames. ], batch size: 58, lr: 7.01e-03, grad_scale: 16.0 2023-11-19 17:44:54,950 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=744400.0, ans=0.0 2023-11-19 17:45:23,579 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=744533.3333333334, ans=0.125 2023-11-19 17:45:33,881 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.63 vs. 
limit=15.0 2023-11-19 17:45:41,617 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=744600.0, ans=0.1 2023-11-19 17:45:43,761 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 111700 2023-11-19 17:45:54,050 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=744666.6666666666, ans=0.125 2023-11-19 17:45:55,247 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=744666.6666666666, ans=0.0 2023-11-19 17:45:59,675 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 3500, loss[loss=0.09278, simple_loss=0.1174, pruned_loss=0.0224, audio_tagging_loss=0.01169, over 16401.00 frames. ], tot_loss[loss=0.08685, simple_loss=0.1067, pruned_loss=0.0233, audio_tagging_loss=0.01019, over 3054093.55 frames. ], batch size: 62, lr: 7.01e-03, grad_scale: 16.0 2023-11-19 17:46:13,685 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=744800.0, ans=0.125 2023-11-19 17:46:18,194 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.875e+01 8.242e+01 8.864e+01 9.843e+01 1.271e+02, threshold=1.773e+02, percent-clipped=0.0 2023-11-19 17:46:31,138 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 17:46:36,442 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=744933.3333333334, ans=0.125 2023-11-19 17:46:47,482 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=744933.3333333334, ans=0.1 2023-11-19 17:46:48,452 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 111750 2023-11-19 17:46:49,753 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=745000.0, ans=0.05 2023-11-19 17:46:52,366 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=745000.0, ans=0.125 2023-11-19 17:46:55,293 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.65 vs. limit=22.5 2023-11-19 17:47:03,407 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 3550, loss[loss=0.09631, simple_loss=0.1146, pruned_loss=0.02616, audio_tagging_loss=0.01286, over 15681.00 frames. ], tot_loss[loss=0.08648, simple_loss=0.1062, pruned_loss=0.02322, audio_tagging_loss=0.01015, over 3053815.38 frames. 
], batch size: 59, lr: 7.01e-03, grad_scale: 16.0 2023-11-19 17:47:12,336 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=745066.6666666666, ans=0.1 2023-11-19 17:47:13,668 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 17:47:14,097 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.62 vs. limit=6.0 2023-11-19 17:47:24,128 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=745133.3333333334, ans=0.0 2023-11-19 17:47:28,994 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=745200.0, ans=0.0 2023-11-19 17:47:38,031 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.87 vs. limit=6.0 2023-11-19 17:47:52,397 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 111800 2023-11-19 17:47:52,506 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=745266.6666666666, ans=0.125 2023-11-19 17:47:57,482 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.15 vs. limit=6.0 2023-11-19 17:47:58,439 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.83 vs. limit=12.0 2023-11-19 17:48:00,598 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=745333.3333333334, ans=0.2 2023-11-19 17:48:04,270 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=745333.3333333334, ans=0.1 2023-11-19 17:48:07,834 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 3600, loss[loss=0.06895, simple_loss=0.08095, pruned_loss=0.01498, audio_tagging_loss=0.01349, over 14273.00 frames. ], tot_loss[loss=0.08646, simple_loss=0.106, pruned_loss=0.02327, audio_tagging_loss=0.01019, over 3049294.92 frames. ], batch size: 55, lr: 7.01e-03, grad_scale: 32.0 2023-11-19 17:48:24,035 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.77 vs. limit=22.5 2023-11-19 17:48:26,041 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=745466.6666666666, ans=0.125 2023-11-19 17:48:28,208 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.581e+01 8.234e+01 9.119e+01 9.988e+01 1.352e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-19 17:48:30,205 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.55 vs. 
limit=22.5 2023-11-19 17:48:35,857 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=745533.3333333334, ans=0.025 2023-11-19 17:48:39,768 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=745533.3333333334, ans=0.0 2023-11-19 17:48:56,732 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 111850 2023-11-19 17:48:57,029 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=745600.0, ans=0.125 2023-11-19 17:49:13,526 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 3650, loss[loss=0.08784, simple_loss=0.1081, pruned_loss=0.02186, audio_tagging_loss=0.01192, over 16053.00 frames. ], tot_loss[loss=0.08556, simple_loss=0.1049, pruned_loss=0.02288, audio_tagging_loss=0.01024, over 3049728.12 frames. ], batch size: 60, lr: 7.01e-03, grad_scale: 32.0 2023-11-19 17:49:19,752 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=745733.3333333334, ans=0.035 2023-11-19 17:49:23,412 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=745733.3333333334, ans=0.035 2023-11-19 17:49:30,315 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.72 vs. limit=15.0 2023-11-19 17:49:31,034 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=745800.0, ans=0.2 2023-11-19 17:49:33,521 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=745800.0, ans=0.125 2023-11-19 17:49:36,994 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=745866.6666666666, ans=0.125 2023-11-19 17:50:01,994 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.46 vs. limit=15.0 2023-11-19 17:50:02,664 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 111900 2023-11-19 17:50:16,674 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=746066.6666666666, ans=0.125 2023-11-19 17:50:17,560 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 3700, loss[loss=0.079, simple_loss=0.09904, pruned_loss=0.02045, audio_tagging_loss=0.009035, over 15050.00 frames. ], tot_loss[loss=0.08587, simple_loss=0.1052, pruned_loss=0.02303, audio_tagging_loss=0.01022, over 3052497.46 frames. ], batch size: 55, lr: 7.01e-03, grad_scale: 32.0 2023-11-19 17:50:22,737 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=746066.6666666666, ans=0.0 2023-11-19 17:50:28,232 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.70 vs. 
limit=6.0 2023-11-19 17:50:35,743 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.943e+01 8.609e+01 9.395e+01 1.090e+02 1.567e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-19 17:51:05,356 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 111950 2023-11-19 17:51:10,356 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=746333.3333333334, ans=0.125 2023-11-19 17:51:20,095 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 3750, loss[loss=0.1014, simple_loss=0.1276, pruned_loss=0.02838, audio_tagging_loss=0.009268, over 15584.00 frames. ], tot_loss[loss=0.0859, simple_loss=0.1052, pruned_loss=0.02303, audio_tagging_loss=0.01025, over 3056385.28 frames. ], batch size: 56, lr: 7.00e-03, grad_scale: 32.0 2023-11-19 17:51:20,297 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=746400.0, ans=0.0 2023-11-19 17:51:24,048 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=746400.0, ans=0.125 2023-11-19 17:51:34,425 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=746466.6666666666, ans=0.125 2023-11-19 17:51:39,238 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=746466.6666666666, ans=0.04949747468305833 2023-11-19 17:51:39,326 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=746466.6666666666, ans=0.1 2023-11-19 17:51:42,229 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.97 vs. limit=15.0 2023-11-19 17:51:45,707 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=746533.3333333334, ans=0.1 2023-11-19 17:52:00,396 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=746600.0, ans=0.125 2023-11-19 17:52:03,706 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 17:52:08,674 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 112000 2023-11-19 17:52:28,338 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 3800, loss[loss=0.09664, simple_loss=0.115, pruned_loss=0.02872, audio_tagging_loss=0.01041, over 15841.00 frames. ], tot_loss[loss=0.08563, simple_loss=0.1048, pruned_loss=0.02289, audio_tagging_loss=0.01034, over 3059748.60 frames. 
], batch size: 57, lr: 7.00e-03, grad_scale: 32.0 2023-11-19 17:52:41,908 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=746800.0, ans=0.125 2023-11-19 17:52:42,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=746800.0, ans=0.0 2023-11-19 17:52:46,639 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.263e+01 8.531e+01 9.323e+01 1.047e+02 1.478e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-19 17:52:53,089 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=746866.6666666666, ans=0.125 2023-11-19 17:52:55,560 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 17:53:16,984 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 112050 2023-11-19 17:53:20,925 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=747000.0, ans=0.125 2023-11-19 17:53:25,794 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=747000.0, ans=0.0 2023-11-19 17:53:31,501 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 3850, loss[loss=0.09339, simple_loss=0.1179, pruned_loss=0.02576, audio_tagging_loss=0.008666, over 14397.00 frames. ], tot_loss[loss=0.08507, simple_loss=0.1039, pruned_loss=0.02263, audio_tagging_loss=0.01051, over 3054666.93 frames. ], batch size: 56, lr: 7.00e-03, grad_scale: 32.0 2023-11-19 17:53:34,280 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=747066.6666666666, ans=0.1 2023-11-19 17:53:41,812 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=747066.6666666666, ans=0.04949747468305833 2023-11-19 17:54:08,463 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=747200.0, ans=0.1 2023-11-19 17:54:20,347 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 112100 2023-11-19 17:54:25,473 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=747333.3333333334, ans=0.0 2023-11-19 17:54:30,540 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=747333.3333333334, ans=0.125 2023-11-19 17:54:35,298 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 3900, loss[loss=0.09112, simple_loss=0.117, pruned_loss=0.02296, audio_tagging_loss=0.00968, over 15147.00 frames. ], tot_loss[loss=0.0858, simple_loss=0.105, pruned_loss=0.02287, audio_tagging_loss=0.01041, over 3051757.46 frames. ], batch size: 54, lr: 7.00e-03, grad_scale: 32.0 2023-11-19 17:54:53,385 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.97 vs. 
limit=15.0 2023-11-19 17:54:55,653 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.123e+01 8.433e+01 9.481e+01 1.017e+02 1.565e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-19 17:55:09,732 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=747533.3333333334, ans=0.125 2023-11-19 17:55:10,937 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=747533.3333333334, ans=0.5 2023-11-19 17:55:24,488 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 112150 2023-11-19 17:55:30,977 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_na.min_abs, batch_count=747666.6666666666, ans=0.02 2023-11-19 17:55:39,672 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=747733.3333333334, ans=0.0 2023-11-19 17:55:40,534 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 3950, loss[loss=0.09421, simple_loss=0.1164, pruned_loss=0.02744, audio_tagging_loss=0.008555, over 15386.00 frames. ], tot_loss[loss=0.0864, simple_loss=0.1058, pruned_loss=0.02292, audio_tagging_loss=0.01056, over 3053814.27 frames. ], batch size: 57, lr: 7.00e-03, grad_scale: 32.0 2023-11-19 17:55:52,617 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=747800.0, ans=0.1 2023-11-19 17:55:52,794 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.61 vs. limit=10.0 2023-11-19 17:56:14,082 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.60 vs. limit=6.0 2023-11-19 17:56:19,616 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=747933.3333333334, ans=0.2 2023-11-19 17:56:29,208 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 112200 2023-11-19 17:56:34,684 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=748000.0, ans=0.0 2023-11-19 17:56:37,620 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=748000.0, ans=0.125 2023-11-19 17:56:42,009 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.98 vs. limit=15.0 2023-11-19 17:56:44,796 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 4000, loss[loss=0.08901, simple_loss=0.09597, pruned_loss=0.02932, audio_tagging_loss=0.01171, over 13957.00 frames. ], tot_loss[loss=0.08654, simple_loss=0.1059, pruned_loss=0.02304, audio_tagging_loss=0.01054, over 3053488.97 frames. ], batch size: 54, lr: 7.00e-03, grad_scale: 32.0 2023-11-19 17:56:45,017 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=748066.6666666666, ans=0.035 2023-11-19 17:56:51,560 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.96 vs. 
limit=15.0 2023-11-19 17:56:59,002 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=748133.3333333334, ans=0.0 2023-11-19 17:57:02,815 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=748133.3333333334, ans=0.125 2023-11-19 17:57:04,930 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.918e+01 8.458e+01 9.188e+01 1.037e+02 1.473e+02, threshold=1.838e+02, percent-clipped=0.0 2023-11-19 17:57:22,562 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=748266.6666666666, ans=0.0 2023-11-19 17:57:24,023 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=748266.6666666666, ans=0.0 2023-11-19 17:57:29,477 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 17:57:34,127 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 112250 2023-11-19 17:57:48,616 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 4050, loss[loss=0.1061, simple_loss=0.1333, pruned_loss=0.03083, audio_tagging_loss=0.008598, over 15622.00 frames. ], tot_loss[loss=0.08651, simple_loss=0.1057, pruned_loss=0.02304, audio_tagging_loss=0.0106, over 3052508.00 frames. ], batch size: 56, lr: 6.99e-03, grad_scale: 32.0 2023-11-19 17:57:51,089 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 17:58:35,000 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=748600.0, ans=0.025 2023-11-19 17:58:37,373 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 112300 2023-11-19 17:58:38,576 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=748666.6666666666, ans=0.125 2023-11-19 17:58:49,654 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=748666.6666666666, ans=0.0 2023-11-19 17:58:52,500 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 4100, loss[loss=0.09334, simple_loss=0.1122, pruned_loss=0.0279, audio_tagging_loss=0.009347, over 15056.00 frames. ], tot_loss[loss=0.08632, simple_loss=0.1054, pruned_loss=0.02293, audio_tagging_loss=0.01072, over 3052175.16 frames. ], batch size: 56, lr: 6.99e-03, grad_scale: 32.0 2023-11-19 17:59:13,892 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.713e+01 8.246e+01 9.038e+01 9.964e+01 1.289e+02, threshold=1.808e+02, percent-clipped=0.0 2023-11-19 17:59:28,080 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=748866.6666666666, ans=0.2 2023-11-19 17:59:34,532 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.59 vs. 
limit=15.0 2023-11-19 17:59:42,017 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 112350 2023-11-19 17:59:51,412 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=749000.0, ans=0.2 2023-11-19 17:59:54,028 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff3.min_abs, batch_count=749000.0, ans=0.2 2023-11-19 17:59:57,408 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 4150, loss[loss=0.09797, simple_loss=0.1202, pruned_loss=0.02851, audio_tagging_loss=0.009332, over 15071.00 frames. ], tot_loss[loss=0.08602, simple_loss=0.1048, pruned_loss=0.02298, audio_tagging_loss=0.01062, over 3049119.52 frames. ], batch size: 56, lr: 6.99e-03, grad_scale: 32.0 2023-11-19 17:59:59,539 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=749066.6666666666, ans=0.125 2023-11-19 18:00:03,316 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=749066.6666666666, ans=0.125 2023-11-19 18:00:16,016 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=749133.3333333334, ans=0.2 2023-11-19 18:00:18,176 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=749133.3333333334, ans=0.1 2023-11-19 18:00:19,488 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=749133.3333333334, ans=0.0 2023-11-19 18:00:28,252 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=749200.0, ans=0.0 2023-11-19 18:00:29,482 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=749200.0, ans=0.0 2023-11-19 18:00:43,994 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 18:00:46,537 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 112400 2023-11-19 18:01:01,959 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 4200, loss[loss=0.07654, simple_loss=0.09014, pruned_loss=0.01772, audio_tagging_loss=0.01375, over 15137.00 frames. ], tot_loss[loss=0.08592, simple_loss=0.105, pruned_loss=0.02286, audio_tagging_loss=0.01055, over 3051351.59 frames. ], batch size: 57, lr: 6.99e-03, grad_scale: 32.0 2023-11-19 18:01:23,164 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.124e+01 8.470e+01 8.967e+01 9.932e+01 1.345e+02, threshold=1.793e+02, percent-clipped=0.0 2023-11-19 18:01:50,888 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 112450 2023-11-19 18:01:54,257 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.35 vs. 
limit=15.0 2023-11-19 18:02:05,495 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 4250, loss[loss=0.08344, simple_loss=0.1059, pruned_loss=0.02072, audio_tagging_loss=0.00978, over 15348.00 frames. ], tot_loss[loss=0.08627, simple_loss=0.1061, pruned_loss=0.02281, audio_tagging_loss=0.01043, over 3053209.77 frames. ], batch size: 57, lr: 6.99e-03, grad_scale: 16.0 2023-11-19 18:02:14,988 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.57 vs. limit=15.0 2023-11-19 18:02:23,383 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=749800.0, ans=0.09899494936611666 2023-11-19 18:02:29,321 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=749800.0, ans=0.1 2023-11-19 18:02:49,662 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=749933.3333333334, ans=0.07 2023-11-19 18:02:54,441 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 112500 2023-11-19 18:02:58,443 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=750000.0, ans=0.0 2023-11-19 18:03:00,801 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=750000.0, ans=0.125 2023-11-19 18:03:01,929 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=750000.0, ans=0.035 2023-11-19 18:03:10,415 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 4300, loss[loss=0.09541, simple_loss=0.1204, pruned_loss=0.02585, audio_tagging_loss=0.009352, over 15240.00 frames. ], tot_loss[loss=0.0865, simple_loss=0.1064, pruned_loss=0.02301, audio_tagging_loss=0.01031, over 3054311.19 frames. ], batch size: 57, lr: 6.99e-03, grad_scale: 16.0 2023-11-19 18:03:15,621 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=750066.6666666666, ans=0.0 2023-11-19 18:03:31,886 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.174e+01 8.838e+01 9.432e+01 1.009e+02 1.921e+02, threshold=1.886e+02, percent-clipped=1.0 2023-11-19 18:03:48,826 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.33 vs. limit=12.0 2023-11-19 18:03:59,299 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 112550 2023-11-19 18:04:13,903 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=750400.0, ans=0.2 2023-11-19 18:04:14,909 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 4350, loss[loss=0.06285, simple_loss=0.06906, pruned_loss=0.01231, audio_tagging_loss=0.01601, over 14843.00 frames. ], tot_loss[loss=0.08624, simple_loss=0.106, pruned_loss=0.02294, audio_tagging_loss=0.01028, over 3059216.29 frames. 
], batch size: 57, lr: 6.99e-03, grad_scale: 16.0 2023-11-19 18:04:19,884 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=750400.0, ans=0.125 2023-11-19 18:04:38,497 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=750466.6666666666, ans=0.125 2023-11-19 18:04:40,980 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=750533.3333333334, ans=0.0 2023-11-19 18:05:03,492 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 112600 2023-11-19 18:05:03,798 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=750600.0, ans=0.1 2023-11-19 18:05:13,986 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=750666.6666666666, ans=0.05 2023-11-19 18:05:16,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=750666.6666666666, ans=0.0 2023-11-19 18:05:17,563 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=750733.3333333334, ans=0.125 2023-11-19 18:05:18,584 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 4400, loss[loss=0.1273, simple_loss=0.1533, pruned_loss=0.04009, audio_tagging_loss=0.01056, over 16045.00 frames. ], tot_loss[loss=0.08677, simple_loss=0.1067, pruned_loss=0.02325, audio_tagging_loss=0.0102, over 3057470.56 frames. ], batch size: 56, lr: 6.98e-03, grad_scale: 32.0 2023-11-19 18:05:29,907 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=750733.3333333334, ans=10.0 2023-11-19 18:05:40,674 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.038e+01 8.205e+01 8.716e+01 9.862e+01 1.282e+02, threshold=1.743e+02, percent-clipped=0.0 2023-11-19 18:05:49,543 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=750866.6666666666, ans=0.125 2023-11-19 18:05:58,791 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=750933.3333333334, ans=0.125 2023-11-19 18:06:02,636 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=750933.3333333334, ans=0.2 2023-11-19 18:06:07,245 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 112650 2023-11-19 18:06:22,896 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.58 vs. limit=22.5 2023-11-19 18:06:23,239 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 4450, loss[loss=0.06534, simple_loss=0.07499, pruned_loss=0.01743, audio_tagging_loss=0.01042, over 15895.00 frames. ], tot_loss[loss=0.08657, simple_loss=0.1063, pruned_loss=0.02326, audio_tagging_loss=0.01018, over 3056440.22 frames. 
], batch size: 61, lr: 6.98e-03, grad_scale: 32.0 2023-11-19 18:06:29,538 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=751066.6666666666, ans=0.125 2023-11-19 18:06:37,055 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=751133.3333333334, ans=0.2 2023-11-19 18:06:53,664 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=751200.0, ans=0.125 2023-11-19 18:07:12,244 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 112700 2023-11-19 18:07:12,546 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=751266.6666666666, ans=0.125 2023-11-19 18:07:17,658 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.66 vs. limit=15.0 2023-11-19 18:07:19,857 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=751333.3333333334, ans=0.125 2023-11-19 18:07:26,867 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 4500, loss[loss=0.0625, simple_loss=0.086, pruned_loss=0.01219, audio_tagging_loss=0.007311, over 13928.00 frames. ], tot_loss[loss=0.08598, simple_loss=0.1056, pruned_loss=0.02291, audio_tagging_loss=0.01025, over 3059255.42 frames. ], batch size: 52, lr: 6.98e-03, grad_scale: 32.0 2023-11-19 18:07:41,708 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.12 vs. limit=22.5 2023-11-19 18:07:41,887 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.67 vs. limit=22.5 2023-11-19 18:07:48,355 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.958e+01 8.242e+01 8.889e+01 9.724e+01 1.502e+02, threshold=1.778e+02, percent-clipped=0.0 2023-11-19 18:08:15,518 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 112750 2023-11-19 18:08:30,950 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 4550, loss[loss=0.1176, simple_loss=0.1507, pruned_loss=0.03365, audio_tagging_loss=0.00861, over 15024.00 frames. ], tot_loss[loss=0.08536, simple_loss=0.1051, pruned_loss=0.02259, audio_tagging_loss=0.0102, over 3051542.58 frames. 
], batch size: 54, lr: 6.98e-03, grad_scale: 32.0 2023-11-19 18:08:48,720 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=751800.0, ans=0.0 2023-11-19 18:08:53,405 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=751800.0, ans=0.0 2023-11-19 18:09:03,343 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=751866.6666666666, ans=0.125 2023-11-19 18:09:05,800 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=751866.6666666666, ans=0.2 2023-11-19 18:09:08,777 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=751933.3333333334, ans=15.0 2023-11-19 18:09:14,735 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=751933.3333333334, ans=0.015 2023-11-19 18:09:14,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=751933.3333333334, ans=0.0 2023-11-19 18:09:19,676 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 18:09:19,769 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 112800 2023-11-19 18:09:30,340 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=752000.0, ans=0.1 2023-11-19 18:09:36,070 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 4600, loss[loss=0.0735, simple_loss=0.07924, pruned_loss=0.02125, audio_tagging_loss=0.01264, over 14249.00 frames. ], tot_loss[loss=0.08541, simple_loss=0.1047, pruned_loss=0.02273, audio_tagging_loss=0.01033, over 3049419.29 frames. ], batch size: 55, lr: 6.98e-03, grad_scale: 32.0 2023-11-19 18:09:48,624 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=752133.3333333334, ans=0.125 2023-11-19 18:09:56,968 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.492e+01 8.205e+01 8.855e+01 9.599e+01 1.553e+02, threshold=1.771e+02, percent-clipped=0.0 2023-11-19 18:10:07,940 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=752200.0, ans=0.125 2023-11-19 18:10:24,861 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 112850 2023-11-19 18:10:28,807 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=752333.3333333334, ans=0.125 2023-11-19 18:10:37,179 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=752333.3333333334, ans=0.0 2023-11-19 18:10:39,522 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 4650, loss[loss=0.07676, simple_loss=0.09142, pruned_loss=0.01822, audio_tagging_loss=0.01283, over 16031.00 frames. 
], tot_loss[loss=0.08612, simple_loss=0.1056, pruned_loss=0.02296, audio_tagging_loss=0.01037, over 3050851.61 frames. ], batch size: 61, lr: 6.98e-03, grad_scale: 16.0 2023-11-19 18:10:49,783 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=752400.0, ans=0.0 2023-11-19 18:11:16,583 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=752533.3333333334, ans=0.1 2023-11-19 18:11:25,340 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=752600.0, ans=0.1 2023-11-19 18:11:26,566 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=752600.0, ans=0.0 2023-11-19 18:11:28,744 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 112900 2023-11-19 18:11:32,930 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.23 vs. limit=15.0 2023-11-19 18:11:43,074 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 4700, loss[loss=0.1152, simple_loss=0.1343, pruned_loss=0.038, audio_tagging_loss=0.01003, over 15348.00 frames. ], tot_loss[loss=0.08719, simple_loss=0.1065, pruned_loss=0.02341, audio_tagging_loss=0.01051, over 3058035.98 frames. ], batch size: 58, lr: 6.97e-03, grad_scale: 16.0 2023-11-19 18:11:51,899 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.95 vs. limit=15.0 2023-11-19 18:11:57,718 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=752800.0, ans=15.0 2023-11-19 18:12:06,754 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.467e+01 8.509e+01 9.287e+01 1.024e+02 1.353e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-19 18:12:11,242 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.78 vs. limit=12.0 2023-11-19 18:12:31,872 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 112950 2023-11-19 18:12:49,176 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 4750, loss[loss=0.07435, simple_loss=0.08084, pruned_loss=0.02026, audio_tagging_loss=0.01368, over 14164.00 frames. ], tot_loss[loss=0.08635, simple_loss=0.1051, pruned_loss=0.02313, audio_tagging_loss=0.01066, over 3054956.04 frames. ], batch size: 56, lr: 6.97e-03, grad_scale: 16.0 2023-11-19 18:12:55,966 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.75 vs. limit=12.0 2023-11-19 18:13:18,870 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=753200.0, ans=0.2 2023-11-19 18:13:37,916 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 113000 2023-11-19 18:13:46,935 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=753333.3333333334, ans=0.0 2023-11-19 18:13:53,029 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 4800, loss[loss=0.08204, simple_loss=0.1081, pruned_loss=0.01912, audio_tagging_loss=0.008875, over 15959.00 frames. 
], tot_loss[loss=0.08625, simple_loss=0.1049, pruned_loss=0.02302, audio_tagging_loss=0.01079, over 3058609.83 frames. ], batch size: 59, lr: 6.97e-03, grad_scale: 32.0 2023-11-19 18:13:53,489 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=753400.0, ans=0.0 2023-11-19 18:13:58,220 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=753400.0, ans=0.05 2023-11-19 18:14:09,521 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=753466.6666666666, ans=0.2 2023-11-19 18:14:12,999 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=753466.6666666666, ans=0.0 2023-11-19 18:14:15,228 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.124e+01 8.505e+01 9.415e+01 1.014e+02 1.501e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-19 18:14:34,648 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=753600.0, ans=0.0 2023-11-19 18:14:41,518 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 113050 2023-11-19 18:14:56,003 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 4850, loss[loss=0.09795, simple_loss=0.1267, pruned_loss=0.02529, audio_tagging_loss=0.009312, over 15318.00 frames. ], tot_loss[loss=0.08672, simple_loss=0.1054, pruned_loss=0.02314, audio_tagging_loss=0.01088, over 3055595.86 frames. ], batch size: 55, lr: 6.97e-03, grad_scale: 32.0 2023-11-19 18:15:05,393 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=753733.3333333334, ans=0.0 2023-11-19 18:15:28,247 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.75 vs. limit=10.0 2023-11-19 18:15:44,957 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 113100 2023-11-19 18:16:01,374 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 4900, loss[loss=0.0879, simple_loss=0.1091, pruned_loss=0.02356, audio_tagging_loss=0.009798, over 14230.00 frames. ], tot_loss[loss=0.08628, simple_loss=0.1051, pruned_loss=0.02299, audio_tagging_loss=0.01074, over 3052472.19 frames. ], batch size: 54, lr: 6.97e-03, grad_scale: 32.0 2023-11-19 18:16:23,307 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.624e+01 8.196e+01 8.687e+01 9.230e+01 1.120e+02, threshold=1.737e+02, percent-clipped=0.0 2023-11-19 18:16:28,993 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.20 vs. limit=15.0 2023-11-19 18:16:49,913 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 113150 2023-11-19 18:16:53,716 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=754333.3333333334, ans=0.1 2023-11-19 18:16:54,936 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=754333.3333333334, ans=0.125 2023-11-19 18:17:04,374 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 4950, loss[loss=0.07513, simple_loss=0.09361, pruned_loss=0.01854, audio_tagging_loss=0.009785, over 15088.00 frames. 
], tot_loss[loss=0.08579, simple_loss=0.105, pruned_loss=0.02282, audio_tagging_loss=0.01048, over 3051832.63 frames. ], batch size: 58, lr: 6.97e-03, grad_scale: 32.0 2023-11-19 18:17:05,981 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=754400.0, ans=0.125 2023-11-19 18:17:13,735 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.58 vs. limit=15.0 2023-11-19 18:17:15,999 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.51 vs. limit=22.5 2023-11-19 18:17:20,564 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=754466.6666666666, ans=0.0 2023-11-19 18:17:25,457 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=754466.6666666666, ans=0.125 2023-11-19 18:17:48,246 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=754600.0, ans=0.125 2023-11-19 18:17:52,777 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 113200 2023-11-19 18:17:58,304 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=754666.6666666666, ans=0.125 2023-11-19 18:18:01,695 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=754666.6666666666, ans=0.1 2023-11-19 18:18:06,951 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=754733.3333333334, ans=0.125 2023-11-19 18:18:07,782 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 5000, loss[loss=0.06246, simple_loss=0.07333, pruned_loss=0.01605, audio_tagging_loss=0.009743, over 15545.00 frames. ], tot_loss[loss=0.08586, simple_loss=0.1054, pruned_loss=0.02283, audio_tagging_loss=0.01034, over 3057696.39 frames. ], batch size: 60, lr: 6.97e-03, grad_scale: 32.0 2023-11-19 18:18:31,430 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.773e+01 8.065e+01 8.852e+01 9.668e+01 1.212e+02, threshold=1.770e+02, percent-clipped=0.0 2023-11-19 18:18:31,877 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=754800.0, ans=0.125 2023-11-19 18:18:33,501 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.28 vs. limit=15.0 2023-11-19 18:18:55,831 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 113250 2023-11-19 18:18:59,725 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=755000.0, ans=0.125 2023-11-19 18:19:11,136 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 5050, loss[loss=0.1006, simple_loss=0.1218, pruned_loss=0.02938, audio_tagging_loss=0.01031, over 15011.00 frames. ], tot_loss[loss=0.08594, simple_loss=0.106, pruned_loss=0.02278, audio_tagging_loss=0.01015, over 3055662.02 frames. 
], batch size: 55, lr: 6.96e-03, grad_scale: 32.0 2023-11-19 18:19:17,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=755066.6666666666, ans=0.1 2023-11-19 18:19:20,779 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=755066.6666666666, ans=0.2 2023-11-19 18:19:24,207 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=755133.3333333334, ans=0.025 2023-11-19 18:19:29,640 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.16 vs. limit=15.0 2023-11-19 18:19:46,415 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=755200.0, ans=0.125 2023-11-19 18:19:48,681 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=755266.6666666666, ans=0.0 2023-11-19 18:19:50,331 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.11 vs. limit=22.5 2023-11-19 18:20:00,140 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 113300 2023-11-19 18:20:15,875 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 5100, loss[loss=0.0578, simple_loss=0.06566, pruned_loss=0.0128, audio_tagging_loss=0.01217, over 14972.00 frames. ], tot_loss[loss=0.08554, simple_loss=0.1051, pruned_loss=0.02276, audio_tagging_loss=0.01022, over 3050188.49 frames. ], batch size: 58, lr: 6.96e-03, grad_scale: 32.0 2023-11-19 18:20:16,682 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.17 vs. limit=15.0 2023-11-19 18:20:20,058 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=755400.0, ans=15.0 2023-11-19 18:20:32,115 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=755466.6666666666, ans=0.1 2023-11-19 18:20:36,231 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.33 vs. limit=15.0 2023-11-19 18:20:36,414 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.88 vs. limit=22.5 2023-11-19 18:20:38,105 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.347e+01 7.917e+01 8.813e+01 9.876e+01 1.323e+02, threshold=1.763e+02, percent-clipped=0.0 2023-11-19 18:20:38,656 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=755466.6666666666, ans=0.0 2023-11-19 18:21:00,004 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=755600.0, ans=0.0 2023-11-19 18:21:02,841 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.37 vs. 
limit=15.0 2023-11-19 18:21:04,709 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 113350 2023-11-19 18:21:07,230 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=755666.6666666666, ans=0.035 2023-11-19 18:21:19,257 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 5150, loss[loss=0.07963, simple_loss=0.1009, pruned_loss=0.02064, audio_tagging_loss=0.008544, over 15025.00 frames. ], tot_loss[loss=0.08531, simple_loss=0.1048, pruned_loss=0.02265, audio_tagging_loss=0.01025, over 3046303.76 frames. ], batch size: 55, lr: 6.96e-03, grad_scale: 32.0 2023-11-19 18:21:29,303 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=755733.3333333334, ans=0.0 2023-11-19 18:21:45,279 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=755866.6666666666, ans=0.1 2023-11-19 18:21:46,488 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=755866.6666666666, ans=0.125 2023-11-19 18:21:46,552 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=755866.6666666666, ans=0.125 2023-11-19 18:22:07,689 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 113400 2023-11-19 18:22:12,341 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.92 vs. limit=12.0 2023-11-19 18:22:13,161 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=756000.0, ans=0.0 2023-11-19 18:22:22,829 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 5200, loss[loss=0.07547, simple_loss=0.09868, pruned_loss=0.01754, audio_tagging_loss=0.008582, over 15701.00 frames. ], tot_loss[loss=0.08574, simple_loss=0.105, pruned_loss=0.02295, audio_tagging_loss=0.01027, over 3050140.34 frames. 
], batch size: 60, lr: 6.96e-03, grad_scale: 32.0 2023-11-19 18:22:23,154 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=756066.6666666666, ans=0.125 2023-11-19 18:22:25,494 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=756066.6666666666, ans=0.025 2023-11-19 18:22:31,600 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=756066.6666666666, ans=0.1 2023-11-19 18:22:38,391 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=756133.3333333334, ans=0.1 2023-11-19 18:22:40,885 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=756133.3333333334, ans=0.125 2023-11-19 18:22:41,033 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=756133.3333333334, ans=0.1 2023-11-19 18:22:46,625 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.196e+01 8.495e+01 9.298e+01 1.017e+02 1.203e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-19 18:22:50,602 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=756200.0, ans=0.2 2023-11-19 18:23:00,303 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=756266.6666666666, ans=0.0 2023-11-19 18:23:06,462 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=756266.6666666666, ans=0.2 2023-11-19 18:23:11,182 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 113450 2023-11-19 18:23:19,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=756333.3333333334, ans=0.125 2023-11-19 18:23:22,618 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.11 vs. limit=12.0 2023-11-19 18:23:27,292 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 5250, loss[loss=0.0592, simple_loss=0.0686, pruned_loss=0.01156, audio_tagging_loss=0.01335, over 14564.00 frames. ], tot_loss[loss=0.08544, simple_loss=0.105, pruned_loss=0.02266, audio_tagging_loss=0.0103, over 3051059.62 frames. ], batch size: 56, lr: 6.96e-03, grad_scale: 32.0 2023-11-19 18:23:45,885 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=756466.6666666666, ans=10.0 2023-11-19 18:23:45,916 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=756466.6666666666, ans=0.2 2023-11-19 18:23:55,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=756533.3333333334, ans=0.2 2023-11-19 18:23:57,857 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=756533.3333333334, ans=0.0 2023-11-19 18:24:14,548 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 113500 2023-11-19 18:24:19,611 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.10 vs. 
2023-11-19 18:24:20,172 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=756666.6666666666, ans=0.0
2023-11-19 18:24:25,298 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=756666.6666666666, ans=0.125
2023-11-19 18:24:27,870 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=756666.6666666666, ans=0.0
2023-11-19 18:24:29,871 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 5300, loss[loss=0.08096, simple_loss=0.09942, pruned_loss=0.01956, audio_tagging_loss=0.01169, over 15499.00 frames. ], tot_loss[loss=0.08534, simple_loss=0.1046, pruned_loss=0.02267, audio_tagging_loss=0.01035, over 3054552.54 frames. ], batch size: 57, lr: 6.96e-03, grad_scale: 32.0
2023-11-19 18:24:50,645 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.92 vs. limit=12.0
2023-11-19 18:24:51,714 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=756800.0, ans=0.2
2023-11-19 18:24:53,117 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.177e+01 8.322e+01 9.046e+01 9.978e+01 1.366e+02, threshold=1.809e+02, percent-clipped=0.0
2023-11-19 18:25:01,208 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-19 18:25:08,805 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.54 vs. limit=22.5
2023-11-19 18:25:19,305 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 113550
2023-11-19 18:25:26,800 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=757000.0, ans=0.1
2023-11-19 18:25:33,898 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 5350, loss[loss=0.1148, simple_loss=0.1385, pruned_loss=0.03682, audio_tagging_loss=0.008722, over 14569.00 frames. ], tot_loss[loss=0.08548, simple_loss=0.1047, pruned_loss=0.02273, audio_tagging_loss=0.0104, over 3049946.72 frames. ], batch size: 55, lr: 6.95e-03, grad_scale: 32.0
2023-11-19 18:25:42,819 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.19 vs. limit=15.0
2023-11-19 18:25:44,353 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.62 vs. limit=15.0
2023-11-19 18:25:50,434 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=757133.3333333334, ans=0.0
2023-11-19 18:25:52,950 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.73 vs. limit=15.0
2023-11-19 18:25:53,927 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=757133.3333333334, ans=0.125
2023-11-19 18:26:22,797 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 113600
2023-11-19 18:26:24,321 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=757333.3333333334, ans=0.0
2023-11-19 18:26:27,146 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=757333.3333333334, ans=0.125
2023-11-19 18:26:31,781 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.93 vs. limit=15.0
2023-11-19 18:26:33,321 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=757333.3333333334, ans=0.0
2023-11-19 18:26:39,008 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 5400, loss[loss=0.07616, simple_loss=0.09677, pruned_loss=0.01836, audio_tagging_loss=0.009411, over 15475.00 frames. ], tot_loss[loss=0.08514, simple_loss=0.1044, pruned_loss=0.02246, audio_tagging_loss=0.01045, over 3051017.17 frames. ], batch size: 56, lr: 6.95e-03, grad_scale: 32.0
2023-11-19 18:26:46,665 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=757400.0, ans=0.0
2023-11-19 18:26:56,720 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.66 vs. limit=15.0
2023-11-19 18:27:02,057 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.928e+01 8.152e+01 8.655e+01 9.837e+01 1.259e+02, threshold=1.731e+02, percent-clipped=0.0
2023-11-19 18:27:10,883 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=757533.3333333334, ans=0.125
2023-11-19 18:27:14,777 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=757533.3333333334, ans=0.1
2023-11-19 18:27:27,986 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 113650
2023-11-19 18:27:35,511 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=757666.6666666666, ans=0.125
2023-11-19 18:27:43,260 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 5450, loss[loss=0.1012, simple_loss=0.1258, pruned_loss=0.02851, audio_tagging_loss=0.009732, over 15007.00 frames. ], tot_loss[loss=0.08638, simple_loss=0.1058, pruned_loss=0.02305, audio_tagging_loss=0.01043, over 3050801.53 frames. ], batch size: 56, lr: 6.95e-03, grad_scale: 16.0
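The tot_loss entries decompose the training objective into simple_loss, pruned_loss and audio_tagging_loss. With the scales from the startup config (simple_loss_scale=0.5, audio_tagging_loss_scale=1.0), every printed loss in this section is consistent with a weighted sum in which the pruned loss enters with weight 1.0, as expected this far past warm_step=2000. A small check against the batch 5450 entry just above, as a sketch of the bookkeeping rather than the training code itself:

    simple_loss_scale = 0.5
    audio_tagging_loss_scale = 1.0

    def combined_loss(simple_loss: float, pruned_loss: float,
                      audio_tagging_loss: float) -> float:
        # Weighted sum implied by the logged components and the config scales.
        return (simple_loss_scale * simple_loss
                + pruned_loss
                + audio_tagging_loss_scale * audio_tagging_loss)

    # tot_loss[loss=0.08638, simple_loss=0.1058, pruned_loss=0.02305,
    #          audio_tagging_loss=0.01043] from the entry above:
    print(combined_loss(0.1058, 0.02305, 0.01043))  # ~0.08638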
2023-11-19 18:27:48,468 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=757733.3333333334, ans=0.0
2023-11-19 18:27:55,834 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=757800.0, ans=0.125
2023-11-19 18:28:07,404 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=757866.6666666666, ans=0.125
2023-11-19 18:28:10,663 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=757866.6666666666, ans=0.0
2023-11-19 18:28:13,556 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=757866.6666666666, ans=0.1
2023-11-19 18:28:15,982 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=757866.6666666666, ans=0.0
2023-11-19 18:28:31,998 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 113700
2023-11-19 18:28:41,903 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=758000.0, ans=0.125
2023-11-19 18:28:46,397 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 5500, loss[loss=0.06544, simple_loss=0.08002, pruned_loss=0.0141, audio_tagging_loss=0.01133, over 15125.00 frames. ], tot_loss[loss=0.08615, simple_loss=0.1056, pruned_loss=0.02289, audio_tagging_loss=0.01046, over 3052560.07 frames. ], batch size: 57, lr: 6.95e-03, grad_scale: 16.0
2023-11-19 18:28:47,099 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.45 vs. limit=15.0
2023-11-19 18:29:01,717 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=758133.3333333334, ans=0.125
2023-11-19 18:29:10,093 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=758133.3333333334, ans=0.0
2023-11-19 18:29:10,971 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.752e+01 8.267e+01 8.902e+01 9.734e+01 1.914e+02, threshold=1.780e+02, percent-clipped=1.0
2023-11-19 18:29:11,262 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=758200.0, ans=0.125
2023-11-19 18:29:33,807 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=758266.6666666666, ans=0.0
2023-11-19 18:29:34,947 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 113750
2023-11-19 18:29:51,177 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 5550, loss[loss=0.08891, simple_loss=0.108, pruned_loss=0.02279, audio_tagging_loss=0.0121, over 16953.00 frames. ], tot_loss[loss=0.08602, simple_loss=0.1052, pruned_loss=0.02293, audio_tagging_loss=0.01051, over 3054927.99 frames.
], batch size: 62, lr: 6.95e-03, grad_scale: 16.0 2023-11-19 18:30:07,781 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=5.025e-01 2023-11-19 18:30:09,023 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=758466.6666666666, ans=0.0 2023-11-19 18:30:16,151 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=758533.3333333334, ans=0.0 2023-11-19 18:30:16,205 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=758533.3333333334, ans=0.125 2023-11-19 18:30:18,880 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.65 vs. limit=15.0 2023-11-19 18:30:20,866 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 18:30:23,318 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=758533.3333333334, ans=0.125 2023-11-19 18:30:39,382 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 113800 2023-11-19 18:30:42,413 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=758666.6666666666, ans=0.2 2023-11-19 18:30:54,404 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 5600, loss[loss=0.09929, simple_loss=0.1237, pruned_loss=0.02881, audio_tagging_loss=0.008653, over 15099.00 frames. ], tot_loss[loss=0.08622, simple_loss=0.1057, pruned_loss=0.02282, audio_tagging_loss=0.01053, over 3063395.93 frames. ], batch size: 54, lr: 6.95e-03, grad_scale: 32.0 2023-11-19 18:30:59,054 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=758733.3333333334, ans=0.0 2023-11-19 18:31:10,797 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.82 vs. limit=6.0 2023-11-19 18:31:12,831 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=758800.0, ans=0.025 2023-11-19 18:31:18,675 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.413e+01 8.485e+01 9.378e+01 1.023e+02 2.129e+02, threshold=1.876e+02, percent-clipped=1.0 2023-11-19 18:31:25,212 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=758866.6666666666, ans=0.0 2023-11-19 18:31:31,919 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 18:31:34,808 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.88 vs. limit=6.0 2023-11-19 18:31:35,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=758933.3333333334, ans=0.0 2023-11-19 18:31:39,025 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 18:31:43,978 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 113850 2023-11-19 18:31:48,692 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=759000.0, ans=0.1 2023-11-19 18:31:57,392 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.23 vs. limit=10.0 2023-11-19 18:31:59,326 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 5650, loss[loss=0.109, simple_loss=0.1418, pruned_loss=0.02931, audio_tagging_loss=0.008796, over 15096.00 frames. ], tot_loss[loss=0.08674, simple_loss=0.1065, pruned_loss=0.02301, audio_tagging_loss=0.01049, over 3068335.60 frames. ], batch size: 53, lr: 6.95e-03, grad_scale: 32.0 2023-11-19 18:32:48,341 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 113900 2023-11-19 18:32:57,024 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=759333.3333333334, ans=0.125 2023-11-19 18:33:04,057 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 5700, loss[loss=0.0884, simple_loss=0.1126, pruned_loss=0.02432, audio_tagging_loss=0.007768, over 14805.00 frames. ], tot_loss[loss=0.08678, simple_loss=0.1065, pruned_loss=0.02305, audio_tagging_loss=0.0105, over 3062439.23 frames. ], batch size: 55, lr: 6.94e-03, grad_scale: 16.0 2023-11-19 18:33:12,392 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=759400.0, ans=0.2 2023-11-19 18:33:26,503 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.49 vs. limit=15.0 2023-11-19 18:33:29,968 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.506e+01 8.509e+01 9.324e+01 1.031e+02 1.317e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-19 18:33:53,309 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 113950 2023-11-19 18:34:08,413 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 5750, loss[loss=0.06507, simple_loss=0.07082, pruned_loss=0.01516, audio_tagging_loss=0.0145, over 15166.00 frames. ], tot_loss[loss=0.08634, simple_loss=0.1058, pruned_loss=0.02289, audio_tagging_loss=0.01054, over 3060471.88 frames. ], batch size: 56, lr: 6.94e-03, grad_scale: 16.0 2023-11-19 18:34:10,094 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=759733.3333333334, ans=0.0 2023-11-19 18:34:11,697 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.45 vs. 
limit=15.0 2023-11-19 18:34:17,309 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=759733.3333333334, ans=0.2 2023-11-19 18:34:36,858 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=759866.6666666666, ans=0.125 2023-11-19 18:34:46,739 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=759933.3333333334, ans=0.125 2023-11-19 18:34:55,087 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=759933.3333333334, ans=0.0 2023-11-19 18:34:57,384 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 114000 2023-11-19 18:35:01,472 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=760000.0, ans=0.1 2023-11-19 18:35:07,897 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.71 vs. limit=6.0 2023-11-19 18:35:12,658 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 5800, loss[loss=0.1096, simple_loss=0.1316, pruned_loss=0.03518, audio_tagging_loss=0.008577, over 15122.00 frames. ], tot_loss[loss=0.08567, simple_loss=0.1051, pruned_loss=0.02272, audio_tagging_loss=0.01042, over 3062669.49 frames. ], batch size: 56, lr: 6.94e-03, grad_scale: 16.0 2023-11-19 18:35:28,233 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.72 vs. limit=15.0 2023-11-19 18:35:31,553 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.79 vs. limit=15.0 2023-11-19 18:35:35,934 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.98 vs. limit=22.5 2023-11-19 18:35:38,833 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.740e+01 8.284e+01 9.012e+01 9.674e+01 1.297e+02, threshold=1.802e+02, percent-clipped=0.0 2023-11-19 18:35:44,200 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=760200.0, ans=0.04949747468305833 2023-11-19 18:35:59,841 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.01 vs. limit=15.0 2023-11-19 18:36:02,347 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 114050 2023-11-19 18:36:06,252 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=760333.3333333334, ans=0.0 2023-11-19 18:36:06,287 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=760333.3333333334, ans=0.125 2023-11-19 18:36:07,593 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=760333.3333333334, ans=0.125 2023-11-19 18:36:12,642 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=760333.3333333334, ans=0.2 2023-11-19 18:36:17,875 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 5850, loss[loss=0.1049, simple_loss=0.1191, pruned_loss=0.03717, audio_tagging_loss=0.008192, over 14778.00 frames. 
], tot_loss[loss=0.08493, simple_loss=0.104, pruned_loss=0.02258, audio_tagging_loss=0.01037, over 3044225.45 frames. ], batch size: 55, lr: 6.94e-03, grad_scale: 16.0 2023-11-19 18:36:44,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=760533.3333333334, ans=0.1 2023-11-19 18:37:07,042 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 114100 2023-11-19 18:37:18,076 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.92 vs. limit=22.5 2023-11-19 18:37:22,314 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 5900, loss[loss=0.0754, simple_loss=0.08588, pruned_loss=0.02037, audio_tagging_loss=0.0121, over 16618.00 frames. ], tot_loss[loss=0.08603, simple_loss=0.1056, pruned_loss=0.02293, audio_tagging_loss=0.01029, over 3052725.97 frames. ], batch size: 65, lr: 6.94e-03, grad_scale: 16.0 2023-11-19 18:37:47,444 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.904e+01 8.348e+01 9.268e+01 1.091e+02 1.395e+02, threshold=1.854e+02, percent-clipped=0.0 2023-11-19 18:38:11,878 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 114150 2023-11-19 18:38:15,710 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 18:38:23,232 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=761000.0, ans=0.125 2023-11-19 18:38:26,627 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 5950, loss[loss=0.1053, simple_loss=0.1431, pruned_loss=0.02731, audio_tagging_loss=0.006418, over 15300.00 frames. ], tot_loss[loss=0.08551, simple_loss=0.1052, pruned_loss=0.02264, audio_tagging_loss=0.01025, over 3046757.20 frames. ], batch size: 52, lr: 6.94e-03, grad_scale: 16.0 2023-11-19 18:38:32,039 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.01 vs. 
limit=6.0
2023-11-19 18:38:40,002 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=761133.3333333334, ans=0.0
2023-11-19 18:38:42,040 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=761133.3333333334, ans=0.125
2023-11-19 18:38:43,195 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=761133.3333333334, ans=0.125
2023-11-19 18:38:44,813 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=761133.3333333334, ans=0.2
2023-11-19 18:38:48,444 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=761133.3333333334, ans=0.125
2023-11-19 18:38:53,858 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=761200.0, ans=0.125
2023-11-19 18:39:03,674 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=761200.0, ans=0.125
2023-11-19 18:39:15,896 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 114200
2023-11-19 18:39:21,373 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=761333.3333333334, ans=0.1
2023-11-19 18:39:31,896 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 6000, loss[loss=0.06886, simple_loss=0.08019, pruned_loss=0.01787, audio_tagging_loss=0.0109, over 15478.00 frames. ], tot_loss[loss=0.08511, simple_loss=0.1047, pruned_loss=0.02244, audio_tagging_loss=0.0103, over 3046866.72 frames. ], batch size: 60, lr: 6.93e-03, grad_scale: 32.0
2023-11-19 18:39:31,897 INFO [train_asr.py:1285] (3/4) Computing validation loss
2023-11-19 18:40:12,632 INFO [train_asr.py:1294] (3/4) Epoch 10, validation: loss=0.06357, simple_loss=0.05534, pruned_loss=0.006382, audio_tagging_loss=0.02952, over 4681554.00 frames.
2023-11-19 18:40:12,633 INFO [train_asr.py:1295] (3/4) Maximum memory allocated so far is 25886MB
2023-11-19 18:40:15,411 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=761400.0, ans=0.2
2023-11-19 18:40:26,946 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.10 vs. limit=15.0
2023-11-19 18:40:30,117 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=761466.6666666666, ans=0.0
2023-11-19 18:40:38,556 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.626e+01 8.242e+01 9.055e+01 9.883e+01 1.211e+02, threshold=1.811e+02, percent-clipped=0.0
2023-11-19 18:40:52,413 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=761600.0, ans=0.125
2023-11-19 18:40:58,303 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
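The WARNING just above shows why some AudioSet cuts are dropped: the clip carries a 24-token dummy transcript but only 23 feature frames survive subsampling, so the loss has no valid alignment for it. A sketch of the implied filter is below, assuming the logged 100 -> 23 frame count follows the usual (T - 7) // 4 arithmetic for the subsampling_factor=4 frontend; the exact expression in train_asr.py may differ.

    def frames_after_subsampling(num_frames: int) -> int:
        # Consistent with the log: (100 - 7) // 4 == 23.
        return (num_frames - 7) // 4

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # Exclude cuts whose token count exceeds the subsampled frame count.
        return frames_after_subsampling(num_frames) >= num_tokens

    print(keep_cut(100, 24))  # False -> excluded, as in the WARNING above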
2023-11-19 18:41:02,046 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 114250
2023-11-19 18:41:14,717 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=761666.6666666666, ans=0.0
2023-11-19 18:41:17,092 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 6050, loss[loss=0.05948, simple_loss=0.0673, pruned_loss=0.01441, audio_tagging_loss=0.01143, over 15686.00 frames. ], tot_loss[loss=0.08539, simple_loss=0.1052, pruned_loss=0.02257, audio_tagging_loss=0.01024, over 3046152.71 frames. ], batch size: 61, lr: 6.93e-03, grad_scale: 32.0
2023-11-19 18:41:17,336 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-19 18:41:22,512 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=761733.3333333334, ans=0.2
2023-11-19 18:41:27,296 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=761733.3333333334, ans=0.1
2023-11-19 18:41:27,391 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=761733.3333333334, ans=0.1
2023-11-19 18:41:40,256 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.05 vs. limit=8.0
2023-11-19 18:42:06,887 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 114300
2023-11-19 18:42:23,025 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 6100, loss[loss=0.0773, simple_loss=0.09267, pruned_loss=0.01913, audio_tagging_loss=0.01184, over 14314.00 frames. ], tot_loss[loss=0.08595, simple_loss=0.1058, pruned_loss=0.02288, audio_tagging_loss=0.01017, over 3053639.84 frames. ], batch size: 54, lr: 6.93e-03, grad_scale: 32.0
2023-11-19 18:42:27,550 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=762066.6666666666, ans=0.125
2023-11-19 18:42:30,850 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.88 vs. limit=15.0
2023-11-19 18:42:46,463 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=762133.3333333334, ans=0.2
2023-11-19 18:42:48,699 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.974e+01 8.508e+01 9.449e+01 1.032e+02 1.447e+02, threshold=1.890e+02, percent-clipped=0.0
2023-11-19 18:42:55,435 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=762200.0, ans=0.125
2023-11-19 18:43:07,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=762266.6666666666, ans=0.125
2023-11-19 18:43:13,164 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 114350
2023-11-19 18:43:28,441 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 6150, loss[loss=0.09083, simple_loss=0.1245, pruned_loss=0.02207, audio_tagging_loss=0.006532, over 15636.00 frames. ], tot_loss[loss=0.08466, simple_loss=0.1037, pruned_loss=0.02246, audio_tagging_loss=0.01032, over 3048603.26 frames.
], batch size: 57, lr: 6.93e-03, grad_scale: 32.0 2023-11-19 18:43:34,949 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=762400.0, ans=0.0 2023-11-19 18:43:55,849 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=762533.3333333334, ans=0.0 2023-11-19 18:43:55,899 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=762533.3333333334, ans=0.125 2023-11-19 18:43:56,946 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=762533.3333333334, ans=0.0 2023-11-19 18:44:18,108 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 114400 2023-11-19 18:44:33,349 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 6200, loss[loss=0.08249, simple_loss=0.1208, pruned_loss=0.01444, audio_tagging_loss=0.007658, over 15134.00 frames. ], tot_loss[loss=0.08429, simple_loss=0.103, pruned_loss=0.02237, audio_tagging_loss=0.01041, over 3037352.58 frames. ], batch size: 53, lr: 6.93e-03, grad_scale: 32.0 2023-11-19 18:44:36,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=762733.3333333334, ans=0.2 2023-11-19 18:44:45,538 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=762800.0, ans=0.125 2023-11-19 18:44:45,959 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.48 vs. limit=15.0 2023-11-19 18:45:00,750 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.911e+01 8.346e+01 9.010e+01 9.734e+01 1.303e+02, threshold=1.802e+02, percent-clipped=0.0 2023-11-19 18:45:23,153 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 114450 2023-11-19 18:45:39,192 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 6250, loss[loss=0.1071, simple_loss=0.1287, pruned_loss=0.03413, audio_tagging_loss=0.008627, over 14379.00 frames. ], tot_loss[loss=0.0854, simple_loss=0.1041, pruned_loss=0.02283, audio_tagging_loss=0.01051, over 3044197.40 frames. ], batch size: 53, lr: 6.93e-03, grad_scale: 32.0 2023-11-19 18:45:59,478 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=763133.3333333334, ans=0.2 2023-11-19 18:46:06,931 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff3.min_abs, batch_count=763200.0, ans=0.2 2023-11-19 18:46:14,381 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=763200.0, ans=10.0 2023-11-19 18:46:16,772 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=763266.6666666666, ans=0.0 2023-11-19 18:46:28,991 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 114500 2023-11-19 18:46:34,762 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=763333.3333333334, ans=0.0 2023-11-19 18:46:45,260 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 6300, loss[loss=0.07043, simple_loss=0.07347, pruned_loss=0.01773, audio_tagging_loss=0.01596, over 14806.00 frames. ], tot_loss[loss=0.08503, simple_loss=0.1039, pruned_loss=0.02257, audio_tagging_loss=0.01052, over 3039634.68 frames. 
], batch size: 58, lr: 6.93e-03, grad_scale: 32.0 2023-11-19 18:46:51,850 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=763400.0, ans=0.0 2023-11-19 18:47:01,637 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=763466.6666666666, ans=0.125 2023-11-19 18:47:09,917 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.137e+01 8.385e+01 9.179e+01 1.044e+02 1.360e+02, threshold=1.836e+02, percent-clipped=0.0 2023-11-19 18:47:20,330 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=763533.3333333334, ans=0.1 2023-11-19 18:47:34,913 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 114550 2023-11-19 18:47:49,874 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 6350, loss[loss=0.05513, simple_loss=0.05177, pruned_loss=0.01553, audio_tagging_loss=0.01372, over 17793.00 frames. ], tot_loss[loss=0.08435, simple_loss=0.1029, pruned_loss=0.02232, audio_tagging_loss=0.01056, over 3042089.28 frames. ], batch size: 70, lr: 6.92e-03, grad_scale: 32.0 2023-11-19 18:48:03,708 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=763800.0, ans=0.1 2023-11-19 18:48:38,865 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 114600 2023-11-19 18:48:54,857 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 6400, loss[loss=0.09195, simple_loss=0.1067, pruned_loss=0.02644, audio_tagging_loss=0.01214, over 14537.00 frames. ], tot_loss[loss=0.08428, simple_loss=0.1027, pruned_loss=0.02225, audio_tagging_loss=0.01069, over 3037764.16 frames. ], batch size: 57, lr: 6.92e-03, grad_scale: 32.0 2023-11-19 18:48:55,166 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=764066.6666666666, ans=0.0 2023-11-19 18:49:21,755 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.802e+01 8.099e+01 8.680e+01 9.158e+01 1.578e+02, threshold=1.736e+02, percent-clipped=0.0 2023-11-19 18:49:44,865 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 114650 2023-11-19 18:49:49,784 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.93 vs. limit=6.0 2023-11-19 18:50:01,277 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 6450, loss[loss=0.08647, simple_loss=0.1119, pruned_loss=0.02139, audio_tagging_loss=0.009149, over 15581.00 frames. ], tot_loss[loss=0.08445, simple_loss=0.1031, pruned_loss=0.02222, audio_tagging_loss=0.01069, over 3040531.41 frames. 
], batch size: 59, lr: 6.92e-03, grad_scale: 32.0 2023-11-19 18:50:12,610 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=764466.6666666666, ans=0.0 2023-11-19 18:50:17,997 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 18:50:19,000 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=764466.6666666666, ans=0.125 2023-11-19 18:50:21,555 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=764466.6666666666, ans=0.1 2023-11-19 18:50:22,870 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=764466.6666666666, ans=0.125 2023-11-19 18:50:50,145 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 114700 2023-11-19 18:51:04,052 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.54 vs. limit=15.0 2023-11-19 18:51:05,825 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 6500, loss[loss=0.07132, simple_loss=0.08805, pruned_loss=0.01806, audio_tagging_loss=0.009239, over 15296.00 frames. ], tot_loss[loss=0.08415, simple_loss=0.1028, pruned_loss=0.02211, audio_tagging_loss=0.01062, over 3042870.84 frames. ], batch size: 60, lr: 6.92e-03, grad_scale: 32.0 2023-11-19 18:51:32,090 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.963e+01 8.435e+01 9.152e+01 1.009e+02 1.379e+02, threshold=1.830e+02, percent-clipped=0.0 2023-11-19 18:51:50,013 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=764933.3333333334, ans=0.2 2023-11-19 18:51:56,216 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 114750 2023-11-19 18:52:05,196 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=765000.0, ans=0.09899494936611666 2023-11-19 18:52:06,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=765000.0, ans=0.125 2023-11-19 18:52:09,201 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.94 vs. limit=12.0 2023-11-19 18:52:11,131 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 6550, loss[loss=0.1009, simple_loss=0.126, pruned_loss=0.02806, audio_tagging_loss=0.009859, over 14459.00 frames. ], tot_loss[loss=0.08533, simple_loss=0.1047, pruned_loss=0.02257, audio_tagging_loss=0.01039, over 3048087.39 frames. ], batch size: 54, lr: 6.92e-03, grad_scale: 32.0 2023-11-19 18:52:21,462 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=765066.6666666666, ans=0.5 2023-11-19 18:52:26,166 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.26 vs. limit=15.0 2023-11-19 18:52:33,717 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.16 vs. 
limit=22.5 2023-11-19 18:52:56,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=765266.6666666666, ans=0.0 2023-11-19 18:53:00,872 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 114800 2023-11-19 18:53:17,407 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 6600, loss[loss=0.09621, simple_loss=0.1209, pruned_loss=0.02813, audio_tagging_loss=0.007625, over 14204.00 frames. ], tot_loss[loss=0.08556, simple_loss=0.1051, pruned_loss=0.02274, audio_tagging_loss=0.01027, over 3045583.42 frames. ], batch size: 54, lr: 6.92e-03, grad_scale: 32.0 2023-11-19 18:53:42,953 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.143e+01 8.495e+01 9.015e+01 9.763e+01 1.318e+02, threshold=1.803e+02, percent-clipped=0.0 2023-11-19 18:53:47,105 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=765533.3333333334, ans=0.0 2023-11-19 18:53:48,532 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=765533.3333333334, ans=0.1 2023-11-19 18:53:58,363 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=765600.0, ans=0.125 2023-11-19 18:54:07,120 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 114850 2023-11-19 18:54:07,652 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.90 vs. limit=22.5 2023-11-19 18:54:22,826 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 6650, loss[loss=0.08898, simple_loss=0.1132, pruned_loss=0.02439, audio_tagging_loss=0.008009, over 16054.00 frames. ], tot_loss[loss=0.08584, simple_loss=0.1052, pruned_loss=0.02295, audio_tagging_loss=0.0103, over 3039643.80 frames. ], batch size: 60, lr: 6.92e-03, grad_scale: 16.0 2023-11-19 18:54:51,815 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=765866.6666666666, ans=0.125 2023-11-19 18:54:59,962 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=765866.6666666666, ans=0.0 2023-11-19 18:55:02,492 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=765933.3333333334, ans=0.2 2023-11-19 18:55:12,758 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 114900 2023-11-19 18:55:13,093 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=765933.3333333334, ans=0.125 2023-11-19 18:55:27,689 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 6700, loss[loss=0.08895, simple_loss=0.1062, pruned_loss=0.02454, audio_tagging_loss=0.01132, over 15443.00 frames. ], tot_loss[loss=0.08508, simple_loss=0.1041, pruned_loss=0.02259, audio_tagging_loss=0.01043, over 3041629.77 frames. 
], batch size: 58, lr: 6.91e-03, grad_scale: 16.0 2023-11-19 18:55:31,750 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=766066.6666666666, ans=0.125 2023-11-19 18:55:33,547 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=766066.6666666666, ans=0.125 2023-11-19 18:55:39,267 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=766066.6666666666, ans=0.0 2023-11-19 18:55:55,862 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.722e+01 8.369e+01 9.025e+01 9.789e+01 1.375e+02, threshold=1.805e+02, percent-clipped=0.0 2023-11-19 18:56:17,645 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 114950 2023-11-19 18:56:34,539 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 6750, loss[loss=0.0602, simple_loss=0.07789, pruned_loss=0.01256, audio_tagging_loss=0.008695, over 14824.00 frames. ], tot_loss[loss=0.08525, simple_loss=0.1045, pruned_loss=0.02262, audio_tagging_loss=0.01039, over 3037310.19 frames. ], batch size: 54, lr: 6.91e-03, grad_scale: 16.0 2023-11-19 18:57:24,329 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 115000 2023-11-19 18:57:39,612 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 6800, loss[loss=0.07968, simple_loss=0.09318, pruned_loss=0.02179, audio_tagging_loss=0.0113, over 15051.00 frames. ], tot_loss[loss=0.08511, simple_loss=0.1045, pruned_loss=0.02259, audio_tagging_loss=0.01027, over 3042923.41 frames. ], batch size: 56, lr: 6.91e-03, grad_scale: 32.0 2023-11-19 18:57:46,687 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=766733.3333333334, ans=0.125 2023-11-19 18:57:50,400 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=766733.3333333334, ans=0.07 2023-11-19 18:58:07,416 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.734e+01 8.232e+01 9.131e+01 9.667e+01 1.376e+02, threshold=1.826e+02, percent-clipped=0.0 2023-11-19 18:58:11,638 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=766866.6666666666, ans=0.2 2023-11-19 18:58:19,538 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=766933.3333333334, ans=0.1 2023-11-19 18:58:29,255 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 115050 2023-11-19 18:58:29,353 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=766933.3333333334, ans=0.125 2023-11-19 18:58:34,543 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=767000.0, ans=0.1 2023-11-19 18:58:44,805 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 6850, loss[loss=0.09199, simple_loss=0.1108, pruned_loss=0.02636, audio_tagging_loss=0.01021, over 15562.00 frames. ], tot_loss[loss=0.08566, simple_loss=0.1054, pruned_loss=0.02274, audio_tagging_loss=0.01023, over 3034538.29 frames. 
], batch size: 59, lr: 6.91e-03, grad_scale: 32.0 2023-11-19 18:59:34,648 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 115100 2023-11-19 18:59:47,403 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.96 vs. limit=15.0 2023-11-19 18:59:50,474 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 6900, loss[loss=0.07716, simple_loss=0.09238, pruned_loss=0.02147, audio_tagging_loss=0.009501, over 15071.00 frames. ], tot_loss[loss=0.08571, simple_loss=0.1055, pruned_loss=0.02278, audio_tagging_loss=0.01017, over 3034858.91 frames. ], batch size: 56, lr: 6.91e-03, grad_scale: 32.0 2023-11-19 18:59:56,270 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 19:00:00,357 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.54 vs. limit=10.0 2023-11-19 19:00:01,222 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=767400.0, ans=0.2 2023-11-19 19:00:17,008 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.004e+01 8.111e+01 8.753e+01 9.460e+01 1.253e+02, threshold=1.751e+02, percent-clipped=0.0 2023-11-19 19:00:22,324 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.00 vs. limit=15.0 2023-11-19 19:00:25,549 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=767533.3333333334, ans=0.1 2023-11-19 19:00:37,790 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=767600.0, ans=0.125 2023-11-19 19:00:40,079 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 19:00:40,169 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 115150 2023-11-19 19:00:52,098 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=767666.6666666666, ans=0.0 2023-11-19 19:00:55,487 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 6950, loss[loss=0.07362, simple_loss=0.08913, pruned_loss=0.01601, audio_tagging_loss=0.01305, over 14708.00 frames. ], tot_loss[loss=0.08578, simple_loss=0.1055, pruned_loss=0.02276, audio_tagging_loss=0.01026, over 3037780.65 frames. ], batch size: 55, lr: 6.91e-03, grad_scale: 32.0 2023-11-19 19:01:03,417 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=767733.3333333334, ans=0.125 2023-11-19 19:01:04,915 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.32 vs. 
limit=15.0 2023-11-19 19:01:10,270 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=767800.0, ans=0.125 2023-11-19 19:01:15,165 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=767800.0, ans=0.2 2023-11-19 19:01:30,376 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=767866.6666666666, ans=0.0 2023-11-19 19:01:45,751 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 115200 2023-11-19 19:01:46,063 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=767933.3333333334, ans=0.125 2023-11-19 19:01:51,252 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=768000.0, ans=0.0 2023-11-19 19:01:52,551 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=768000.0, ans=0.1 2023-11-19 19:02:00,899 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 7000, loss[loss=0.06726, simple_loss=0.07544, pruned_loss=0.01532, audio_tagging_loss=0.01422, over 15665.00 frames. ], tot_loss[loss=0.08594, simple_loss=0.1055, pruned_loss=0.02285, audio_tagging_loss=0.01034, over 3040498.20 frames. ], batch size: 62, lr: 6.90e-03, grad_scale: 32.0 2023-11-19 19:02:04,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=768066.6666666666, ans=0.04949747468305833 2023-11-19 19:02:12,644 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=768066.6666666666, ans=15.0 2023-11-19 19:02:13,509 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=768133.3333333334, ans=0.125 2023-11-19 19:02:26,306 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.25 vs. limit=15.0 2023-11-19 19:02:29,336 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.682e+01 8.339e+01 9.135e+01 1.019e+02 1.398e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-19 19:02:44,562 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=768266.6666666666, ans=0.125 2023-11-19 19:02:50,601 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 115250 2023-11-19 19:02:54,091 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=768333.3333333334, ans=0.125 2023-11-19 19:03:04,451 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=768333.3333333334, ans=0.0 2023-11-19 19:03:07,194 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 7050, loss[loss=0.08909, simple_loss=0.1061, pruned_loss=0.02442, audio_tagging_loss=0.01164, over 15433.00 frames. ], tot_loss[loss=0.08576, simple_loss=0.1051, pruned_loss=0.02278, audio_tagging_loss=0.01044, over 3039826.87 frames. 
], batch size: 58, lr: 6.90e-03, grad_scale: 32.0 2023-11-19 19:03:09,955 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=768400.0, ans=0.125 2023-11-19 19:03:17,857 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=768400.0, ans=0.125 2023-11-19 19:03:20,400 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=768466.6666666666, ans=0.0 2023-11-19 19:03:25,397 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=768466.6666666666, ans=0.1 2023-11-19 19:03:40,206 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=768533.3333333334, ans=0.5 2023-11-19 19:03:56,501 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 115300 2023-11-19 19:04:11,850 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 7100, loss[loss=0.07261, simple_loss=0.08319, pruned_loss=0.01953, audio_tagging_loss=0.01149, over 14769.00 frames. ], tot_loss[loss=0.08571, simple_loss=0.1047, pruned_loss=0.02277, audio_tagging_loss=0.01056, over 3042066.25 frames. ], batch size: 57, lr: 6.90e-03, grad_scale: 32.0 2023-11-19 19:04:15,745 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=768733.3333333334, ans=0.125 2023-11-19 19:04:32,255 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.72 vs. limit=12.0 2023-11-19 19:04:38,360 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.824e+01 8.390e+01 9.120e+01 9.831e+01 1.700e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-19 19:04:41,924 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=768866.6666666666, ans=0.07 2023-11-19 19:04:44,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=768866.6666666666, ans=0.125 2023-11-19 19:05:01,377 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 115350 2023-11-19 19:05:10,517 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=769000.0, ans=0.2 2023-11-19 19:05:16,473 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 7150, loss[loss=0.09195, simple_loss=0.1145, pruned_loss=0.02285, audio_tagging_loss=0.01185, over 15663.00 frames. ], tot_loss[loss=0.08599, simple_loss=0.1049, pruned_loss=0.02285, audio_tagging_loss=0.01071, over 3042043.11 frames. 
], batch size: 58, lr: 6.90e-03, grad_scale: 32.0 2023-11-19 19:05:51,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=769200.0, ans=0.0 2023-11-19 19:05:53,960 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=769200.0, ans=0.125 2023-11-19 19:06:06,698 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 115400 2023-11-19 19:06:10,089 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=769333.3333333334, ans=15.0 2023-11-19 19:06:15,913 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=769333.3333333334, ans=0.05 2023-11-19 19:06:23,074 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 7200, loss[loss=0.09362, simple_loss=0.1149, pruned_loss=0.02777, audio_tagging_loss=0.008399, over 15737.00 frames. ], tot_loss[loss=0.08531, simple_loss=0.1039, pruned_loss=0.02259, audio_tagging_loss=0.01078, over 3043823.14 frames. ], batch size: 58, lr: 6.90e-03, grad_scale: 32.0 2023-11-19 19:06:26,541 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=769400.0, ans=0.125 2023-11-19 19:06:50,188 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.178e+01 8.420e+01 9.034e+01 9.720e+01 1.175e+02, threshold=1.807e+02, percent-clipped=0.0 2023-11-19 19:07:13,047 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 115450 2023-11-19 19:07:29,103 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 7250, loss[loss=0.08419, simple_loss=0.1001, pruned_loss=0.02199, audio_tagging_loss=0.01213, over 14865.00 frames. ], tot_loss[loss=0.08546, simple_loss=0.1041, pruned_loss=0.02262, audio_tagging_loss=0.0108, over 3045905.45 frames. ], batch size: 56, lr: 6.90e-03, grad_scale: 32.0 2023-11-19 19:07:51,070 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.27 vs. limit=22.5 2023-11-19 19:08:06,027 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.37 vs. limit=15.0 2023-11-19 19:08:11,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=769933.3333333334, ans=0.2 2023-11-19 19:08:18,644 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 115500 2023-11-19 19:08:33,600 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 7300, loss[loss=0.0627, simple_loss=0.07758, pruned_loss=0.01525, audio_tagging_loss=0.008658, over 14692.00 frames. ], tot_loss[loss=0.08524, simple_loss=0.104, pruned_loss=0.0225, audio_tagging_loss=0.01074, over 3044348.48 frames. ], batch size: 57, lr: 6.90e-03, grad_scale: 32.0 2023-11-19 19:08:36,501 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=770066.6666666666, ans=0.0 2023-11-19 19:08:36,994 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.47 vs. 
limit=15.0 2023-11-19 19:08:39,932 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=770066.6666666666, ans=0.125 2023-11-19 19:09:02,028 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.990e+01 8.445e+01 8.971e+01 9.866e+01 1.829e+02, threshold=1.794e+02, percent-clipped=1.0 2023-11-19 19:09:23,276 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 115550 2023-11-19 19:09:34,588 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=770333.3333333334, ans=0.1 2023-11-19 19:09:36,274 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.56 vs. limit=12.0 2023-11-19 19:09:37,901 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 7350, loss[loss=0.1042, simple_loss=0.1325, pruned_loss=0.02968, audio_tagging_loss=0.00831, over 15689.00 frames. ], tot_loss[loss=0.08476, simple_loss=0.1037, pruned_loss=0.02234, audio_tagging_loss=0.01056, over 3043693.87 frames. ], batch size: 56, lr: 6.89e-03, grad_scale: 32.0 2023-11-19 19:09:40,040 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=770400.0, ans=0.1 2023-11-19 19:10:08,439 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=770533.3333333334, ans=0.125 2023-11-19 19:10:27,468 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 115600 2023-11-19 19:10:44,524 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 7400, loss[loss=0.1106, simple_loss=0.1322, pruned_loss=0.03731, audio_tagging_loss=0.007243, over 14401.00 frames. ], tot_loss[loss=0.08425, simple_loss=0.1032, pruned_loss=0.02223, audio_tagging_loss=0.01042, over 3042916.91 frames. ], batch size: 53, lr: 6.89e-03, grad_scale: 16.0 2023-11-19 19:10:44,798 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=770733.3333333334, ans=0.125 2023-11-19 19:10:49,934 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=770733.3333333334, ans=0.2 2023-11-19 19:10:50,327 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.65 vs. limit=15.0 2023-11-19 19:10:52,393 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=770733.3333333334, ans=0.0 2023-11-19 19:10:58,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=770800.0, ans=0.2 2023-11-19 19:11:03,378 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=770800.0, ans=0.2 2023-11-19 19:11:11,833 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.244e+01 8.500e+01 9.123e+01 1.022e+02 1.403e+02, threshold=1.825e+02, percent-clipped=0.0 2023-11-19 19:11:16,175 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.07 vs. 
limit=10.0 2023-11-19 19:11:18,884 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=770866.6666666666, ans=0.125 2023-11-19 19:11:19,021 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=770866.6666666666, ans=0.0 2023-11-19 19:11:22,597 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.25 vs. limit=15.0 2023-11-19 19:11:25,194 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=770933.3333333334, ans=0.2 2023-11-19 19:11:29,378 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=770933.3333333334, ans=0.125 2023-11-19 19:11:34,143 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 115650 2023-11-19 19:11:35,603 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=771000.0, ans=0.125 2023-11-19 19:11:41,976 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=771000.0, ans=0.125 2023-11-19 19:11:44,349 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=771000.0, ans=0.2 2023-11-19 19:11:49,067 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 7450, loss[loss=0.09385, simple_loss=0.1163, pruned_loss=0.02679, audio_tagging_loss=0.00893, over 16203.00 frames. ], tot_loss[loss=0.0842, simple_loss=0.1032, pruned_loss=0.02223, audio_tagging_loss=0.01037, over 3035065.16 frames. ], batch size: 58, lr: 6.89e-03, grad_scale: 16.0 2023-11-19 19:12:08,390 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=771133.3333333334, ans=0.125 2023-11-19 19:12:29,473 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=771266.6666666666, ans=0.125 2023-11-19 19:12:37,759 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 115700 2023-11-19 19:12:44,775 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.49 vs. limit=15.0 2023-11-19 19:12:49,253 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=771333.3333333334, ans=0.0 2023-11-19 19:12:52,570 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 7500, loss[loss=0.07416, simple_loss=0.09517, pruned_loss=0.0151, audio_tagging_loss=0.01148, over 16175.00 frames. ], tot_loss[loss=0.08437, simple_loss=0.1036, pruned_loss=0.02231, audio_tagging_loss=0.01026, over 3044582.26 frames. 
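
The loss[...] / tot_loss[...] entries above decompose into the transducer's simple and pruned terms plus the audio-tagging head. The printed totals are consistent with a 0.5 weight on the simple loss and a 1.0 weight on the tagging loss; these weights are inferred from the numbers themselves, so treat them as this sketch's assumption.

    # Worked check against the "batch 7450" tot_loss above:
    simple_loss, pruned_loss, audio_tagging_loss = 0.1032, 0.02223, 0.01037
    loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss
    print(round(loss, 4))  # 0.0842, matching the printed tot_loss
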
], batch size: 60, lr: 6.89e-03, grad_scale: 16.0 2023-11-19 19:12:56,078 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=771400.0, ans=0.125 2023-11-19 19:13:06,554 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=771466.6666666666, ans=0.125 2023-11-19 19:13:22,179 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.878e+01 8.310e+01 8.971e+01 9.844e+01 3.516e+02, threshold=1.794e+02, percent-clipped=1.0 2023-11-19 19:13:39,589 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=771600.0, ans=0.125 2023-11-19 19:13:41,879 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 115750 2023-11-19 19:13:59,053 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 7550, loss[loss=0.07401, simple_loss=0.08856, pruned_loss=0.01802, audio_tagging_loss=0.01171, over 15924.00 frames. ], tot_loss[loss=0.08401, simple_loss=0.1032, pruned_loss=0.02221, audio_tagging_loss=0.01018, over 3045806.75 frames. ], batch size: 60, lr: 6.89e-03, grad_scale: 16.0 2023-11-19 19:14:06,812 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=771733.3333333334, ans=0.125 2023-11-19 19:14:09,204 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=771733.3333333334, ans=10.0 2023-11-19 19:14:26,763 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=771866.6666666666, ans=0.0 2023-11-19 19:14:43,505 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=771933.3333333334, ans=0.1 2023-11-19 19:14:48,806 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 115800 2023-11-19 19:14:50,337 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=772000.0, ans=0.1 2023-11-19 19:14:51,920 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=772000.0, ans=0.1 2023-11-19 19:15:03,221 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=772066.6666666666, ans=0.125 2023-11-19 19:15:04,031 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 7600, loss[loss=0.08789, simple_loss=0.104, pruned_loss=0.02357, audio_tagging_loss=0.0123, over 14374.00 frames. ], tot_loss[loss=0.08521, simple_loss=0.1045, pruned_loss=0.02274, audio_tagging_loss=0.01021, over 3048670.32 frames. ], batch size: 55, lr: 6.89e-03, grad_scale: 32.0 2023-11-19 19:15:32,979 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.786e+01 8.395e+01 8.967e+01 1.032e+02 1.336e+02, threshold=1.793e+02, percent-clipped=0.0 2023-11-19 19:15:35,706 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=772200.0, ans=0.125 2023-11-19 19:15:35,980 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.74 vs. 
limit=15.0 2023-11-19 19:15:52,810 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.97 vs. limit=15.0 2023-11-19 19:15:53,518 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 115850 2023-11-19 19:16:01,350 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=772333.3333333334, ans=0.04949747468305833 2023-11-19 19:16:03,864 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 19:16:08,674 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 7650, loss[loss=0.09598, simple_loss=0.1304, pruned_loss=0.02388, audio_tagging_loss=0.006891, over 16914.00 frames. ], tot_loss[loss=0.08502, simple_loss=0.1045, pruned_loss=0.02256, audio_tagging_loss=0.01019, over 3043572.28 frames. ], batch size: 61, lr: 6.89e-03, grad_scale: 32.0 2023-11-19 19:16:15,982 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=772400.0, ans=0.125 2023-11-19 19:16:27,019 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=5.148e-03 2023-11-19 19:16:58,681 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 115900 2023-11-19 19:17:04,280 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=772666.6666666666, ans=0.125 2023-11-19 19:17:05,742 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=772666.6666666666, ans=0.1 2023-11-19 19:17:15,204 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 7700, loss[loss=0.07896, simple_loss=0.09717, pruned_loss=0.01637, audio_tagging_loss=0.01401, over 15820.00 frames. ], tot_loss[loss=0.08486, simple_loss=0.1044, pruned_loss=0.02247, audio_tagging_loss=0.01017, over 3041793.85 frames. ], batch size: 57, lr: 6.88e-03, grad_scale: 32.0 2023-11-19 19:17:43,108 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.843e+01 8.063e+01 8.419e+01 9.189e+01 1.330e+02, threshold=1.684e+02, percent-clipped=0.0 2023-11-19 19:17:44,880 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=772866.6666666666, ans=0.125 2023-11-19 19:18:04,570 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 115950 2023-11-19 19:18:07,277 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=773000.0, ans=0.125 2023-11-19 19:18:20,131 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 7750, loss[loss=0.06852, simple_loss=0.07823, pruned_loss=0.01808, audio_tagging_loss=0.01132, over 14246.00 frames. ], tot_loss[loss=0.085, simple_loss=0.1045, pruned_loss=0.02252, audio_tagging_loss=0.01022, over 3038533.62 frames. 
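
The scaling.py:213 lines trace ScheduledFloat objects: hyperparameters (dropout_p, skip rates, balancer probabilities, scale_min, ...) interpolated piecewise-linearly against batch_count, with the current value printed as ans. A minimal sketch of that mechanism, using illustrative breakpoints rather than the recipe's actual schedules:

    class ScheduledFloat:
        """Piecewise-linear function of batch_count; flat outside the breakpoints."""
        def __init__(self, *points):
            self.points = sorted(points)   # (batch_count, value) pairs
            self.batch_count = 0.0         # updated by the training loop

        def __float__(self):
            x, p = self.batch_count, self.points
            if x <= p[0][0]:
                return float(p[0][1])
            if x >= p[-1][0]:
                return float(p[-1][1])
            for (x0, y0), (x1, y1) in zip(p, p[1:]):
                if x0 <= x <= x1:
                    return float(y0 + (x - x0) * (y1 - y0) / (x1 - x0))

    # e.g. a dropout annealed 0.3 -> 0.1 over the first 20000 batches:
    drop = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
    drop.batch_count = 772666.0
    print(float(drop))  # 0.1 -- far past the last breakpoint, like the ans=0.1 readings
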
], batch size: 56, lr: 6.88e-03, grad_scale: 32.0 2023-11-19 19:19:09,779 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 116000 2023-11-19 19:19:11,217 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=773333.3333333334, ans=0.125 2023-11-19 19:19:17,198 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=773333.3333333334, ans=0.125 2023-11-19 19:19:23,384 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=773333.3333333334, ans=0.125 2023-11-19 19:19:24,685 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=773333.3333333334, ans=0.125 2023-11-19 19:19:28,161 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 7800, loss[loss=0.07046, simple_loss=0.08932, pruned_loss=0.01736, audio_tagging_loss=0.008436, over 14210.00 frames. ], tot_loss[loss=0.08632, simple_loss=0.106, pruned_loss=0.02305, audio_tagging_loss=0.01028, over 3041897.04 frames. ], batch size: 54, lr: 6.88e-03, grad_scale: 32.0 2023-11-19 19:19:38,974 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=773400.0, ans=0.125 2023-11-19 19:19:58,894 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.943e+01 8.729e+01 9.223e+01 9.822e+01 1.484e+02, threshold=1.845e+02, percent-clipped=0.0 2023-11-19 19:20:15,565 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=773600.0, ans=0.125 2023-11-19 19:20:17,761 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 116050 2023-11-19 19:20:24,806 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=773666.6666666666, ans=0.0 2023-11-19 19:20:28,452 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=773666.6666666666, ans=0.0 2023-11-19 19:20:34,234 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 7850, loss[loss=0.08371, simple_loss=0.1075, pruned_loss=0.02175, audio_tagging_loss=0.008239, over 15241.00 frames. ], tot_loss[loss=0.08629, simple_loss=0.1058, pruned_loss=0.02305, audio_tagging_loss=0.01037, over 3044912.73 frames. ], batch size: 56, lr: 6.88e-03, grad_scale: 16.0 2023-11-19 19:20:43,183 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=773733.3333333334, ans=0.2 2023-11-19 19:21:18,096 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.34 vs. 
limit=15.0 2023-11-19 19:21:19,403 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=773933.3333333334, ans=0.125 2023-11-19 19:21:23,681 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 116100 2023-11-19 19:21:31,149 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=774000.0, ans=0.2 2023-11-19 19:21:33,887 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=774000.0, ans=0.0 2023-11-19 19:21:38,944 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 7900, loss[loss=0.08951, simple_loss=0.109, pruned_loss=0.02212, audio_tagging_loss=0.0129, over 15805.00 frames. ], tot_loss[loss=0.08664, simple_loss=0.1061, pruned_loss=0.0231, audio_tagging_loss=0.0105, over 3055964.67 frames. ], batch size: 59, lr: 6.88e-03, grad_scale: 16.0 2023-11-19 19:21:40,531 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=774066.6666666666, ans=0.125 2023-11-19 19:21:43,321 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.39 vs. limit=15.0 2023-11-19 19:21:44,020 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=774066.6666666666, ans=0.125 2023-11-19 19:21:44,539 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.29 vs. limit=22.5 2023-11-19 19:21:54,001 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=774133.3333333334, ans=0.1 2023-11-19 19:21:58,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=774133.3333333334, ans=0.1 2023-11-19 19:22:06,207 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.44 vs. limit=10.0 2023-11-19 19:22:08,444 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.972e+01 8.408e+01 9.159e+01 1.027e+02 1.292e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-19 19:22:15,993 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.96 vs. limit=15.0 2023-11-19 19:22:18,385 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.96 vs. limit=12.0 2023-11-19 19:22:27,821 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 116150 2023-11-19 19:22:38,553 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=774333.3333333334, ans=0.125 2023-11-19 19:22:43,210 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 7950, loss[loss=0.06498, simple_loss=0.07787, pruned_loss=0.01399, audio_tagging_loss=0.01205, over 14673.00 frames. ], tot_loss[loss=0.08674, simple_loss=0.1063, pruned_loss=0.02301, audio_tagging_loss=0.0106, over 3055870.43 frames. ], batch size: 57, lr: 6.88e-03, grad_scale: 16.0 2023-11-19 19:22:57,983 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. 
Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 19:23:07,551 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=774466.6666666666, ans=0.125 2023-11-19 19:23:27,932 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=774600.0, ans=0.125 2023-11-19 19:23:28,275 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.42 vs. limit=15.0 2023-11-19 19:23:32,855 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 116200 2023-11-19 19:23:32,971 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=774600.0, ans=0.125 2023-11-19 19:23:39,556 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=774666.6666666666, ans=0.0 2023-11-19 19:23:49,431 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 8000, loss[loss=0.07806, simple_loss=0.09171, pruned_loss=0.02075, audio_tagging_loss=0.01146, over 16319.00 frames. ], tot_loss[loss=0.08565, simple_loss=0.1045, pruned_loss=0.02269, audio_tagging_loss=0.01071, over 3052887.70 frames. ], batch size: 63, lr: 6.88e-03, grad_scale: 32.0 2023-11-19 19:23:49,719 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=774733.3333333334, ans=0.125 2023-11-19 19:24:05,170 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=774800.0, ans=0.0 2023-11-19 19:24:13,367 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=774800.0, ans=0.1 2023-11-19 19:24:17,031 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=774866.6666666666, ans=0.125 2023-11-19 19:24:19,088 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.333e+01 8.445e+01 9.319e+01 1.021e+02 1.426e+02, threshold=1.864e+02, percent-clipped=0.0 2023-11-19 19:24:28,296 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.40 vs. limit=15.0 2023-11-19 19:24:39,038 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 116250 2023-11-19 19:24:48,918 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.08 vs. limit=15.0 2023-11-19 19:24:54,263 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 8050, loss[loss=0.1097, simple_loss=0.1372, pruned_loss=0.03022, audio_tagging_loss=0.01093, over 14471.00 frames. ], tot_loss[loss=0.0867, simple_loss=0.1058, pruned_loss=0.0231, audio_tagging_loss=0.0107, over 3059420.41 frames. 
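
The WARNING above is the dataloader's validity filter at work: the 1.0 s placeholder cut has 100 feature frames, only 23 frames survive the convolutional subsampling, and 23 frames cannot align 24 BPE tokens under the transducer, so the cut is dropped. A sketch of that check; the subsampling arithmetic and the lhotse/sentencepiece plumbing are assumptions here, though ((100 - 7) // 2 + 1) // 2 does reproduce the logged 100 -> 23:

    def keep_cut(cut, sp):
        # Frames left after the conv front-end (~4x reduction with edge loss).
        T = ((cut.num_frames - 7) // 2 + 1) // 2
        tokens = sp.encode(cut.supervisions[0].text, out_type=str)
        return T >= len(tokens)  # 23 < 24 -> excluded, as in the WARNING

    # train_cuts = train_cuts.filter(lambda c: keep_cut(c, sp))
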
], batch size: 54, lr: 6.87e-03, grad_scale: 32.0 2023-11-19 19:25:02,688 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=775066.6666666666, ans=0.0 2023-11-19 19:25:03,940 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=775066.6666666666, ans=0.0 2023-11-19 19:25:07,556 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=775133.3333333334, ans=0.1 2023-11-19 19:25:10,000 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=775133.3333333334, ans=0.0 2023-11-19 19:25:44,133 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 116300 2023-11-19 19:25:58,381 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=775400.0, ans=0.125 2023-11-19 19:25:59,363 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 8100, loss[loss=0.09987, simple_loss=0.1344, pruned_loss=0.02528, audio_tagging_loss=0.007379, over 16027.00 frames. ], tot_loss[loss=0.08611, simple_loss=0.1051, pruned_loss=0.02296, audio_tagging_loss=0.01061, over 3056575.72 frames. ], batch size: 56, lr: 6.87e-03, grad_scale: 32.0 2023-11-19 19:26:00,920 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=775400.0, ans=0.125 2023-11-19 19:26:25,551 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=775533.3333333334, ans=15.0 2023-11-19 19:26:26,381 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=775533.3333333334, ans=0.125 2023-11-19 19:26:27,819 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=775533.3333333334, ans=0.1 2023-11-19 19:26:29,906 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.744e+01 8.355e+01 8.902e+01 9.485e+01 1.238e+02, threshold=1.780e+02, percent-clipped=0.0 2023-11-19 19:26:35,582 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.08 vs. limit=12.0 2023-11-19 19:26:44,106 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=775600.0, ans=0.2 2023-11-19 19:26:49,870 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 116350 2023-11-19 19:26:51,264 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=775666.6666666666, ans=0.5 2023-11-19 19:26:56,388 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=775666.6666666666, ans=0.125 2023-11-19 19:27:00,126 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=775666.6666666666, ans=0.125 2023-11-19 19:27:05,435 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 8150, loss[loss=0.07569, simple_loss=0.09233, pruned_loss=0.0197, audio_tagging_loss=0.009822, over 15315.00 frames. ], tot_loss[loss=0.08622, simple_loss=0.1054, pruned_loss=0.02314, audio_tagging_loss=0.01038, over 3053418.84 frames. 
], batch size: 59, lr: 6.87e-03, grad_scale: 32.0 2023-11-19 19:27:15,848 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=775733.3333333334, ans=0.1 2023-11-19 19:27:22,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=775800.0, ans=0.125 2023-11-19 19:27:40,251 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=775866.6666666666, ans=0.125 2023-11-19 19:27:55,532 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 116400 2023-11-19 19:28:04,328 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=776000.0, ans=0.5 2023-11-19 19:28:10,289 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 19:28:11,487 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 8200, loss[loss=0.08288, simple_loss=0.1153, pruned_loss=0.01869, audio_tagging_loss=0.006563, over 15143.00 frames. ], tot_loss[loss=0.08629, simple_loss=0.1058, pruned_loss=0.02317, audio_tagging_loss=0.01021, over 3053614.37 frames. ], batch size: 57, lr: 6.87e-03, grad_scale: 32.0 2023-11-19 19:28:12,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=776066.6666666666, ans=0.2 2023-11-19 19:28:20,733 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=776066.6666666666, ans=0.1 2023-11-19 19:28:33,955 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.66 vs. limit=15.0 2023-11-19 19:28:41,260 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.260e+01 8.595e+01 9.342e+01 1.031e+02 1.321e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-19 19:28:53,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=776266.6666666666, ans=0.125 2023-11-19 19:28:57,678 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=776266.6666666666, ans=0.0 2023-11-19 19:29:01,313 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 116450 2023-11-19 19:29:09,265 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=776333.3333333334, ans=0.09899494936611666 2023-11-19 19:29:14,223 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=776333.3333333334, ans=0.125 2023-11-19 19:29:16,462 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 8250, loss[loss=0.07758, simple_loss=0.09806, pruned_loss=0.01716, audio_tagging_loss=0.01139, over 15542.00 frames. ], tot_loss[loss=0.08626, simple_loss=0.1057, pruned_loss=0.02317, audio_tagging_loss=0.01023, over 3052232.15 frames. 
], batch size: 57, lr: 6.87e-03, grad_scale: 32.0 2023-11-19 19:29:30,430 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=776466.6666666666, ans=0.0 2023-11-19 19:29:37,153 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.47 vs. limit=15.0 2023-11-19 19:29:45,781 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.68 vs. limit=12.0 2023-11-19 19:29:55,248 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=776600.0, ans=0.2 2023-11-19 19:30:00,013 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=776600.0, ans=0.2 2023-11-19 19:30:04,371 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.80 vs. limit=12.0 2023-11-19 19:30:06,002 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 116500 2023-11-19 19:30:13,389 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.90 vs. limit=22.5 2023-11-19 19:30:14,376 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=776666.6666666666, ans=0.1 2023-11-19 19:30:22,186 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 8300, loss[loss=0.07651, simple_loss=0.08532, pruned_loss=0.02193, audio_tagging_loss=0.01192, over 14592.00 frames. ], tot_loss[loss=0.08543, simple_loss=0.1045, pruned_loss=0.02287, audio_tagging_loss=0.01031, over 3049574.92 frames. ], batch size: 59, lr: 6.87e-03, grad_scale: 32.0 2023-11-19 19:30:51,982 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.779e+01 8.250e+01 8.993e+01 9.595e+01 1.317e+02, threshold=1.799e+02, percent-clipped=0.0 2023-11-19 19:31:08,044 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.28 vs. limit=6.0 2023-11-19 19:31:11,737 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 116550 2023-11-19 19:31:15,918 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.54 vs. limit=15.0 2023-11-19 19:31:17,548 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=777000.0, ans=0.0 2023-11-19 19:31:25,590 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 19:31:27,868 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 8350, loss[loss=0.06267, simple_loss=0.08189, pruned_loss=0.01188, audio_tagging_loss=0.009841, over 15948.00 frames. ], tot_loss[loss=0.08511, simple_loss=0.1043, pruned_loss=0.02274, audio_tagging_loss=0.01021, over 3052111.34 frames. ], batch size: 60, lr: 6.86e-03, grad_scale: 32.0 2023-11-19 19:31:33,407 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.84 vs. 
limit=12.0 2023-11-19 19:31:47,893 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=777133.3333333334, ans=0.125 2023-11-19 19:31:57,111 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=777200.0, ans=0.125 2023-11-19 19:32:08,401 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=777266.6666666666, ans=0.125 2023-11-19 19:32:17,483 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 116600 2023-11-19 19:32:19,443 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.26 vs. limit=6.0 2023-11-19 19:32:32,824 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 8400, loss[loss=0.1045, simple_loss=0.1376, pruned_loss=0.02924, audio_tagging_loss=0.006505, over 15765.00 frames. ], tot_loss[loss=0.08451, simple_loss=0.1034, pruned_loss=0.02251, audio_tagging_loss=0.01027, over 3050839.70 frames. ], batch size: 55, lr: 6.86e-03, grad_scale: 32.0 2023-11-19 19:32:54,511 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.60 vs. limit=12.0 2023-11-19 19:33:00,505 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=777533.3333333334, ans=0.125 2023-11-19 19:33:03,756 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.528e+01 8.258e+01 9.022e+01 9.929e+01 1.314e+02, threshold=1.804e+02, percent-clipped=0.0 2023-11-19 19:33:04,072 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=777533.3333333334, ans=0.0 2023-11-19 19:33:05,585 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.67 vs. limit=22.5 2023-11-19 19:33:11,543 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=777600.0, ans=0.1 2023-11-19 19:33:12,975 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.30 vs. limit=15.0 2023-11-19 19:33:18,950 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=777600.0, ans=0.125 2023-11-19 19:33:20,115 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=777600.0, ans=0.0 2023-11-19 19:33:22,299 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 116650 2023-11-19 19:33:23,119 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.23 vs. 
limit=15.0 2023-11-19 19:33:31,553 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.whiten.whitening_limit, batch_count=777666.6666666666, ans=12.0 2023-11-19 19:33:32,479 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=777666.6666666666, ans=0.125 2023-11-19 19:33:33,618 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=777666.6666666666, ans=0.125 2023-11-19 19:33:37,658 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 8450, loss[loss=0.1003, simple_loss=0.1265, pruned_loss=0.02812, audio_tagging_loss=0.008915, over 14799.00 frames. ], tot_loss[loss=0.08522, simple_loss=0.1043, pruned_loss=0.0227, audio_tagging_loss=0.01036, over 3046519.64 frames. ], batch size: 53, lr: 6.86e-03, grad_scale: 32.0 2023-11-19 19:34:04,536 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=777866.6666666666, ans=0.125 2023-11-19 19:34:27,536 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 116700 2023-11-19 19:34:32,024 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=778000.0, ans=0.1 2023-11-19 19:34:44,156 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 8500, loss[loss=0.08504, simple_loss=0.1137, pruned_loss=0.02071, audio_tagging_loss=0.00746, over 15688.00 frames. ], tot_loss[loss=0.08557, simple_loss=0.1051, pruned_loss=0.02273, audio_tagging_loss=0.01029, over 3053223.70 frames. ], batch size: 58, lr: 6.86e-03, grad_scale: 32.0 2023-11-19 19:34:54,958 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.97 vs. limit=15.0 2023-11-19 19:35:06,874 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=778133.3333333334, ans=0.125 2023-11-19 19:35:13,936 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.558e+01 8.293e+01 8.958e+01 9.904e+01 1.302e+02, threshold=1.792e+02, percent-clipped=0.0 2023-11-19 19:35:33,133 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 116750 2023-11-19 19:35:33,858 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.18 vs. limit=15.0 2023-11-19 19:35:44,467 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=778333.3333333334, ans=0.1 2023-11-19 19:35:48,122 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 8550, loss[loss=0.0732, simple_loss=0.08242, pruned_loss=0.01975, audio_tagging_loss=0.01224, over 15464.00 frames. ], tot_loss[loss=0.08517, simple_loss=0.1045, pruned_loss=0.0226, audio_tagging_loss=0.01033, over 3056791.63 frames. 
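
The scaling.py:1022 Whitening lines compare a per-module statistic against a scheduled limit (metric=10.97 vs. limit=15.0 above); a corrective gradient is applied only when the metric exceeds the limit. Assuming the metric is the eigenvalue-spread ratio trace(C^2) * d / trace(C)^2 of the channel covariance C, which equals 1.0 for perfectly white activations, a sketch:

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        # x: (num_frames, num_channels). Returns 1.0 iff the channel covariance
        # is a multiple of the identity; grows as energy concentrates in a few
        # directions.
        x = x - x.mean(dim=0)
        c = (x.t() @ x) / x.shape[0]
        d = c.shape[0]
        return (torch.trace(c @ c) * d / torch.trace(c) ** 2).item()

    feats = torch.randn(2000, 192)      # near-white: metric close to 1.0
    print(whitening_metric(feats))
    feats[:, 0] *= 10.0                 # one dominant channel: metric jumps
    print(whitening_metric(feats))
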
], batch size: 57, lr: 6.86e-03, grad_scale: 16.0 2023-11-19 19:35:59,792 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=778466.6666666666, ans=0.2 2023-11-19 19:36:10,809 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=778466.6666666666, ans=0.125 2023-11-19 19:36:12,410 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.99 vs. limit=15.0 2023-11-19 19:36:13,398 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=778533.3333333334, ans=0.2 2023-11-19 19:36:15,098 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=778533.3333333334, ans=0.125 2023-11-19 19:36:19,550 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=778533.3333333334, ans=0.125 2023-11-19 19:36:37,893 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 116800 2023-11-19 19:36:52,895 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 8600, loss[loss=0.1113, simple_loss=0.1423, pruned_loss=0.03149, audio_tagging_loss=0.00861, over 16104.00 frames. ], tot_loss[loss=0.08544, simple_loss=0.1047, pruned_loss=0.02265, audio_tagging_loss=0.01041, over 3056467.44 frames. ], batch size: 59, lr: 6.86e-03, grad_scale: 16.0 2023-11-19 19:37:05,984 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=778800.0, ans=0.125 2023-11-19 19:37:11,649 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=778800.0, ans=0.0 2023-11-19 19:37:15,293 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=778800.0, ans=0.125 2023-11-19 19:37:24,759 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.251e+01 8.439e+01 9.027e+01 1.024e+02 1.472e+02, threshold=1.805e+02, percent-clipped=0.0 2023-11-19 19:37:31,381 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=778933.3333333334, ans=0.0 2023-11-19 19:37:36,102 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=778933.3333333334, ans=0.125 2023-11-19 19:37:42,137 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 116850 2023-11-19 19:37:59,578 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 8650, loss[loss=0.06339, simple_loss=0.07377, pruned_loss=0.01699, audio_tagging_loss=0.00951, over 14887.00 frames. ], tot_loss[loss=0.08514, simple_loss=0.1045, pruned_loss=0.02242, audio_tagging_loss=0.01048, over 3053183.10 frames. ], batch size: 57, lr: 6.86e-03, grad_scale: 16.0 2023-11-19 19:38:02,838 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.42 vs. 
limit=15.0 2023-11-19 19:38:13,588 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=779133.3333333334, ans=0.0 2023-11-19 19:38:27,239 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=779200.0, ans=0.0 2023-11-19 19:38:39,516 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.29 vs. limit=22.5 2023-11-19 19:38:48,811 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 116900 2023-11-19 19:38:53,996 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=779333.3333333334, ans=0.09899494936611666 2023-11-19 19:38:56,379 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=779333.3333333334, ans=0.125 2023-11-19 19:38:56,446 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=779333.3333333334, ans=0.0 2023-11-19 19:39:03,568 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 8700, loss[loss=0.1109, simple_loss=0.1352, pruned_loss=0.03213, audio_tagging_loss=0.01118, over 15208.00 frames. ], tot_loss[loss=0.08545, simple_loss=0.1045, pruned_loss=0.02267, audio_tagging_loss=0.01052, over 3054178.60 frames. ], batch size: 58, lr: 6.85e-03, grad_scale: 16.0 2023-11-19 19:39:03,963 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=779400.0, ans=0.125 2023-11-19 19:39:12,438 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=779400.0, ans=0.125 2023-11-19 19:39:35,922 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.849e+01 8.405e+01 9.122e+01 9.859e+01 1.308e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-19 19:39:46,378 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=779600.0, ans=0.0 2023-11-19 19:39:53,701 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 116950 2023-11-19 19:39:57,584 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=779666.6666666666, ans=0.125 2023-11-19 19:39:59,009 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=779666.6666666666, ans=0.2 2023-11-19 19:40:01,845 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.42 vs. limit=15.0 2023-11-19 19:40:08,882 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 8750, loss[loss=0.09978, simple_loss=0.1156, pruned_loss=0.03092, audio_tagging_loss=0.01108, over 14728.00 frames. ], tot_loss[loss=0.08542, simple_loss=0.1044, pruned_loss=0.02262, audio_tagging_loss=0.0106, over 3057056.58 frames. 
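
Many of the scheduled values above belong to balancers (hidden_balancer.prob, balancer2.min_abs=0.5, balancer1.max_abs=10.0, ...): modules that, with probability prob on a given batch, nudge per-channel activation statistics into a target range. A deliberately simplified sketch of the idea; the real module shapes gradients in the backward pass rather than adding a loss term:

    import torch

    def balancer_penalty(x, min_abs=0.5, max_abs=10.0):
        # x: (num_frames, num_channels). Penalize channels whose mean absolute
        # activation falls below min_abs or rises above max_abs.
        m = x.abs().mean(dim=0)
        return ((min_abs - m).clamp(min=0.0) + (m - max_abs).clamp(min=0.0)).sum()

    acts = torch.randn(100, 256) * 0.01  # collapsed channels
    print(balancer_penalty(acts))        # large penalty -> gradients push |x| up
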
], batch size: 55, lr: 6.85e-03, grad_scale: 16.0 2023-11-19 19:40:14,655 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=779733.3333333334, ans=0.125 2023-11-19 19:40:41,424 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=779866.6666666666, ans=0.125 2023-11-19 19:40:53,017 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.79 vs. limit=22.5 2023-11-19 19:40:53,085 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.33 vs. limit=12.0 2023-11-19 19:40:53,947 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=779933.3333333334, ans=0.1 2023-11-19 19:40:56,701 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.23 vs. limit=22.5 2023-11-19 19:40:58,646 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 117000 2023-11-19 19:41:13,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=780066.6666666666, ans=0.1 2023-11-19 19:41:15,667 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 8800, loss[loss=0.1081, simple_loss=0.1306, pruned_loss=0.03039, audio_tagging_loss=0.01244, over 15346.00 frames. ], tot_loss[loss=0.08617, simple_loss=0.1055, pruned_loss=0.02269, audio_tagging_loss=0.01073, over 3051616.56 frames. ], batch size: 57, lr: 6.85e-03, grad_scale: 32.0 2023-11-19 19:41:22,837 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=780066.6666666666, ans=0.0 2023-11-19 19:41:41,949 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.93 vs. limit=10.0 2023-11-19 19:41:46,220 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.896e+01 8.517e+01 9.316e+01 1.005e+02 1.428e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-19 19:42:02,881 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.26 vs. limit=10.0 2023-11-19 19:42:05,453 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 117050 2023-11-19 19:42:20,933 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 8850, loss[loss=0.06943, simple_loss=0.08409, pruned_loss=0.01619, audio_tagging_loss=0.01119, over 15200.00 frames. ], tot_loss[loss=0.08611, simple_loss=0.1055, pruned_loss=0.02264, audio_tagging_loss=0.0107, over 3052330.84 frames. ], batch size: 58, lr: 6.85e-03, grad_scale: 32.0 2023-11-19 19:42:25,528 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.74 vs. limit=22.5 2023-11-19 19:42:26,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=780400.0, ans=0.0 2023-11-19 19:42:31,028 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. 
Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 19:42:38,628 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=780466.6666666666, ans=0.2 2023-11-19 19:42:40,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=780466.6666666666, ans=0.2 2023-11-19 19:43:06,022 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=780600.0, ans=0.1 2023-11-19 19:43:10,888 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 117100 2023-11-19 19:43:10,979 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=780600.0, ans=0.125 2023-11-19 19:43:17,481 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.60 vs. limit=10.0 2023-11-19 19:43:25,654 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 8900, loss[loss=0.07958, simple_loss=0.08187, pruned_loss=0.01681, audio_tagging_loss=0.02184, over 15157.00 frames. ], tot_loss[loss=0.08594, simple_loss=0.1057, pruned_loss=0.02257, audio_tagging_loss=0.01053, over 3057807.25 frames. ], batch size: 58, lr: 6.85e-03, grad_scale: 32.0 2023-11-19 19:43:32,031 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=780733.3333333334, ans=0.0 2023-11-19 19:43:57,706 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.896e+01 8.524e+01 9.164e+01 1.008e+02 2.519e+02, threshold=1.833e+02, percent-clipped=1.0 2023-11-19 19:44:11,730 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=780933.3333333334, ans=0.125 2023-11-19 19:44:15,128 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 117150 2023-11-19 19:44:30,942 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 8950, loss[loss=0.08694, simple_loss=0.1149, pruned_loss=0.02405, audio_tagging_loss=0.005443, over 15468.00 frames. ], tot_loss[loss=0.08639, simple_loss=0.1066, pruned_loss=0.02282, audio_tagging_loss=0.01027, over 3053066.44 frames. ], batch size: 55, lr: 6.85e-03, grad_scale: 32.0 2023-11-19 19:44:35,595 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=781066.6666666666, ans=0.07 2023-11-19 19:44:46,105 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=781133.3333333334, ans=0.0 2023-11-19 19:45:13,909 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=781266.6666666666, ans=0.2 2023-11-19 19:45:20,425 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 117200 2023-11-19 19:45:36,442 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 9000, loss[loss=0.07506, simple_loss=0.09144, pruned_loss=0.01614, audio_tagging_loss=0.0132, over 15326.00 frames. ], tot_loss[loss=0.0871, simple_loss=0.1074, pruned_loss=0.02321, audio_tagging_loss=0.01019, over 3055956.84 frames. 
], batch size: 58, lr: 6.85e-03, grad_scale: 16.0 2023-11-19 19:45:36,443 INFO [train_asr.py:1285] (3/4) Computing validation loss 2023-11-19 19:46:18,837 INFO [train_asr.py:1294] (3/4) Epoch 10, validation: loss=0.06518, simple_loss=0.05524, pruned_loss=0.006372, audio_tagging_loss=0.03119, over 4681554.00 frames. 2023-11-19 19:46:18,838 INFO [train_asr.py:1295] (3/4) Maximum memory allocated so far is 25886MB 2023-11-19 19:46:47,880 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=781533.3333333334, ans=0.125 2023-11-19 19:46:51,426 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=781533.3333333334, ans=0.0 2023-11-19 19:46:52,443 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.260e+01 8.536e+01 8.923e+01 9.790e+01 1.498e+02, threshold=1.785e+02, percent-clipped=0.0 2023-11-19 19:46:56,504 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=781533.3333333334, ans=0.125 2023-11-19 19:46:59,404 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.19 vs. limit=15.0 2023-11-19 19:47:08,914 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 117250 2023-11-19 19:47:13,597 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=781666.6666666666, ans=0.0 2023-11-19 19:47:19,624 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=781666.6666666666, ans=0.0 2023-11-19 19:47:25,453 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 9050, loss[loss=0.06237, simple_loss=0.07308, pruned_loss=0.01421, audio_tagging_loss=0.01162, over 15335.00 frames. ], tot_loss[loss=0.08623, simple_loss=0.1062, pruned_loss=0.02292, audio_tagging_loss=0.01023, over 3052990.64 frames. ], batch size: 58, lr: 6.84e-03, grad_scale: 16.0 2023-11-19 19:47:39,020 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.72 vs. limit=22.5 2023-11-19 19:48:10,756 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=781933.3333333334, ans=0.125 2023-11-19 19:48:15,156 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 117300 2023-11-19 19:48:22,858 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 19:48:30,605 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 9100, loss[loss=0.09441, simple_loss=0.1165, pruned_loss=0.02737, audio_tagging_loss=0.008799, over 15103.00 frames. ], tot_loss[loss=0.08608, simple_loss=0.1062, pruned_loss=0.02285, audio_tagging_loss=0.01015, over 3053348.84 frames. 
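
At batch 9000 above, training pauses for a validation pass ("Computing validation loss" ... "Maximum memory allocated so far is 25886MB"). A minimal sketch of that step, with compute_loss standing in for the recipe's batch-loss helper (an assumption of this sketch):

    import torch

    def validate(model, valid_dl, device):
        model.eval()
        tot, frames = 0.0, 0
        with torch.no_grad():
            for batch in valid_dl:
                loss, n = compute_loss(model, batch)  # assumed helper
                tot += loss.item() * n
                frames += n
        model.train()
        mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
        print(f"validation: loss={tot / frames:.5f}; max mem so far {mb}MB")
        return tot / frames
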
], batch size: 55, lr: 6.84e-03, grad_scale: 16.0 2023-11-19 19:48:35,794 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=782066.6666666666, ans=0.0 2023-11-19 19:48:40,674 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=782066.6666666666, ans=0.1 2023-11-19 19:48:42,014 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=782133.3333333334, ans=0.0 2023-11-19 19:49:03,198 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.816e+01 8.111e+01 8.734e+01 9.488e+01 1.224e+02, threshold=1.747e+02, percent-clipped=0.0 2023-11-19 19:49:21,042 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 117350 2023-11-19 19:49:31,344 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=782333.3333333334, ans=0.1 2023-11-19 19:49:36,028 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 9150, loss[loss=0.09927, simple_loss=0.1205, pruned_loss=0.02778, audio_tagging_loss=0.01126, over 14825.00 frames. ], tot_loss[loss=0.08555, simple_loss=0.1055, pruned_loss=0.02267, audio_tagging_loss=0.01015, over 3053995.25 frames. ], batch size: 55, lr: 6.84e-03, grad_scale: 16.0 2023-11-19 19:49:47,080 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.08 vs. limit=6.0 2023-11-19 19:49:53,859 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.22 vs. limit=22.5 2023-11-19 19:50:11,250 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=782533.3333333334, ans=0.1 2023-11-19 19:50:13,686 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 19:50:21,066 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=782600.0, ans=0.125 2023-11-19 19:50:26,100 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 117400 2023-11-19 19:50:42,579 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 9200, loss[loss=0.08474, simple_loss=0.1139, pruned_loss=0.0203, audio_tagging_loss=0.007472, over 15565.00 frames. ], tot_loss[loss=0.08522, simple_loss=0.1048, pruned_loss=0.02267, audio_tagging_loss=0.01013, over 3052444.37 frames. 
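
The grad_scale field in the batch lines drifts between 32.0 and 16.0, which matches fp16 loss scaling: the scale is halved when a step overflows and grows back after a run of clean steps. A minimal sketch under that assumption, using the stock torch.cuda.amp API:

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)

    def fp16_step(model, optimizer, loss_fn, batch):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = loss_fn(model, batch)
        scaler.scale(loss).backward()
        scaler.step(optimizer)     # skipped if inf/nan gradients were found
        scaler.update()            # halves the scale on overflow, else grows it
        return scaler.get_scale()  # the value logged as grad_scale
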
], batch size: 57, lr: 6.84e-03, grad_scale: 32.0 2023-11-19 19:50:47,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=782733.3333333334, ans=0.0 2023-11-19 19:51:09,072 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=782866.6666666666, ans=0.125 2023-11-19 19:51:15,188 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.014e+01 8.317e+01 9.062e+01 1.049e+02 1.537e+02, threshold=1.812e+02, percent-clipped=0.0 2023-11-19 19:51:33,117 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 117450 2023-11-19 19:51:34,727 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=783000.0, ans=0.125 2023-11-19 19:51:36,639 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=783000.0, ans=0.07 2023-11-19 19:51:42,816 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=783000.0, ans=0.125 2023-11-19 19:51:48,845 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 9250, loss[loss=0.06376, simple_loss=0.07584, pruned_loss=0.01495, audio_tagging_loss=0.0109, over 15733.00 frames. ], tot_loss[loss=0.08448, simple_loss=0.1039, pruned_loss=0.02241, audio_tagging_loss=0.01012, over 3056720.51 frames. ], batch size: 61, lr: 6.84e-03, grad_scale: 32.0 2023-11-19 19:51:49,023 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=783066.6666666666, ans=0.125 2023-11-19 19:51:52,164 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.47 vs. limit=12.0 2023-11-19 19:51:59,583 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=783066.6666666666, ans=0.09899494936611666 2023-11-19 19:51:59,797 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.36 vs. limit=22.5 2023-11-19 19:52:07,212 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=783133.3333333334, ans=0.1 2023-11-19 19:52:23,963 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.73 vs. limit=15.0 2023-11-19 19:52:38,856 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 117500 2023-11-19 19:52:41,605 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=783333.3333333334, ans=0.04949747468305833 2023-11-19 19:52:54,384 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 9300, loss[loss=0.1093, simple_loss=0.1323, pruned_loss=0.02887, audio_tagging_loss=0.01429, over 14928.00 frames. ], tot_loss[loss=0.08468, simple_loss=0.1038, pruned_loss=0.02252, audio_tagging_loss=0.01024, over 3052723.07 frames. 
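[annotation] The tot_loss fields in these train_asr.py:1262 lines are internally consistent with a multi-task objective of the form loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss. The 0.5 and 1.0 scales are inferred from the logged numbers, not quoted from the code: for the batch-9250 summary just above, 0.5 * 0.1039 + 0.02241 + 0.01012 = 0.08448, and the epoch-10 validation line decomposes the same way. A one-liner to check any entry:

def total_loss(simple, pruned, audio_tagging,
               simple_scale=0.5, audio_tagging_scale=1.0):
    # Inferred composition; the scale values are assumptions that
    # reproduce the tot_loss and validation lines in this log.
    return simple_scale * simple + pruned + audio_tagging_scale * audio_tagging

print(round(total_loss(0.1039, 0.02241, 0.01012), 5))    # 0.08448 (batch 9250)
print(round(total_loss(0.05524, 0.006372, 0.03119), 5))  # 0.06518 (validation)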
], batch size: 52, lr: 6.84e-03, grad_scale: 32.0 2023-11-19 19:53:05,281 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=783400.0, ans=0.04949747468305833 2023-11-19 19:53:26,988 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.639e+01 8.312e+01 9.036e+01 9.787e+01 1.162e+02, threshold=1.807e+02, percent-clipped=0.0 2023-11-19 19:53:30,059 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.32 vs. limit=15.0 2023-11-19 19:53:31,193 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=783533.3333333334, ans=0.0 2023-11-19 19:53:44,131 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 117550 2023-11-19 19:53:59,579 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 9350, loss[loss=0.07228, simple_loss=0.09068, pruned_loss=0.01589, audio_tagging_loss=0.01105, over 15722.00 frames. ], tot_loss[loss=0.08565, simple_loss=0.1052, pruned_loss=0.02289, audio_tagging_loss=0.01015, over 3056140.87 frames. ], batch size: 58, lr: 6.84e-03, grad_scale: 32.0 2023-11-19 19:54:01,830 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=783733.3333333334, ans=0.0 2023-11-19 19:54:21,974 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=783800.0, ans=0.025 2023-11-19 19:54:49,358 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 117600 2023-11-19 19:55:01,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=784000.0, ans=0.0 2023-11-19 19:55:05,784 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 9400, loss[loss=0.1107, simple_loss=0.1363, pruned_loss=0.03434, audio_tagging_loss=0.008162, over 16489.00 frames. ], tot_loss[loss=0.08638, simple_loss=0.1059, pruned_loss=0.02314, audio_tagging_loss=0.01031, over 3063009.58 frames. ], batch size: 61, lr: 6.83e-03, grad_scale: 16.0 2023-11-19 19:55:21,595 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=784133.3333333334, ans=0.2 2023-11-19 19:55:31,064 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=784200.0, ans=0.125 2023-11-19 19:55:39,192 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.194e+01 8.476e+01 9.081e+01 1.030e+02 1.355e+02, threshold=1.816e+02, percent-clipped=0.0 2023-11-19 19:55:49,642 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.30 vs. limit=22.5 2023-11-19 19:55:50,576 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=784266.6666666666, ans=0.125 2023-11-19 19:55:52,385 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.48 vs. limit=22.5 2023-11-19 19:55:55,255 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 117650 2023-11-19 19:56:06,418 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. 
Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 19:56:11,011 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 9450, loss[loss=0.07788, simple_loss=0.09958, pruned_loss=0.01763, audio_tagging_loss=0.01045, over 15816.00 frames. ], tot_loss[loss=0.08723, simple_loss=0.1072, pruned_loss=0.02335, audio_tagging_loss=0.01029, over 3066273.80 frames. ], batch size: 60, lr: 6.83e-03, grad_scale: 16.0 2023-11-19 19:56:16,630 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.64 vs. limit=6.0 2023-11-19 19:56:21,659 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=784400.0, ans=0.0 2023-11-19 19:56:22,954 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=784466.6666666666, ans=0.125 2023-11-19 19:56:39,197 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=784533.3333333334, ans=0.125 2023-11-19 19:56:41,703 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=784533.3333333334, ans=0.125 2023-11-19 19:56:46,553 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=784533.3333333334, ans=0.125 2023-11-19 19:57:00,673 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 117700 2023-11-19 19:57:01,920 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=784666.6666666666, ans=0.125 2023-11-19 19:57:03,270 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=784666.6666666666, ans=0.125 2023-11-19 19:57:13,959 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=784666.6666666666, ans=0.1 2023-11-19 19:57:16,055 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 9500, loss[loss=0.09511, simple_loss=0.1047, pruned_loss=0.0302, audio_tagging_loss=0.01258, over 14618.00 frames. ], tot_loss[loss=0.08738, simple_loss=0.1071, pruned_loss=0.0235, audio_tagging_loss=0.01036, over 3058605.37 frames. ], batch size: 57, lr: 6.83e-03, grad_scale: 16.0 2023-11-19 19:57:18,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=784733.3333333334, ans=0.07 2023-11-19 19:57:20,608 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=784733.3333333334, ans=0.035 2023-11-19 19:57:26,444 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=784733.3333333334, ans=0.125 2023-11-19 19:57:31,587 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.56 vs. 
limit=15.0 2023-11-19 19:57:45,216 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=784866.6666666666, ans=0.125 2023-11-19 19:57:49,740 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.106e+01 8.346e+01 9.084e+01 9.988e+01 1.421e+02, threshold=1.817e+02, percent-clipped=0.0 2023-11-19 19:58:06,183 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 117750 2023-11-19 19:58:21,956 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 9550, loss[loss=0.07681, simple_loss=0.09248, pruned_loss=0.02012, audio_tagging_loss=0.01045, over 15921.00 frames. ], tot_loss[loss=0.08805, simple_loss=0.1078, pruned_loss=0.02376, audio_tagging_loss=0.01042, over 3056849.77 frames. ], batch size: 62, lr: 6.83e-03, grad_scale: 16.0 2023-11-19 19:58:22,269 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=785066.6666666666, ans=0.125 2023-11-19 19:58:42,717 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=785133.3333333334, ans=0.1 2023-11-19 19:58:55,575 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=785200.0, ans=0.125 2023-11-19 19:59:09,212 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=785266.6666666666, ans=0.0 2023-11-19 19:59:11,484 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 117800 2023-11-19 19:59:26,838 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 9600, loss[loss=0.07325, simple_loss=0.08687, pruned_loss=0.01806, audio_tagging_loss=0.01175, over 17300.00 frames. ], tot_loss[loss=0.08794, simple_loss=0.1075, pruned_loss=0.0237, audio_tagging_loss=0.01047, over 3058791.53 frames. ], batch size: 65, lr: 6.83e-03, grad_scale: 32.0 2023-11-19 20:00:00,719 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=785533.3333333334, ans=0.0 2023-11-19 20:00:00,956 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.10 vs. limit=22.5 2023-11-19 20:00:01,486 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.979e+01 8.524e+01 9.148e+01 9.988e+01 1.418e+02, threshold=1.830e+02, percent-clipped=0.0 2023-11-19 20:00:05,522 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=785600.0, ans=0.0 2023-11-19 20:00:15,491 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=785600.0, ans=0.0 2023-11-19 20:00:16,399 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 117850 2023-11-19 20:00:22,792 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=785666.6666666666, ans=0.0 2023-11-19 20:00:32,104 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 9650, loss[loss=0.08335, simple_loss=0.1021, pruned_loss=0.02205, audio_tagging_loss=0.01023, over 15097.00 frames. ], tot_loss[loss=0.08652, simple_loss=0.1059, pruned_loss=0.02313, audio_tagging_loss=0.01045, over 3050609.09 frames. 
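[annotation] The scaling.py:1022 Whitening lines compare a measured statistic against a fixed limit. As best as can be inferred, the metric is a scale-invariant measure of how far the per-group activation covariance is from a multiple of the identity: it is 1.0 for perfectly "white" features and grows toward num_channels as the covariance collapses toward rank one, and a corrective gradient only engages once the metric exceeds the limit; the log merely reports the comparison. A rough reimplementation for intuition (single group, no gradient machinery):

import torch

def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels) activations for one group.
    x = x - x.mean(dim=0)
    cov = (x.t() @ x) / x.shape[0]            # channel covariance
    mean_diag = cov.diag().mean()             # mean eigenvalue
    mean_sq_diag = (cov @ cov).diag().mean()  # mean squared eigenvalue
    return (mean_sq_diag / mean_diag.pow(2)).item()

white = torch.randn(20000, 384)
print(whitening_metric(white))                # close to 1.0
rank1 = torch.randn(20000, 1) * torch.randn(1, 384)
print(whitening_metric(rank1))                # close to 384 (= num_channels)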
], batch size: 57, lr: 6.83e-03, grad_scale: 32.0 2023-11-19 20:00:42,188 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=785733.3333333334, ans=0.125 2023-11-19 20:01:17,892 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=785933.3333333334, ans=0.125 2023-11-19 20:01:22,108 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 117900 2023-11-19 20:01:28,747 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=786000.0, ans=0.1 2023-11-19 20:01:37,763 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 9700, loss[loss=0.08677, simple_loss=0.1232, pruned_loss=0.01844, audio_tagging_loss=0.006728, over 15855.00 frames. ], tot_loss[loss=0.0866, simple_loss=0.1061, pruned_loss=0.02324, audio_tagging_loss=0.01031, over 3041014.80 frames. ], batch size: 57, lr: 6.83e-03, grad_scale: 32.0 2023-11-19 20:01:44,651 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.20 vs. limit=15.0 2023-11-19 20:01:57,981 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=786133.3333333334, ans=0.09899494936611666 2023-11-19 20:02:06,953 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=786200.0, ans=0.125 2023-11-19 20:02:11,051 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.874e+01 8.357e+01 9.241e+01 1.009e+02 1.482e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-19 20:02:23,445 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=786266.6666666666, ans=0.125 2023-11-19 20:02:26,101 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.20 vs. limit=15.0 2023-11-19 20:02:26,868 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 117950 2023-11-19 20:02:41,771 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 9750, loss[loss=0.08914, simple_loss=0.1117, pruned_loss=0.02228, audio_tagging_loss=0.01099, over 15053.00 frames. ], tot_loss[loss=0.08642, simple_loss=0.1062, pruned_loss=0.02319, audio_tagging_loss=0.01012, over 3044025.17 frames. ], batch size: 54, lr: 6.82e-03, grad_scale: 32.0 2023-11-19 20:03:05,817 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.08 vs. limit=15.0 2023-11-19 20:03:06,725 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=786533.3333333334, ans=0.0 2023-11-19 20:03:15,979 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=786533.3333333334, ans=0.0 2023-11-19 20:03:23,499 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 20:03:30,744 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 118000 2023-11-19 20:03:32,501 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.31 vs. 
limit=10.0 2023-11-19 20:03:38,754 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=786666.6666666666, ans=0.125 2023-11-19 20:03:46,757 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 9800, loss[loss=0.1153, simple_loss=0.1484, pruned_loss=0.034, audio_tagging_loss=0.007132, over 14966.00 frames. ], tot_loss[loss=0.08687, simple_loss=0.1068, pruned_loss=0.02342, audio_tagging_loss=0.01003, over 3036007.93 frames. ], batch size: 52, lr: 6.82e-03, grad_scale: 32.0 2023-11-19 20:03:48,315 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=786733.3333333334, ans=0.125 2023-11-19 20:04:08,392 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=786800.0, ans=0.125 2023-11-19 20:04:14,598 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=786866.6666666666, ans=0.05 2023-11-19 20:04:20,511 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.556e+01 8.585e+01 9.415e+01 1.041e+02 1.362e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-19 20:04:22,161 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=786866.6666666666, ans=0.125 2023-11-19 20:04:35,794 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.80 vs. limit=10.0 2023-11-19 20:04:36,294 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 118050 2023-11-19 20:04:43,127 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 20:04:43,319 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=787000.0, ans=0.0 2023-11-19 20:04:45,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=787000.0, ans=0.07 2023-11-19 20:04:52,885 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 9850, loss[loss=0.08318, simple_loss=0.1046, pruned_loss=0.02012, audio_tagging_loss=0.01074, over 15957.00 frames. ], tot_loss[loss=0.08665, simple_loss=0.1065, pruned_loss=0.02333, audio_tagging_loss=0.01006, over 3034818.79 frames. 
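[annotation] The train_asr.py:1506 warnings in this stretch (jmSuJWEIizA, Bo4LcZjitzU, and several more below) all follow one pattern: an apparently one-second AudioSet cut carrying the dummy placeholder transcript yields 100 feature frames, the encoder front end subsamples that to 23 frames, and 23 frames cannot carry a 24-token alignment, so the cut is excluded from training. The exact length arithmetic lives in the encoder's convolutional front end; the formula below is an assumption that reproduces the logged numbers:

def frames_after_subsampling(t: int) -> int:
    # Assumed Conv2dSubsampling-style length formula (overall factor ~4).
    return ((t - 7) // 2 + 1) // 2

t_out = frames_after_subsampling(100)
print(t_out)        # 23, matching "Number of frames (after subsampling)"
print(t_out < 24)   # True: fewer frames than tokens, so the cut is dropped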
], batch size: 61, lr: 6.82e-03, grad_scale: 32.0 2023-11-19 20:04:57,044 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=787066.6666666666, ans=0.0 2023-11-19 20:05:05,359 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=787133.3333333334, ans=0.1 2023-11-19 20:05:15,367 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=787133.3333333334, ans=0.125 2023-11-19 20:05:28,484 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=787200.0, ans=0.125 2023-11-19 20:05:40,727 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=787266.6666666666, ans=0.125 2023-11-19 20:05:41,639 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 118100 2023-11-19 20:05:42,423 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.20 vs. limit=15.0 2023-11-19 20:05:50,556 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=787333.3333333334, ans=0.2 2023-11-19 20:05:56,633 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 9900, loss[loss=0.07956, simple_loss=0.09654, pruned_loss=0.02245, audio_tagging_loss=0.008841, over 15275.00 frames. ], tot_loss[loss=0.08699, simple_loss=0.1069, pruned_loss=0.02344, audio_tagging_loss=0.01008, over 3032649.58 frames. ], batch size: 59, lr: 6.82e-03, grad_scale: 32.0 2023-11-19 20:06:03,351 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 20:06:07,226 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=787400.0, ans=0.125 2023-11-19 20:06:09,892 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.44 vs. limit=22.5 2023-11-19 20:06:13,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=787466.6666666666, ans=0.0 2023-11-19 20:06:26,704 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.83 vs. limit=22.5 2023-11-19 20:06:30,691 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.320e+01 8.268e+01 9.019e+01 9.736e+01 1.319e+02, threshold=1.804e+02, percent-clipped=0.0 2023-11-19 20:06:45,640 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 118150 2023-11-19 20:06:47,607 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.47 vs. limit=15.0 2023-11-19 20:07:00,297 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 9950, loss[loss=0.07978, simple_loss=0.09905, pruned_loss=0.02056, audio_tagging_loss=0.009691, over 16227.00 frames. ], tot_loss[loss=0.08646, simple_loss=0.1063, pruned_loss=0.02323, audio_tagging_loss=0.0101, over 3029126.20 frames. 
], batch size: 62, lr: 6.82e-03, grad_scale: 32.0 2023-11-19 20:07:22,965 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 20:07:25,554 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=787866.6666666666, ans=0.125 2023-11-19 20:07:48,747 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 118200 2023-11-19 20:07:53,625 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=788000.0, ans=0.125 2023-11-19 20:07:54,935 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=788000.0, ans=0.07 2023-11-19 20:08:05,801 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 10000, loss[loss=0.09785, simple_loss=0.1135, pruned_loss=0.03028, audio_tagging_loss=0.01081, over 16240.00 frames. ], tot_loss[loss=0.08544, simple_loss=0.1047, pruned_loss=0.02285, audio_tagging_loss=0.01023, over 3032952.88 frames. ], batch size: 61, lr: 6.82e-03, grad_scale: 32.0 2023-11-19 20:08:14,936 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.03 vs. limit=15.0 2023-11-19 20:08:18,593 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=788133.3333333334, ans=0.0 2023-11-19 20:08:24,512 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=788133.3333333334, ans=0.125 2023-11-19 20:08:30,969 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=788200.0, ans=0.025 2023-11-19 20:08:36,348 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.40 vs. limit=15.0 2023-11-19 20:08:37,911 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.424e+01 7.892e+01 8.582e+01 9.307e+01 3.708e+02, threshold=1.716e+02, percent-clipped=1.0 2023-11-19 20:08:45,110 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=788266.6666666666, ans=0.1 2023-11-19 20:08:54,217 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 118250 2023-11-19 20:08:56,279 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=788333.3333333334, ans=0.125 2023-11-19 20:09:06,525 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.51 vs. limit=15.0 2023-11-19 20:09:09,683 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 10050, loss[loss=0.07282, simple_loss=0.08548, pruned_loss=0.01703, audio_tagging_loss=0.01305, over 15577.00 frames. ], tot_loss[loss=0.08517, simple_loss=0.1045, pruned_loss=0.02264, audio_tagging_loss=0.01029, over 3031842.32 frames. 
], batch size: 60, lr: 6.82e-03, grad_scale: 32.0 2023-11-19 20:09:29,316 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=788466.6666666666, ans=0.125 2023-11-19 20:09:42,124 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=788533.3333333334, ans=0.0 2023-11-19 20:09:58,744 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 118300 2023-11-19 20:10:10,094 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=788666.6666666666, ans=0.125 2023-11-19 20:10:13,514 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 10100, loss[loss=0.09799, simple_loss=0.1176, pruned_loss=0.03017, audio_tagging_loss=0.009004, over 16399.00 frames. ], tot_loss[loss=0.08512, simple_loss=0.1043, pruned_loss=0.02266, audio_tagging_loss=0.0103, over 3042587.67 frames. ], batch size: 61, lr: 6.81e-03, grad_scale: 32.0 2023-11-19 20:10:20,763 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=788733.3333333334, ans=0.125 2023-11-19 20:10:23,611 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.47 vs. limit=6.0 2023-11-19 20:10:37,992 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.794e-01 2023-11-19 20:10:47,500 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.102e+01 8.462e+01 9.399e+01 1.049e+02 1.408e+02, threshold=1.880e+02, percent-clipped=0.0 2023-11-19 20:11:02,488 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 118350 2023-11-19 20:11:03,605 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 20:11:18,274 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 10150, loss[loss=0.08172, simple_loss=0.1003, pruned_loss=0.02151, audio_tagging_loss=0.01008, over 14975.00 frames. ], tot_loss[loss=0.08577, simple_loss=0.1049, pruned_loss=0.02288, audio_tagging_loss=0.01045, over 3040492.69 frames. ], batch size: 57, lr: 6.81e-03, grad_scale: 32.0 2023-11-19 20:11:20,350 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=789066.6666666666, ans=0.125 2023-11-19 20:11:37,005 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=789133.3333333334, ans=0.2 2023-11-19 20:11:46,497 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-19 20:12:03,460 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=789266.6666666666, ans=0.125 2023-11-19 20:12:07,317 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 118400 2023-11-19 20:12:07,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=789266.6666666666, ans=0.125 2023-11-19 20:12:23,170 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 10200, loss[loss=0.09572, simple_loss=0.1191, pruned_loss=0.02534, audio_tagging_loss=0.01085, over 15533.00 frames. ], tot_loss[loss=0.08538, simple_loss=0.1045, pruned_loss=0.02262, audio_tagging_loss=0.01051, over 3043159.18 frames. ], batch size: 58, lr: 6.81e-03, grad_scale: 32.0 2023-11-19 20:12:26,191 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 20:12:44,374 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 20:12:57,881 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.646e+01 8.394e+01 8.968e+01 9.888e+01 1.322e+02, threshold=1.794e+02, percent-clipped=0.0 2023-11-19 20:13:12,272 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 118450 2023-11-19 20:13:22,335 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=789666.6666666666, ans=0.1 2023-11-19 20:13:27,003 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 10250, loss[loss=0.08974, simple_loss=0.1119, pruned_loss=0.02451, audio_tagging_loss=0.009286, over 15392.00 frames. ], tot_loss[loss=0.08618, simple_loss=0.1055, pruned_loss=0.02295, audio_tagging_loss=0.01048, over 3036872.83 frames. ], batch size: 58, lr: 6.81e-03, grad_scale: 16.0 2023-11-19 20:13:28,862 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.36 vs. limit=15.0 2023-11-19 20:13:30,141 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.04 vs. limit=10.0 2023-11-19 20:13:34,255 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.71 vs. limit=6.0 2023-11-19 20:13:41,616 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=789800.0, ans=0.0 2023-11-19 20:13:49,461 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.75 vs. 
limit=22.5 2023-11-19 20:14:11,919 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=789933.3333333334, ans=0.2 2023-11-19 20:14:16,514 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 118500 2023-11-19 20:14:31,739 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 10300, loss[loss=0.09967, simple_loss=0.1259, pruned_loss=0.02759, audio_tagging_loss=0.009115, over 15726.00 frames. ], tot_loss[loss=0.08559, simple_loss=0.1047, pruned_loss=0.0227, audio_tagging_loss=0.01055, over 3040376.98 frames. ], batch size: 58, lr: 6.81e-03, grad_scale: 16.0 2023-11-19 20:14:39,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=790066.6666666666, ans=0.05 2023-11-19 20:14:53,642 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=790133.3333333334, ans=0.125 2023-11-19 20:14:56,345 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.27 vs. limit=15.0 2023-11-19 20:15:06,826 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.556e+01 8.163e+01 8.831e+01 9.880e+01 1.200e+02, threshold=1.766e+02, percent-clipped=0.0 2023-11-19 20:15:21,128 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 118550 2023-11-19 20:15:37,164 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 10350, loss[loss=0.105, simple_loss=0.1367, pruned_loss=0.02663, audio_tagging_loss=0.009993, over 15960.00 frames. ], tot_loss[loss=0.08562, simple_loss=0.1047, pruned_loss=0.02269, audio_tagging_loss=0.01058, over 3049767.74 frames. ], batch size: 55, lr: 6.81e-03, grad_scale: 16.0 2023-11-19 20:15:54,123 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=790466.6666666666, ans=0.2 2023-11-19 20:16:07,272 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.75 vs. limit=8.0 2023-11-19 20:16:25,926 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 118600 2023-11-19 20:16:33,877 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.86 vs. limit=22.5 2023-11-19 20:16:41,675 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 10400, loss[loss=0.08295, simple_loss=0.09164, pruned_loss=0.02387, audio_tagging_loss=0.01327, over 15322.00 frames. ], tot_loss[loss=0.08596, simple_loss=0.1052, pruned_loss=0.02271, audio_tagging_loss=0.01067, over 3044685.22 frames. ], batch size: 60, lr: 6.81e-03, grad_scale: 32.0 2023-11-19 20:16:45,884 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=790733.3333333334, ans=0.0 2023-11-19 20:16:53,789 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 20:16:55,514 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.71 vs. 
limit=15.0 2023-11-19 20:17:06,368 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=790800.0, ans=0.09899494936611666 2023-11-19 20:17:08,661 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=790866.6666666666, ans=0.0 2023-11-19 20:17:17,244 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.085e+01 8.510e+01 9.248e+01 1.057e+02 2.087e+02, threshold=1.850e+02, percent-clipped=1.0 2023-11-19 20:17:29,408 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=790933.3333333334, ans=0.125 2023-11-19 20:17:31,916 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 118650 2023-11-19 20:17:43,153 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.95 vs. limit=15.0 2023-11-19 20:17:47,282 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 10450, loss[loss=0.06161, simple_loss=0.07179, pruned_loss=0.01646, audio_tagging_loss=0.009249, over 15041.00 frames. ], tot_loss[loss=0.08526, simple_loss=0.1044, pruned_loss=0.02244, audio_tagging_loss=0.01062, over 3048714.09 frames. ], batch size: 59, lr: 6.80e-03, grad_scale: 16.0 2023-11-19 20:17:56,528 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.62 vs. limit=15.0 2023-11-19 20:18:22,636 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=791200.0, ans=0.125 2023-11-19 20:18:36,688 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 118700 2023-11-19 20:18:47,859 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=791333.3333333334, ans=0.125 2023-11-19 20:18:50,466 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=791333.3333333334, ans=0.1 2023-11-19 20:18:52,556 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 10500, loss[loss=0.07951, simple_loss=0.09794, pruned_loss=0.02017, audio_tagging_loss=0.01037, over 16003.00 frames. ], tot_loss[loss=0.08462, simple_loss=0.1038, pruned_loss=0.02224, audio_tagging_loss=0.01048, over 3042596.01 frames. ], batch size: 58, lr: 6.80e-03, grad_scale: 16.0 2023-11-19 20:18:56,681 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=791400.0, ans=0.2 2023-11-19 20:18:56,697 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=791400.0, ans=0.0 2023-11-19 20:19:01,896 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.47 vs. 
limit=15.0 2023-11-19 20:19:28,546 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.886e+01 8.049e+01 8.549e+01 9.577e+01 1.136e+02, threshold=1.710e+02, percent-clipped=0.0 2023-11-19 20:19:30,221 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=791600.0, ans=0.125 2023-11-19 20:19:35,099 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=791600.0, ans=0.125 2023-11-19 20:19:42,024 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 118750 2023-11-19 20:19:56,802 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 10550, loss[loss=0.07243, simple_loss=0.08735, pruned_loss=0.01596, audio_tagging_loss=0.0128, over 16729.00 frames. ], tot_loss[loss=0.08463, simple_loss=0.1041, pruned_loss=0.02221, audio_tagging_loss=0.01035, over 3048468.83 frames. ], batch size: 62, lr: 6.80e-03, grad_scale: 16.0 2023-11-19 20:20:19,603 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.15 vs. limit=15.0 2023-11-19 20:20:22,833 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=791866.6666666666, ans=0.125 2023-11-19 20:20:23,197 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.88 vs. limit=15.0 2023-11-19 20:20:45,998 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.86 vs. limit=12.0 2023-11-19 20:20:46,384 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 118800 2023-11-19 20:20:47,833 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=792000.0, ans=0.0 2023-11-19 20:21:01,854 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.10 vs. limit=22.5 2023-11-19 20:21:02,566 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 10600, loss[loss=0.09518, simple_loss=0.1278, pruned_loss=0.02452, audio_tagging_loss=0.006754, over 15049.00 frames. ], tot_loss[loss=0.08524, simple_loss=0.1048, pruned_loss=0.02256, audio_tagging_loss=0.01027, over 3051090.55 frames. 
], batch size: 56, lr: 6.80e-03, grad_scale: 16.0 2023-11-19 20:21:05,341 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=792066.6666666666, ans=0.0 2023-11-19 20:21:20,230 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=792133.3333333334, ans=0.0 2023-11-19 20:21:22,492 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=792133.3333333334, ans=0.125 2023-11-19 20:21:28,817 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=792200.0, ans=0.125 2023-11-19 20:21:33,653 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=792200.0, ans=0.125 2023-11-19 20:21:33,840 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=792200.0, ans=0.125 2023-11-19 20:21:37,373 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=792200.0, ans=0.125 2023-11-19 20:21:38,216 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.354e+01 8.305e+01 8.737e+01 9.475e+01 2.195e+02, threshold=1.747e+02, percent-clipped=1.0 2023-11-19 20:21:51,984 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 118850 2023-11-19 20:22:07,716 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 10650, loss[loss=0.07088, simple_loss=0.08507, pruned_loss=0.01692, audio_tagging_loss=0.01143, over 13533.00 frames. ], tot_loss[loss=0.08471, simple_loss=0.1041, pruned_loss=0.02238, audio_tagging_loss=0.01028, over 3048760.75 frames. ], batch size: 53, lr: 6.80e-03, grad_scale: 16.0 2023-11-19 20:22:08,240 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.41 vs. limit=22.5 2023-11-19 20:22:16,534 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=792400.0, ans=0.125 2023-11-19 20:22:27,446 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=792466.6666666666, ans=0.0 2023-11-19 20:22:31,817 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=792533.3333333334, ans=0.1 2023-11-19 20:22:39,309 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.94 vs. limit=15.0 2023-11-19 20:22:52,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=792600.0, ans=0.0 2023-11-19 20:22:57,326 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 118900 2023-11-19 20:23:03,521 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=792666.6666666666, ans=0.125 2023-11-19 20:23:06,315 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.44 vs. 
limit=15.0 2023-11-19 20:23:12,203 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 10700, loss[loss=0.07027, simple_loss=0.08513, pruned_loss=0.01657, audio_tagging_loss=0.01113, over 14489.00 frames. ], tot_loss[loss=0.08536, simple_loss=0.1052, pruned_loss=0.02255, audio_tagging_loss=0.01019, over 3042217.57 frames. ], batch size: 53, lr: 6.80e-03, grad_scale: 16.0 2023-11-19 20:23:16,022 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=792733.3333333334, ans=0.0 2023-11-19 20:23:34,376 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.04 vs. limit=10.0 2023-11-19 20:23:38,250 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=792866.6666666666, ans=0.2 2023-11-19 20:23:38,653 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=25.67 vs. limit=22.5 2023-11-19 20:23:48,372 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.590e+01 8.344e+01 9.124e+01 9.874e+01 1.194e+02, threshold=1.825e+02, percent-clipped=0.0 2023-11-19 20:23:56,006 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=792933.3333333334, ans=0.2 2023-11-19 20:24:00,573 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 118950 2023-11-19 20:24:16,643 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 10750, loss[loss=0.09782, simple_loss=0.1223, pruned_loss=0.0263, audio_tagging_loss=0.01039, over 14976.00 frames. ], tot_loss[loss=0.08494, simple_loss=0.1048, pruned_loss=0.0224, audio_tagging_loss=0.01015, over 3049515.86 frames. ], batch size: 56, lr: 6.80e-03, grad_scale: 16.0 2023-11-19 20:24:25,273 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=793066.6666666666, ans=0.1 2023-11-19 20:24:57,221 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 20:25:04,927 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 119000 2023-11-19 20:25:14,122 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=793333.3333333334, ans=0.125 2023-11-19 20:25:21,040 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 10800, loss[loss=0.09324, simple_loss=0.1227, pruned_loss=0.0224, audio_tagging_loss=0.009471, over 15456.00 frames. ], tot_loss[loss=0.08517, simple_loss=0.105, pruned_loss=0.02254, audio_tagging_loss=0.01015, over 3058227.32 frames. ], batch size: 56, lr: 6.79e-03, grad_scale: 32.0 2023-11-19 20:25:34,923 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=793466.6666666666, ans=0.0 2023-11-19 20:25:35,476 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.68 vs. 
limit=15.0 2023-11-19 20:25:37,383 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=793466.6666666666, ans=0.2 2023-11-19 20:25:53,933 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=793533.3333333334, ans=0.0 2023-11-19 20:25:56,702 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.855e+01 8.324e+01 9.413e+01 1.037e+02 1.353e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-19 20:26:01,758 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=793600.0, ans=0.125 2023-11-19 20:26:05,015 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=793600.0, ans=0.125 2023-11-19 20:26:10,132 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 119050 2023-11-19 20:26:16,510 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=793666.6666666666, ans=0.125 2023-11-19 20:26:24,876 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 10850, loss[loss=0.08854, simple_loss=0.1079, pruned_loss=0.02629, audio_tagging_loss=0.008278, over 14847.00 frames. ], tot_loss[loss=0.08528, simple_loss=0.1051, pruned_loss=0.02257, audio_tagging_loss=0.01014, over 3058032.56 frames. ], batch size: 55, lr: 6.79e-03, grad_scale: 32.0 2023-11-19 20:26:26,498 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=793733.3333333334, ans=0.2 2023-11-19 20:26:34,005 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.20 vs. limit=15.0 2023-11-19 20:27:13,729 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 119100 2023-11-19 20:27:14,046 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=793933.3333333334, ans=0.125 2023-11-19 20:27:22,304 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 20:27:25,126 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=794000.0, ans=0.125 2023-11-19 20:27:28,420 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 10900, loss[loss=0.09845, simple_loss=0.124, pruned_loss=0.02787, audio_tagging_loss=0.008602, over 15825.00 frames. ], tot_loss[loss=0.08534, simple_loss=0.1052, pruned_loss=0.02257, audio_tagging_loss=0.01015, over 3063963.26 frames. 
], batch size: 60, lr: 6.79e-03, grad_scale: 16.0 2023-11-19 20:27:39,498 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=794066.6666666666, ans=0.2 2023-11-19 20:27:56,336 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=794200.0, ans=0.0 2023-11-19 20:28:02,466 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=794200.0, ans=0.125 2023-11-19 20:28:05,860 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.516e+01 8.198e+01 8.697e+01 9.317e+01 1.364e+02, threshold=1.739e+02, percent-clipped=0.0 2023-11-19 20:28:07,630 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=794266.6666666666, ans=0.0 2023-11-19 20:28:09,882 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=794266.6666666666, ans=0.2 2023-11-19 20:28:16,980 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 119150 2023-11-19 20:28:23,755 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=794333.3333333334, ans=0.1 2023-11-19 20:28:26,307 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=794333.3333333334, ans=0.2 2023-11-19 20:28:33,886 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 10950, loss[loss=0.05609, simple_loss=0.06775, pruned_loss=0.01145, audio_tagging_loss=0.01078, over 14477.00 frames. ], tot_loss[loss=0.08449, simple_loss=0.1041, pruned_loss=0.02213, audio_tagging_loss=0.01033, over 3050913.28 frames. ], batch size: 55, lr: 6.79e-03, grad_scale: 16.0 2023-11-19 20:28:35,472 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=794400.0, ans=0.1 2023-11-19 20:28:40,137 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=794400.0, ans=0.025 2023-11-19 20:28:48,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=794466.6666666666, ans=0.125 2023-11-19 20:29:14,184 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=794600.0, ans=0.0 2023-11-19 20:29:23,130 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 119200 2023-11-19 20:29:23,267 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=794600.0, ans=0.125 2023-11-19 20:29:24,766 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.21 vs. limit=10.0 2023-11-19 20:29:33,726 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=794666.6666666666, ans=15.0 2023-11-19 20:29:35,846 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=794666.6666666666, ans=0.2 2023-11-19 20:29:37,909 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 11000, loss[loss=0.08672, simple_loss=0.1188, pruned_loss=0.02027, audio_tagging_loss=0.007029, over 14700.00 frames. 
], tot_loss[loss=0.08462, simple_loss=0.1044, pruned_loss=0.02209, audio_tagging_loss=0.01035, over 3045890.14 frames. ], batch size: 54, lr: 6.79e-03, grad_scale: 16.0 2023-11-19 20:29:41,926 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=794733.3333333334, ans=0.125 2023-11-19 20:29:46,431 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 20:29:54,044 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=794800.0, ans=0.0 2023-11-19 20:30:03,249 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=794866.6666666666, ans=0.125 2023-11-19 20:30:04,331 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=794866.6666666666, ans=0.0 2023-11-19 20:30:12,597 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=794866.6666666666, ans=0.0 2023-11-19 20:30:15,774 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.738e+01 8.172e+01 9.020e+01 9.721e+01 1.365e+02, threshold=1.804e+02, percent-clipped=0.0 2023-11-19 20:30:27,061 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 119250 2023-11-19 20:30:31,393 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.27 vs. limit=12.0 2023-11-19 20:30:34,686 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=795000.0, ans=0.0 2023-11-19 20:30:38,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=795000.0, ans=0.2 2023-11-19 20:30:41,675 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 11050, loss[loss=0.08032, simple_loss=0.09475, pruned_loss=0.02256, audio_tagging_loss=0.01038, over 15722.00 frames. ], tot_loss[loss=0.08517, simple_loss=0.1047, pruned_loss=0.02241, audio_tagging_loss=0.01043, over 3043093.52 frames. ], batch size: 60, lr: 6.79e-03, grad_scale: 16.0 2023-11-19 20:31:07,321 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.00 vs. 
2023-11-19 20:31:07,321 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.00 vs. limit=12.0
2023-11-19 20:31:12,913 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=795200.0, ans=0.125
2023-11-19 20:31:13,122 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=795200.0, ans=0.0
2023-11-19 20:31:17,931 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff3.min_abs, batch_count=795200.0, ans=0.2
2023-11-19 20:31:29,788 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 119300
2023-11-19 20:31:30,030 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=795266.6666666666, ans=0.07
2023-11-19 20:31:33,553 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=795333.3333333334, ans=0.125
2023-11-19 20:31:44,859 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 11100, loss[loss=0.09218, simple_loss=0.112, pruned_loss=0.02515, audio_tagging_loss=0.01106, over 14616.00 frames. ], tot_loss[loss=0.08474, simple_loss=0.1036, pruned_loss=0.02228, audio_tagging_loss=0.01066, over 3041186.88 frames. ], batch size: 56, lr: 6.79e-03, grad_scale: 16.0
2023-11-19 20:32:14,930 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.60 vs. limit=15.0
2023-11-19 20:32:16,001 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.11 vs. limit=15.0
2023-11-19 20:32:21,492 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.097e+01 8.553e+01 9.407e+01 1.079e+02 1.400e+02, threshold=1.881e+02, percent-clipped=0.0
2023-11-19 20:32:21,667 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=795600.0, ans=0.125
2023-11-19 20:32:33,125 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 119350
2023-11-19 20:32:46,551 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.59 vs. limit=15.0
2023-11-19 20:32:49,471 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 11150, loss[loss=0.09582, simple_loss=0.127, pruned_loss=0.02311, audio_tagging_loss=0.009216, over 16190.00 frames. ], tot_loss[loss=0.08425, simple_loss=0.1028, pruned_loss=0.02209, audio_tagging_loss=0.01078, over 3046003.81 frames. ], batch size: 57, lr: 6.78e-03, grad_scale: 16.0
2023-11-19 20:32:59,432 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=795733.3333333334, ans=0.1
2023-11-19 20:33:18,285 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.86 vs. limit=22.5
2023-11-19 20:33:23,172 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=795866.6666666666, ans=0.1
2023-11-19 20:33:25,357 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.17 vs. limit=10.0
2023-11-19 20:33:37,739 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 119400
2023-11-19 20:33:52,403 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 11200, loss[loss=0.09037, simple_loss=0.1207, pruned_loss=0.0217, audio_tagging_loss=0.00832, over 14843.00 frames. ], tot_loss[loss=0.08381, simple_loss=0.102, pruned_loss=0.02192, audio_tagging_loss=0.01089, over 3044563.96 frames. ], batch size: 56, lr: 6.78e-03, grad_scale: 32.0
2023-11-19 20:33:55,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=796066.6666666666, ans=0.2
2023-11-19 20:33:55,184 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=796066.6666666666, ans=0.125
2023-11-19 20:34:00,542 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=796066.6666666666, ans=0.0
2023-11-19 20:34:00,648 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=796066.6666666666, ans=0.2
2023-11-19 20:34:18,508 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=796200.0, ans=0.125
2023-11-19 20:34:29,602 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=796200.0, ans=0.07
2023-11-19 20:34:30,424 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.715e+01 8.063e+01 8.813e+01 9.513e+01 1.484e+02, threshold=1.763e+02, percent-clipped=0.0
2023-11-19 20:34:41,334 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 119450
2023-11-19 20:34:48,269 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=796333.3333333334, ans=0.1
2023-11-19 20:34:56,374 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 11250, loss[loss=0.09952, simple_loss=0.1231, pruned_loss=0.02957, audio_tagging_loss=0.008418, over 14452.00 frames. ], tot_loss[loss=0.08337, simple_loss=0.1015, pruned_loss=0.02175, audio_tagging_loss=0.01087, over 3040437.93 frames. ], batch size: 52, lr: 6.78e-03, grad_scale: 32.0
2023-11-19 20:35:27,879 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=796533.3333333334, ans=0.1
2023-11-19 20:35:29,211 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=796533.3333333334, ans=0.0
2023-11-19 20:35:30,136 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=796533.3333333334, ans=0.125
2023-11-19 20:35:45,347 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 119500
2023-11-19 20:35:55,235 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=796666.6666666666, ans=0.125
2023-11-19 20:36:01,653 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 11300, loss[loss=0.09542, simple_loss=0.1103, pruned_loss=0.02671, audio_tagging_loss=0.01358, over 14877.00 frames. ], tot_loss[loss=0.08389, simple_loss=0.1026, pruned_loss=0.02191, audio_tagging_loss=0.01067, over 3048589.76 frames.
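At batch 11200 the logged grad_scale moves from 16.0 to 32.0: the fp16 loss scaler doubled its scale after a run of overflow-free steps. A minimal sketch of that dynamic with PyTorch's standard GradScaler follows; the constants are illustrative, chosen to mirror the logged values, and the example is not a quote of the training script.

```python
# Dynamic fp16 loss scaling: double after `growth_interval` clean steps,
# halve immediately on overflow; `grad_scale` in the log is this value.
import torch

use_cuda = torch.cuda.is_available()
scaler = torch.cuda.amp.GradScaler(init_scale=16.0, growth_factor=2.0,
                                   backoff_factor=0.5, growth_interval=2000,
                                   enabled=use_cuda)
device = "cuda" if use_cuda else "cpu"
model = torch.nn.Linear(8, 8).to(device)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

loss = model(torch.randn(4, 8, device=device)).pow(2).mean()
scaler.scale(loss).backward()  # backward pass at the current loss scale
scaler.step(opt)               # unscales grads; skips the step on inf/nan
scaler.update()                # grows or backs off the scale
print(scaler.get_scale())      # e.g. 16.0 -> 32.0 over time, as in the log
```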
], batch size: 55, lr: 6.78e-03, grad_scale: 32.0
2023-11-19 20:36:05,515 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=796733.3333333334, ans=0.125
2023-11-19 20:36:14,320 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=796800.0, ans=0.0
2023-11-19 20:36:17,748 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=796800.0, ans=0.125
2023-11-19 20:36:38,562 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.085e+01 8.176e+01 8.864e+01 9.663e+01 1.255e+02, threshold=1.773e+02, percent-clipped=0.0
2023-11-19 20:36:43,277 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=796933.3333333334, ans=0.1
2023-11-19 20:36:50,742 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 119550
2023-11-19 20:37:05,287 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 11350, loss[loss=0.07742, simple_loss=0.105, pruned_loss=0.01676, audio_tagging_loss=0.008165, over 14855.00 frames. ], tot_loss[loss=0.08295, simple_loss=0.1014, pruned_loss=0.02168, audio_tagging_loss=0.01055, over 3046693.46 frames. ], batch size: 55, lr: 6.78e-03, grad_scale: 32.0
2023-11-19 20:37:08,345 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.10 vs. limit=22.5
2023-11-19 20:37:10,551 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.89 vs. limit=10.0
2023-11-19 20:37:10,690 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.15 vs. limit=15.0
2023-11-19 20:37:12,769 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=797066.6666666666, ans=0.0
2023-11-19 20:37:52,923 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=797266.6666666666, ans=0.125
2023-11-19 20:37:54,007 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 119600
2023-11-19 20:38:01,062 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.08 vs. limit=10.0
2023-11-19 20:38:08,461 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=797400.0, ans=0.0
2023-11-19 20:38:09,486 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 11400, loss[loss=0.09709, simple_loss=0.1246, pruned_loss=0.02632, audio_tagging_loss=0.008476, over 15317.00 frames. ], tot_loss[loss=0.08352, simple_loss=0.1027, pruned_loss=0.02184, audio_tagging_loss=0.01034, over 3051433.56 frames. ], batch size: 57, lr: 6.78e-03, grad_scale: 32.0
2023-11-19 20:38:16,787 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.43 vs. limit=12.0
2023-11-19 20:38:27,883 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=797466.6666666666, ans=0.1
2023-11-19 20:38:45,812 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=797533.3333333334, ans=0.0
2023-11-19 20:38:46,573 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.427e+01 8.335e+01 9.100e+01 1.007e+02 1.269e+02, threshold=1.820e+02, percent-clipped=0.0
2023-11-19 20:38:49,481 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=797600.0, ans=0.05
2023-11-19 20:38:58,310 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 119650
2023-11-19 20:39:00,906 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=797666.6666666666, ans=0.2
2023-11-19 20:39:14,318 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 11450, loss[loss=0.06151, simple_loss=0.06535, pruned_loss=0.01631, audio_tagging_loss=0.01252, over 14234.00 frames. ], tot_loss[loss=0.08374, simple_loss=0.103, pruned_loss=0.02204, audio_tagging_loss=0.0102, over 3052508.53 frames. ], batch size: 55, lr: 6.78e-03, grad_scale: 32.0
2023-11-19 20:39:32,661 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.85 vs. limit=22.5
2023-11-19 20:39:38,450 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.77 vs. limit=15.0
2023-11-19 20:39:56,059 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=797933.3333333334, ans=0.07
2023-11-19 20:40:03,128 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 119700
2023-11-19 20:40:09,381 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=798000.0, ans=0.0
2023-11-19 20:40:09,535 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=798000.0, ans=0.125
2023-11-19 20:40:15,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=798000.0, ans=0.125
2023-11-19 20:40:18,265 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 11500, loss[loss=0.08975, simple_loss=0.1009, pruned_loss=0.02743, audio_tagging_loss=0.01188, over 15529.00 frames. ], tot_loss[loss=0.08311, simple_loss=0.102, pruned_loss=0.02182, audio_tagging_loss=0.01032, over 3052429.86 frames. ], batch size: 57, lr: 6.77e-03, grad_scale: 32.0
2023-11-19 20:40:30,225 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=798133.3333333334, ans=0.0
2023-11-19 20:40:31,858 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.06 vs. limit=15.0
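The recurring [optim.py:476] lines report gradient clipping against a threshold derived from recent gradient-norm statistics (Clipping_scale=2.0 times roughly the running median, judging by the quartiles vs. threshold values). The sketch below is a hedged reconstruction that reproduces the logged fields; the class name is hypothetical and this is not icefall's exact implementation.

```python
# Clip grads to clipping_scale x running-median norm; track quartiles
# and how often clipping fires, as in the "grad-norm quartiles" log lines.
from collections import deque
import torch

class QuartileGradClipper:
    def __init__(self, clipping_scale: float = 2.0, window: int = 1000):
        self.scale = clipping_scale
        self.norms: deque[float] = deque(maxlen=window)

    def __call__(self, parameters) -> bool:
        params = [p for p in parameters if p.grad is not None]
        norm = torch.norm(torch.stack([p.grad.norm() for p in params])).item()
        self.norms.append(norm)
        qs = torch.quantile(torch.tensor(list(self.norms)),
                            torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.scale * qs[2].item()  # 2x the running median
        clipped = norm > threshold
        if clipped:
            for p in params:
                p.grad.mul_(threshold / norm)
        # a periodic report would print qs, threshold, and percent-clipped
        return clipped
```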
2023-11-19 20:40:39,954 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=798133.3333333334, ans=0.1
2023-11-19 20:40:41,682 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=798133.3333333334, ans=0.125
2023-11-19 20:40:42,889 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=798200.0, ans=0.125
2023-11-19 20:40:55,533 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.898e+01 8.252e+01 9.046e+01 9.695e+01 1.384e+02, threshold=1.809e+02, percent-clipped=0.0
2023-11-19 20:41:03,006 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.42 vs. limit=22.5
2023-11-19 20:41:07,293 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 119750
2023-11-19 20:41:07,296 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=798266.6666666666, ans=0.125
2023-11-19 20:41:22,489 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 11550, loss[loss=0.07131, simple_loss=0.07876, pruned_loss=0.02251, audio_tagging_loss=0.009428, over 15771.00 frames. ], tot_loss[loss=0.08338, simple_loss=0.1022, pruned_loss=0.02195, audio_tagging_loss=0.01033, over 3060929.02 frames. ], batch size: 61, lr: 6.77e-03, grad_scale: 32.0
2023-11-19 20:41:34,926 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=798466.6666666666, ans=0.0
2023-11-19 20:41:37,285 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=798466.6666666666, ans=0.125
2023-11-19 20:41:40,016 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.04 vs. limit=10.0
2023-11-19 20:41:55,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=798533.3333333334, ans=0.125
2023-11-19 20:41:56,134 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=798533.3333333334, ans=0.2
2023-11-19 20:41:58,364 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 20:42:05,846 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=798600.0, ans=0.0
2023-11-19 20:42:11,172 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 119800
2023-11-19 20:42:13,005 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.75 vs. limit=15.0
2023-11-19 20:42:19,511 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.78 vs. limit=6.0
2023-11-19 20:42:27,333 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 11600, loss[loss=0.09414, simple_loss=0.115, pruned_loss=0.02826, audio_tagging_loss=0.008366, over 15846.00 frames. ], tot_loss[loss=0.08449, simple_loss=0.1039, pruned_loss=0.02231, audio_tagging_loss=0.01024, over 3064227.63 frames. ], batch size: 58, lr: 6.77e-03, grad_scale: 32.0
2023-11-19 20:42:33,843 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=798733.3333333334, ans=0.2
2023-11-19 20:42:44,245 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=798800.0, ans=0.2
2023-11-19 20:42:46,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=798800.0, ans=0.125
2023-11-19 20:43:04,738 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.840e+01 8.658e+01 9.480e+01 1.096e+02 1.560e+02, threshold=1.896e+02, percent-clipped=0.0
2023-11-19 20:43:15,997 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 119850
2023-11-19 20:43:31,189 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 11650, loss[loss=0.07417, simple_loss=0.09352, pruned_loss=0.01662, audio_tagging_loss=0.01079, over 16266.00 frames. ], tot_loss[loss=0.08402, simple_loss=0.1034, pruned_loss=0.02214, audio_tagging_loss=0.01019, over 3061534.44 frames. ], batch size: 60, lr: 6.77e-03, grad_scale: 32.0
2023-11-19 20:43:33,917 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=799066.6666666666, ans=0.0
2023-11-19 20:43:44,422 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=799133.3333333334, ans=0.125
2023-11-19 20:43:47,889 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=799133.3333333334, ans=0.0
2023-11-19 20:43:53,067 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=799133.3333333334, ans=0.125
2023-11-19 20:44:12,959 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=799266.6666666666, ans=0.125
2023-11-19 20:44:15,104 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=799266.6666666666, ans=0.1
2023-11-19 20:44:19,806 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 119900
2023-11-19 20:44:30,353 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=799333.3333333334, ans=0.125
2023-11-19 20:44:35,535 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 11700, loss[loss=0.0865, simple_loss=0.09181, pruned_loss=0.02534, audio_tagging_loss=0.01526, over 14785.00 frames. ], tot_loss[loss=0.08422, simple_loss=0.1036, pruned_loss=0.02219, audio_tagging_loss=0.01023, over 3058376.35 frames. ], batch size: 56, lr: 6.77e-03, grad_scale: 32.0
2023-11-19 20:44:37,525 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.48 vs. limit=15.0
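The "Whitening: ... metric=X vs. limit=Y" lines compare a per-module statistic of the activations against a (scheduled) limit beyond which a whitening penalty engages. One natural choice of metric, assumed here rather than quoted, is the eigenvalue spread of the feature covariance, E[lambda^2] / E[lambda]^2, which equals 1.0 for perfectly white features and grows when a few directions dominate; the sketch below computes it without an explicit eigendecomposition.

```python
# Eigenvalue-spread whitening metric: d * tr(C^2) / tr(C)^2 per channel group,
# where C is the feature covariance; equals 1.0 for white features.
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    (n, c), g = x.shape, num_groups               # x: (num_frames, num_channels)
    d = c // g
    xg = x.reshape(n, g, d).transpose(0, 1)       # (groups, frames, d)
    cov = torch.matmul(xg.transpose(1, 2), xg) / n
    tr = cov.diagonal(dim1=1, dim2=2).sum(-1)     # sum of eigenvalues
    tr2 = (cov * cov).sum(dim=(1, 2))             # sum of squared eigenvalues
    return (d * tr2 / tr.pow(2)).mean().item()

x = torch.randn(1000, 288)
print(whitening_metric(x))  # ~1.0 for white Gaussian features, vs. e.g. limit=10.0
```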
2023-11-19 20:45:14,282 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.866e+01 8.190e+01 9.042e+01 9.907e+01 1.390e+02, threshold=1.808e+02, percent-clipped=0.0
2023-11-19 20:45:20,755 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=799600.0, ans=0.0
2023-11-19 20:45:24,730 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 119950
2023-11-19 20:45:31,009 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=799666.6666666666, ans=0.125
2023-11-19 20:45:40,600 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 11750, loss[loss=0.08743, simple_loss=0.1042, pruned_loss=0.02269, audio_tagging_loss=0.01262, over 15005.00 frames. ], tot_loss[loss=0.08462, simple_loss=0.1037, pruned_loss=0.02239, audio_tagging_loss=0.01036, over 3052668.36 frames. ], batch size: 57, lr: 6.77e-03, grad_scale: 32.0
2023-11-19 20:45:43,820 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.76 vs. limit=22.5
2023-11-19 20:46:07,674 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.44 vs. limit=12.0
2023-11-19 20:46:09,764 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=799866.6666666666, ans=0.2
2023-11-19 20:46:16,578 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=799866.6666666666, ans=0.125
2023-11-19 20:46:20,307 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=799933.3333333334, ans=0.05
2023-11-19 20:46:29,890 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 120000
2023-11-19 20:46:35,676 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=800000.0, ans=0.07
2023-11-19 20:46:40,599 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=800000.0, ans=0.125
2023-11-19 20:46:47,831 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 11800, loss[loss=0.09238, simple_loss=0.1013, pruned_loss=0.02957, audio_tagging_loss=0.01217, over 15216.00 frames. ], tot_loss[loss=0.08526, simple_loss=0.1046, pruned_loss=0.02265, audio_tagging_loss=0.01033, over 3046096.73 frames. ], batch size: 61, lr: 6.77e-03, grad_scale: 32.0
2023-11-19 20:47:01,064 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=800133.3333333334, ans=0.0
2023-11-19 20:47:26,908 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.227e+01 8.456e+01 9.093e+01 9.839e+01 1.192e+02, threshold=1.819e+02, percent-clipped=0.0
2023-11-19 20:47:28,745 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.22 vs. limit=22.5
2023-11-19 20:47:34,552 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 20:47:36,800 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 120050
2023-11-19 20:47:43,181 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=800333.3333333334, ans=0.2
2023-11-19 20:47:51,932 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 11850, loss[loss=0.1073, simple_loss=0.1234, pruned_loss=0.03054, audio_tagging_loss=0.01501, over 14564.00 frames. ], tot_loss[loss=0.08577, simple_loss=0.1051, pruned_loss=0.02277, audio_tagging_loss=0.01045, over 3047095.41 frames. ], batch size: 54, lr: 6.76e-03, grad_scale: 32.0
2023-11-19 20:47:54,175 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=800400.0, ans=15.0
2023-11-19 20:48:13,736 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=800466.6666666666, ans=0.2
2023-11-19 20:48:41,687 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 120100
2023-11-19 20:48:45,385 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=800666.6666666666, ans=0.1
2023-11-19 20:48:45,504 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=800666.6666666666, ans=0.125
2023-11-19 20:48:57,227 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 11900, loss[loss=0.09608, simple_loss=0.1209, pruned_loss=0.02609, audio_tagging_loss=0.009563, over 16958.00 frames. ], tot_loss[loss=0.08482, simple_loss=0.1039, pruned_loss=0.02232, audio_tagging_loss=0.01057, over 3062130.13 frames. ], batch size: 64, lr: 6.76e-03, grad_scale: 32.0
2023-11-19 20:49:07,061 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=800733.3333333334, ans=0.1
2023-11-19 20:49:35,950 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.865e+01 8.251e+01 9.063e+01 9.820e+01 1.973e+02, threshold=1.813e+02, percent-clipped=1.0
2023-11-19 20:49:37,937 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.15 vs. limit=15.0
2023-11-19 20:49:45,800 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 120150
2023-11-19 20:49:47,240 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=801000.0, ans=0.125
2023-11-19 20:49:52,398 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.10 vs. limit=10.0
2023-11-19 20:49:54,748 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=801000.0, ans=0.125
2023-11-19 20:50:00,551 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 11950, loss[loss=0.08957, simple_loss=0.1202, pruned_loss=0.02314, audio_tagging_loss=0.00635, over 13941.00 frames. ], tot_loss[loss=0.08436, simple_loss=0.103, pruned_loss=0.02221, audio_tagging_loss=0.01065, over 3054194.25 frames. ], batch size: 52, lr: 6.76e-03, grad_scale: 16.0
2023-11-19 20:50:01,076 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=801066.6666666666, ans=0.04949747468305833
2023-11-19 20:50:24,177 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=801133.3333333334, ans=0.07
2023-11-19 20:50:36,043 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.76 vs. limit=15.0
2023-11-19 20:50:47,559 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=801266.6666666666, ans=0.0
2023-11-19 20:50:48,464 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 120200
2023-11-19 20:50:53,702 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=801333.3333333334, ans=0.125
2023-11-19 20:50:54,822 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=801333.3333333334, ans=0.1
2023-11-19 20:51:02,530 INFO [train_asr.py:1262] (3/4) Epoch 10, batch 12000, loss[loss=0.06971, simple_loss=0.07587, pruned_loss=0.01951, audio_tagging_loss=0.01227, over 15477.00 frames. ], tot_loss[loss=0.08466, simple_loss=0.1034, pruned_loss=0.0223, audio_tagging_loss=0.01067, over 3060152.08 frames. ], batch size: 61, lr: 6.76e-03, grad_scale: 32.0
2023-11-19 20:51:02,531 INFO [train_asr.py:1285] (3/4) Computing validation loss
2023-11-19 20:51:26,516 INFO [zipformer.py:1873] (3/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.7086, 4.1624, 3.5924, 2.9244], device='cuda:3')
2023-11-19 20:51:41,785 INFO [train_asr.py:1294] (3/4) Epoch 10, validation: loss=0.06456, simple_loss=0.05518, pruned_loss=0.006322, audio_tagging_loss=0.03065, over 4681554.00 frames.
2023-11-19 20:51:41,786 INFO [train_asr.py:1295] (3/4) Maximum memory allocated so far is 25886MB
2023-11-19 20:51:49,034 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=801400.0, ans=0.125
2023-11-19 20:51:59,395 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 20:52:02,880 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=801466.6666666666, ans=0.09899494936611666
2023-11-19 20:52:44,663 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 0, loss[loss=0.1068, simple_loss=0.1264, pruned_loss=0.02201, audio_tagging_loss=0.02153, over 16612.00 frames. ], tot_loss[loss=0.1068, simple_loss=0.1264, pruned_loss=0.02201, audio_tagging_loss=0.02153, over 16612.00 frames. ], batch size: 61, lr: 6.45e-03, grad_scale: 32.0
2023-11-19 20:52:44,664 INFO [train_asr.py:1285] (3/4) Computing validation loss
2023-11-19 20:53:06,956 INFO [zipformer.py:1873] (3/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.5501, 3.6541, 4.3123, 3.3045], device='cuda:3')
2023-11-19 20:53:19,992 INFO [train_asr.py:1294] (3/4) Epoch 11, validation: loss=0.06409, simple_loss=0.05518, pruned_loss=0.006264, audio_tagging_loss=0.03024, over 4681554.00 frames.
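The block above is the periodic validation pass ("Computing validation loss" ... "Maximum memory allocated so far"). The sketch below shows the general shape of such a pass: frame-weighted accumulation of each loss component in no-grad mode, followed by a CUDA peak-memory report. The loss helper is passed in as a callable because the real one is not shown in the log; all names here are assumptions.

```python
# Frame-weighted validation pass plus peak-memory report (a sketch, assuming
# a compute_loss callable returning ({component: value}, num_frames)).
from typing import Callable, Iterable
import torch

def run_validation(model: torch.nn.Module,
                   valid_loader: Iterable,
                   compute_loss: Callable) -> dict:
    model.eval()
    totals: dict[str, float] = {}
    frames = 0.0
    with torch.no_grad():
        for batch in valid_loader:
            loss_info, num_frames = compute_loss(model, batch)
            for k, v in loss_info.items():
                totals[k] = totals.get(k, 0.0) + v * num_frames  # weight by frames
            frames += num_frames
    model.train()
    if torch.cuda.is_available():
        print(f"Maximum memory allocated so far is "
              f"{torch.cuda.max_memory_allocated() // 2**20}MB")
    return {k: v / frames for k, v in totals.items()}  # per-frame averages
```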
2023-11-19 20:53:19,993 INFO [train_asr.py:1295] (3/4) Maximum memory allocated so far is 25886MB
2023-11-19 20:53:32,239 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.455e+01 8.493e+01 9.059e+01 9.664e+01 1.642e+02, threshold=1.812e+02, percent-clipped=0.0
2023-11-19 20:53:32,701 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=801606.6666666666, ans=0.2
2023-11-19 20:53:39,913 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=801606.6666666666, ans=0.0
2023-11-19 20:53:41,126 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 120250
2023-11-19 20:53:41,131 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=801606.6666666666, ans=0.125
2023-11-19 20:53:48,215 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=801673.3333333334, ans=0.125
2023-11-19 20:53:56,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=801673.3333333334, ans=0.0
2023-11-19 20:53:58,825 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-19 20:54:03,969 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.79 vs. limit=15.0
2023-11-19 20:54:07,460 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 20:54:14,761 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=801806.6666666666, ans=0.125
2023-11-19 20:54:17,159 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=801806.6666666666, ans=0.0
2023-11-19 20:54:24,288 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 50, loss[loss=0.08236, simple_loss=0.08797, pruned_loss=0.01567, audio_tagging_loss=0.02271, over 14441.00 frames. ], tot_loss[loss=0.09549, simple_loss=0.1058, pruned_loss=0.02244, audio_tagging_loss=0.02014, over 690073.81 frames. ], batch size: 54, lr: 6.45e-03, grad_scale: 32.0
2023-11-19 20:54:30,619 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=801873.3333333334, ans=0.0
2023-11-19 20:54:34,765 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=801873.3333333334, ans=0.125
2023-11-19 20:54:46,559 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 120300
2023-11-19 20:54:53,586 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.69 vs. limit=15.0
2023-11-19 20:55:01,038 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=802006.6666666666, ans=0.1
2023-11-19 20:55:17,468 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=802140.0, ans=0.2
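During validation the log also prints attn_weights_entropy tensors (one value per attention head, e.g. tensor([2.7086, 4.1624, 3.5924, 2.9244]) above). This is standard Shannon entropy of each head's attention distribution, averaged over positions; a self-contained sketch of that diagnostic, with an assumed tensor layout, follows.

```python
# Per-head attention entropy: high = diffuse attention, low = peaky.
import torch

def attention_entropy(attn: torch.Tensor) -> torch.Tensor:
    # attn: (num_heads, query_len, key_len), each row softmax-normalised.
    ent = -(attn * (attn + 1e-20).log()).sum(dim=-1)  # entropy per query position
    return ent.mean(dim=-1)                           # average -> one value per head

attn = torch.softmax(torch.randn(4, 50, 50), dim=-1)
print(attention_entropy(attn))  # a 4-element tensor, like the logged diagnostics
```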
2023-11-19 20:55:29,999 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 100, loss[loss=0.0887, simple_loss=0.09799, pruned_loss=0.02188, audio_tagging_loss=0.01783, over 15289.00 frames. ], tot_loss[loss=0.09406, simple_loss=0.1044, pruned_loss=0.02245, audio_tagging_loss=0.01941, over 1203043.52 frames. ], batch size: 58, lr: 6.45e-03, grad_scale: 32.0
2023-11-19 20:55:44,015 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.071e+01 8.908e+01 9.605e+01 1.032e+02 1.207e+02, threshold=1.921e+02, percent-clipped=0.0
2023-11-19 20:55:52,796 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 120350
2023-11-19 20:56:00,338 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=802340.0, ans=0.2
2023-11-19 20:56:19,261 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=802406.6666666666, ans=0.0
2023-11-19 20:56:35,847 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 150, loss[loss=0.09543, simple_loss=0.1135, pruned_loss=0.02349, audio_tagging_loss=0.01519, over 16329.00 frames. ], tot_loss[loss=0.09068, simple_loss=0.103, pruned_loss=0.02203, audio_tagging_loss=0.01718, over 1607449.52 frames. ], batch size: 59, lr: 6.45e-03, grad_scale: 32.0
2023-11-19 20:56:41,845 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.94 vs. limit=10.0
2023-11-19 20:56:57,464 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 120400
2023-11-19 20:56:57,786 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=802606.6666666666, ans=0.025
2023-11-19 20:57:02,852 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=802673.3333333334, ans=0.2
2023-11-19 20:57:06,053 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=802673.3333333334, ans=0.0
2023-11-19 20:57:12,117 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=802673.3333333334, ans=0.125
2023-11-19 20:57:16,535 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=802740.0, ans=0.125
2023-11-19 20:57:19,606 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=802740.0, ans=0.0
2023-11-19 20:57:31,640 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.02 vs. limit=15.0
2023-11-19 20:57:38,079 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.35 vs. limit=15.0
2023-11-19 20:57:40,924 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 200, loss[loss=0.07716, simple_loss=0.0937, pruned_loss=0.02088, audio_tagging_loss=0.009429, over 16036.00 frames. ], tot_loss[loss=0.09018, simple_loss=0.1052, pruned_loss=0.02254, audio_tagging_loss=0.01504, over 1925242.54 frames.
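Note the learning rate: it decays slowly within epoch 10 (6.79e-03 down to 6.76e-03) and then drops to 6.45e-03 when epoch 11 begins. This is consistent with an Eden-style schedule that decays in both batch count and epoch count. The sketch below follows the Eden formula as I understand it from icefall's optim.py; treat it as an approximation, not a verified reference. Counting completed epochs (9 during "Epoch 10") is an assumption that makes the numbers line up.

```python
# Eden-style LR: base_lr scaled by batch- and epoch-dependent decay factors.
def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

print(eden_lr(0.045, batch=119150, epoch=9.0))   # ~6.79e-03, as logged in epoch 10
print(eden_lr(0.045, batch=120300, epoch=10.0))  # ~6.45e-03, as logged in epoch 11
```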
], batch size: 59, lr: 6.45e-03, grad_scale: 32.0
2023-11-19 20:57:54,017 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.513e+01 8.370e+01 8.919e+01 1.001e+02 1.772e+02, threshold=1.784e+02, percent-clipped=0.0
2023-11-19 20:58:02,709 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 120450
2023-11-19 20:58:04,778 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=9.975e-02
2023-11-19 20:58:29,525 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=803073.3333333334, ans=0.035
2023-11-19 20:58:40,042 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.43 vs. limit=10.0
2023-11-19 20:58:40,643 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=803140.0, ans=0.2
2023-11-19 20:58:46,013 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 250, loss[loss=0.07255, simple_loss=0.08787, pruned_loss=0.01883, audio_tagging_loss=0.009792, over 15478.00 frames. ], tot_loss[loss=0.08858, simple_loss=0.1049, pruned_loss=0.02252, audio_tagging_loss=0.0136, over 2179188.87 frames. ], batch size: 59, lr: 6.45e-03, grad_scale: 16.0
2023-11-19 20:58:52,719 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=803206.6666666666, ans=0.125
2023-11-19 20:58:58,882 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=803273.3333333334, ans=0.1
2023-11-19 20:59:02,083 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=803273.3333333334, ans=0.125
2023-11-19 20:59:07,461 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=803273.3333333334, ans=22.5
2023-11-19 20:59:09,461 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 120500
2023-11-19 20:59:30,791 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=803406.6666666666, ans=0.0
2023-11-19 20:59:52,079 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 300, loss[loss=0.05693, simple_loss=0.06349, pruned_loss=0.0135, audio_tagging_loss=0.01168, over 14850.00 frames. ], tot_loss[loss=0.08615, simple_loss=0.1032, pruned_loss=0.0219, audio_tagging_loss=0.01263, over 2374416.67 frames. ], batch size: 56, lr: 6.45e-03, grad_scale: 16.0
2023-11-19 20:59:58,634 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=803540.0, ans=0.1
2023-11-19 21:00:05,708 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.754e+01 8.387e+01 8.949e+01 9.814e+01 1.274e+02, threshold=1.790e+02, percent-clipped=0.0
2023-11-19 21:00:13,396 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 120550
2023-11-19 21:00:52,394 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=803806.6666666666, ans=0.1
2023-11-19 21:00:53,913 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=803806.6666666666, ans=0.125
2023-11-19 21:00:56,081 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 350, loss[loss=0.06163, simple_loss=0.06708, pruned_loss=0.01495, audio_tagging_loss=0.01313, over 15624.00 frames. ], tot_loss[loss=0.08551, simple_loss=0.1031, pruned_loss=0.02189, audio_tagging_loss=0.01209, over 2535635.25 frames. ], batch size: 60, lr: 6.44e-03, grad_scale: 16.0
2023-11-19 21:01:03,093 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.05 vs. limit=6.0
2023-11-19 21:01:08,917 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.294e-02
2023-11-19 21:01:14,351 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=803940.0, ans=0.0
2023-11-19 21:01:17,858 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 120600
2023-11-19 21:01:22,537 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.10 vs. limit=15.0
2023-11-19 21:01:40,255 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.48 vs. limit=10.0
2023-11-19 21:01:53,798 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=804140.0, ans=0.125
2023-11-19 21:02:01,517 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 400, loss[loss=0.09859, simple_loss=0.1276, pruned_loss=0.02858, audio_tagging_loss=0.006205, over 14688.00 frames. ], tot_loss[loss=0.08591, simple_loss=0.1043, pruned_loss=0.02228, audio_tagging_loss=0.01148, over 2650649.22 frames. ], batch size: 54, lr: 6.44e-03, grad_scale: 32.0
2023-11-19 21:02:14,626 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.48 vs. limit=15.0
2023-11-19 21:02:15,709 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.819e+01 8.262e+01 8.810e+01 9.660e+01 1.540e+02, threshold=1.762e+02, percent-clipped=0.0
2023-11-19 21:02:24,539 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 120650
2023-11-19 21:02:28,406 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=804340.0, ans=0.125
2023-11-19 21:02:56,289 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=804473.3333333334, ans=0.125
2023-11-19 21:03:02,823 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.43 vs. limit=15.0
2023-11-19 21:03:06,999 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 450, loss[loss=0.07058, simple_loss=0.08319, pruned_loss=0.01837, audio_tagging_loss=0.01062, over 14436.00 frames. ], tot_loss[loss=0.08528, simple_loss=0.1036, pruned_loss=0.02222, audio_tagging_loss=0.01125, over 2734500.18 frames. ], batch size: 56, lr: 6.44e-03, grad_scale: 32.0
2023-11-19 21:03:10,887 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=804540.0, ans=0.125
2023-11-19 21:03:28,985 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 120700
2023-11-19 21:03:36,914 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.66 vs. limit=22.5
2023-11-19 21:03:38,046 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=804673.3333333334, ans=0.0
2023-11-19 21:04:12,487 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 500, loss[loss=0.09053, simple_loss=0.1121, pruned_loss=0.02518, audio_tagging_loss=0.009277, over 14859.00 frames. ], tot_loss[loss=0.08498, simple_loss=0.1035, pruned_loss=0.0222, audio_tagging_loss=0.01106, over 2809873.93 frames. ], batch size: 56, lr: 6.44e-03, grad_scale: 32.0
2023-11-19 21:04:17,705 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=804873.3333333334, ans=0.2
2023-11-19 21:04:21,378 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=804873.3333333334, ans=0.0
2023-11-19 21:04:26,061 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.811e+01 8.498e+01 9.308e+01 1.026e+02 1.855e+02, threshold=1.862e+02, percent-clipped=1.0
2023-11-19 21:04:34,113 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 120750
2023-11-19 21:04:59,250 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=805073.3333333334, ans=0.125
2023-11-19 21:05:11,530 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=805140.0, ans=0.125
2023-11-19 21:05:14,128 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=805140.0, ans=10.0
2023-11-19 21:05:16,334 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 550, loss[loss=0.06615, simple_loss=0.07943, pruned_loss=0.01555, audio_tagging_loss=0.01089, over 15951.00 frames. ], tot_loss[loss=0.08515, simple_loss=0.1038, pruned_loss=0.02234, audio_tagging_loss=0.01091, over 2861967.32 frames. ], batch size: 60, lr: 6.44e-03, grad_scale: 32.0
2023-11-19 21:05:16,618 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=805206.6666666666, ans=0.125
2023-11-19 21:05:20,910 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=805206.6666666666, ans=0.125
2023-11-19 21:05:22,052 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=805206.6666666666, ans=0.125
2023-11-19 21:05:23,794 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.99 vs. limit=15.0
2023-11-19 21:05:32,011 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=805273.3333333334, ans=0.125
2023-11-19 21:05:36,847 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=805273.3333333334, ans=0.2
2023-11-19 21:05:39,032 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 120800
2023-11-19 21:05:39,776 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.35 vs. limit=15.0
2023-11-19 21:05:43,393 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=805340.0, ans=0.125
2023-11-19 21:06:21,490 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 600, loss[loss=0.09063, simple_loss=0.1154, pruned_loss=0.02492, audio_tagging_loss=0.00803, over 14815.00 frames. ], tot_loss[loss=0.08432, simple_loss=0.103, pruned_loss=0.022, audio_tagging_loss=0.0108, over 2898946.97 frames. ], batch size: 57, lr: 6.44e-03, grad_scale: 32.0
2023-11-19 21:06:30,802 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.94 vs. limit=12.0
2023-11-19 21:06:35,517 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=805606.6666666666, ans=0.1
2023-11-19 21:06:36,962 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.892e+01 8.276e+01 9.038e+01 9.770e+01 1.365e+02, threshold=1.808e+02, percent-clipped=0.0
2023-11-19 21:06:39,655 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=805606.6666666666, ans=0.0
2023-11-19 21:06:44,468 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 120850
2023-11-19 21:06:59,821 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=805740.0, ans=0.07
2023-11-19 21:07:02,282 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=805740.0, ans=0.0
2023-11-19 21:07:16,568 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=5.05 vs. limit=5.0
2023-11-19 21:07:25,756 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=805873.3333333334, ans=0.125
2023-11-19 21:07:27,202 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 650, loss[loss=0.07371, simple_loss=0.09764, pruned_loss=0.01645, audio_tagging_loss=0.008438, over 15647.00 frames. ], tot_loss[loss=0.0842, simple_loss=0.103, pruned_loss=0.02196, audio_tagging_loss=0.01075, over 2926359.64 frames. ], batch size: 58, lr: 6.44e-03, grad_scale: 32.0
2023-11-19 21:07:48,667 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 120900
2023-11-19 21:07:50,177 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=805940.0, ans=0.125
2023-11-19 21:07:53,120 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.99 vs. limit=15.0
2023-11-19 21:08:29,923 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=806206.6666666666, ans=0.2
2023-11-19 21:08:30,886 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 700, loss[loss=0.08289, simple_loss=0.1009, pruned_loss=0.02127, audio_tagging_loss=0.01118, over 15567.00 frames. ], tot_loss[loss=0.08385, simple_loss=0.1028, pruned_loss=0.02181, audio_tagging_loss=0.01064, over 2950304.97 frames. ], batch size: 58, lr: 6.43e-03, grad_scale: 32.0
2023-11-19 21:08:38,116 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.25 vs. limit=22.5
2023-11-19 21:08:44,804 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.115e+01 8.069e+01 8.585e+01 9.544e+01 1.162e+02, threshold=1.717e+02, percent-clipped=0.0
2023-11-19 21:08:53,532 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 120950
2023-11-19 21:09:15,444 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=806406.6666666666, ans=0.1
2023-11-19 21:09:35,854 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 750, loss[loss=0.08941, simple_loss=0.1113, pruned_loss=0.02242, audio_tagging_loss=0.01133, over 15827.00 frames. ], tot_loss[loss=0.08447, simple_loss=0.104, pruned_loss=0.02183, audio_tagging_loss=0.01066, over 2980916.70 frames. ], batch size: 58, lr: 6.43e-03, grad_scale: 32.0
2023-11-19 21:09:36,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=806540.0, ans=0.1
2023-11-19 21:09:46,999 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=806540.0, ans=0.125
2023-11-19 21:09:50,805 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=806606.6666666666, ans=0.0
2023-11-19 21:09:58,101 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 121000
2023-11-19 21:10:00,056 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=806606.6666666666, ans=0.125
2023-11-19 21:10:04,019 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=806673.3333333334, ans=0.2
2023-11-19 21:10:20,771 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=806740.0, ans=0.125
2023-11-19 21:10:25,767 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=806740.0, ans=0.1
2023-11-19 21:10:34,297 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=806806.6666666666, ans=0.125
2023-11-19 21:10:37,949 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=806806.6666666666, ans=0.125
2023-11-19 21:10:40,484 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=806873.3333333334, ans=0.125
2023-11-19 21:10:41,370 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 800, loss[loss=0.06224, simple_loss=0.06692, pruned_loss=0.01143, audio_tagging_loss=0.01735, over 15742.00 frames. ], tot_loss[loss=0.08545, simple_loss=0.1056, pruned_loss=0.02206, audio_tagging_loss=0.01061, over 3003454.35 frames. ], batch size: 60, lr: 6.43e-03, grad_scale: 32.0
2023-11-19 21:10:55,544 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.953e+01 8.303e+01 9.154e+01 9.871e+01 1.410e+02, threshold=1.831e+02, percent-clipped=0.0
2023-11-19 21:11:01,036 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.58 vs. limit=10.0
2023-11-19 21:11:02,904 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 121050
2023-11-19 21:11:27,479 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.11 vs. limit=22.5
2023-11-19 21:11:34,238 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=807140.0, ans=0.125
2023-11-19 21:11:45,781 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 850, loss[loss=0.06386, simple_loss=0.06685, pruned_loss=0.01362, audio_tagging_loss=0.01681, over 14798.00 frames. ], tot_loss[loss=0.08606, simple_loss=0.106, pruned_loss=0.02239, audio_tagging_loss=0.01066, over 3017043.13 frames.
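The bracketed tot_loss[...] figures reported every 50 batches look like frame-weighted running averages of the per-batch loss components (note the "over N frames" denominators that grow from batch 0 onward within an epoch). The sketch below shows one way to produce such figures; the decay constant and field names are assumptions, not icefall's exact accumulator.

```python
# Frame-weighted running loss accumulator; averages() yields per-frame
# figures like the tot_loss[...] fields printed in the log.
class RunningLoss:
    def __init__(self, decay: float = 1.0 - 1.0 / 200):  # assumed decay window
        self.decay = decay
        self.sums: dict[str, float] = {}
        self.frames = 0.0

    def update(self, losses: dict, num_frames: float) -> None:
        self.frames = self.frames * self.decay + num_frames
        for k, v in losses.items():
            self.sums[k] = self.sums.get(k, 0.0) * self.decay + v * num_frames

    def averages(self) -> dict:
        return {k: v / self.frames for k, v in self.sums.items()}

run = RunningLoss()
run.update({"loss": 0.086, "simple_loss": 0.106}, num_frames=15000)
print(run.averages())  # {'loss': 0.086, 'simple_loss': 0.106} after one batch
```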
2023-11-19 21:11:49,864 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=807206.6666666666, ans=0.2
2023-11-19 21:12:07,857 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 121100
2023-11-19 21:12:20,914 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=807340.0, ans=0.5
2023-11-19 21:12:50,508 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 900, loss[loss=0.07258, simple_loss=0.08973, pruned_loss=0.01591, audio_tagging_loss=0.0118, over 15197.00 frames. ], tot_loss[loss=0.08573, simple_loss=0.1055, pruned_loss=0.02223, audio_tagging_loss=0.01074, over 3024379.86 frames. ], batch size: 58, lr: 6.43e-03, grad_scale: 32.0
2023-11-19 21:13:05,610 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.647e+01 8.167e+01 8.792e+01 9.769e+01 1.364e+02, threshold=1.758e+02, percent-clipped=0.0
2023-11-19 21:13:07,413 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.08 vs. limit=15.0
2023-11-19 21:13:13,297 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 121150
2023-11-19 21:13:52,339 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.10 vs. limit=15.0
2023-11-19 21:13:56,698 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 950, loss[loss=0.09117, simple_loss=0.1147, pruned_loss=0.023, audio_tagging_loss=0.01082, over 14762.00 frames. ], tot_loss[loss=0.08605, simple_loss=0.1062, pruned_loss=0.02243, audio_tagging_loss=0.01052, over 3031943.53 frames. ], batch size: 56, lr: 6.43e-03, grad_scale: 32.0
2023-11-19 21:14:03,261 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=807873.3333333334, ans=0.125
2023-11-19 21:14:04,425 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=807873.3333333334, ans=0.0
2023-11-19 21:14:18,441 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 121200
2023-11-19 21:14:30,593 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=808006.6666666666, ans=0.125
2023-11-19 21:15:01,038 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 1000, loss[loss=0.0624, simple_loss=0.0764, pruned_loss=0.01387, audio_tagging_loss=0.01033, over 14697.00 frames. ], tot_loss[loss=0.08577, simple_loss=0.1062, pruned_loss=0.02235, audio_tagging_loss=0.01032, over 3036072.86 frames. ], batch size: 56, lr: 6.43e-03, grad_scale: 32.0
2023-11-19 21:15:11,276 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=808206.6666666666, ans=0.125
2023-11-19 21:15:15,837 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.437e+01 8.149e+01 8.966e+01 9.862e+01 1.248e+02, threshold=1.793e+02, percent-clipped=0.0
2023-11-19 21:15:23,332 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 121250
2023-11-19 21:15:29,315 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 21:15:33,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=808340.0, ans=0.125
2023-11-19 21:15:39,254 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=808406.6666666666, ans=0.125
2023-11-19 21:16:00,246 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=808473.3333333334, ans=10.0
2023-11-19 21:16:02,413 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=808473.3333333334, ans=0.2
2023-11-19 21:16:05,938 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 1050, loss[loss=0.1004, simple_loss=0.1168, pruned_loss=0.03149, audio_tagging_loss=0.01055, over 16122.00 frames. ], tot_loss[loss=0.08511, simple_loss=0.1054, pruned_loss=0.02217, audio_tagging_loss=0.01023, over 3029518.64 frames. ], batch size: 59, lr: 6.43e-03, grad_scale: 32.0
2023-11-19 21:16:19,838 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=808606.6666666666, ans=0.125
2023-11-19 21:16:24,671 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=808606.6666666666, ans=0.125
2023-11-19 21:16:28,206 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 121300
2023-11-19 21:16:28,344 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=808606.6666666666, ans=0.125
2023-11-19 21:17:07,915 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=808806.6666666666, ans=0.125
2023-11-19 21:17:09,157 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=808806.6666666666, ans=0.07
2023-11-19 21:17:11,264 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 1100, loss[loss=0.06887, simple_loss=0.08357, pruned_loss=0.01765, audio_tagging_loss=0.009437, over 14851.00 frames. ], tot_loss[loss=0.08458, simple_loss=0.1044, pruned_loss=0.02216, audio_tagging_loss=0.01023, over 3028901.49 frames. ], batch size: 55, lr: 6.42e-03, grad_scale: 16.0
2023-11-19 21:17:13,671 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 21:17:14,255 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.71 vs. limit=15.0
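(The "Exclude cut" warnings above are AudioSet clips carrying a placeholder transcript: after roughly 4x temporal subsampling, a 100-frame clip keeps only 23 frames, fewer than its 24 BPE tokens, and a pruned-transducer loss needs at least one frame per token. A hedged sketch of such a filter; the subsampling formula below is one that reproduces 100 -> 23, and the function name is illustrative:)

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        """Drop cuts that end up shorter (in frames) than their token sequence."""
        # Two stride-2 stages: T' = ((T - 7) // 2 + 1) // 2, so 100 -> 23.
        frames_after = ((num_frames - 7) // 2 + 1) // 2
        return frames_after >= num_tokens

    # keep_cut(100, 24) -> False, matching the exclusions logged above.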
2023-11-19 21:17:23,801 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-19 21:17:25,898 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.793e+01 8.226e+01 9.036e+01 9.842e+01 1.440e+02, threshold=1.807e+02, percent-clipped=0.0
2023-11-19 21:17:32,120 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 121350
2023-11-19 21:17:38,890 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=809006.6666666666, ans=0.125
2023-11-19 21:17:41,433 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-19 21:17:45,457 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=809006.6666666666, ans=0.1
2023-11-19 21:17:46,502 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=809006.6666666666, ans=0.125
2023-11-19 21:17:46,635 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=809006.6666666666, ans=0.0
2023-11-19 21:17:46,732 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.00 vs. limit=12.0
2023-11-19 21:18:10,981 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=809140.0, ans=0.0
2023-11-19 21:18:14,533 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 1150, loss[loss=0.08006, simple_loss=0.1081, pruned_loss=0.0199, audio_tagging_loss=0.006115, over 16188.00 frames. ], tot_loss[loss=0.08427, simple_loss=0.104, pruned_loss=0.02216, audio_tagging_loss=0.01012, over 3031167.27 frames. ], batch size: 62, lr: 6.42e-03, grad_scale: 16.0
2023-11-19 21:18:29,553 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=809273.3333333334, ans=0.125
2023-11-19 21:18:35,362 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=10.76 vs. limit=15.0
2023-11-19 21:18:37,132 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 121400
2023-11-19 21:19:04,720 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=809406.6666666666, ans=0.125
2023-11-19 21:19:07,244 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=809473.3333333334, ans=0.0
2023-11-19 21:19:20,033 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 1200, loss[loss=0.09358, simple_loss=0.1109, pruned_loss=0.02997, audio_tagging_loss=0.008179, over 15318.00 frames. ], tot_loss[loss=0.08385, simple_loss=0.1031, pruned_loss=0.02198, audio_tagging_loss=0.01034, over 3033992.02 frames. ], batch size: 56, lr: 6.42e-03, grad_scale: 32.0
2023-11-19 21:19:23,542 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=809540.0, ans=0.0
2023-11-19 21:19:27,103 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=809540.0, ans=0.0
2023-11-19 21:19:31,462 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=809540.0, ans=0.0
2023-11-19 21:19:33,230 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=809606.6666666666, ans=0.95
2023-11-19 21:19:36,419 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.659e+01 8.249e+01 9.079e+01 9.946e+01 1.270e+02, threshold=1.816e+02, percent-clipped=0.0
2023-11-19 21:19:39,324 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=809606.6666666666, ans=0.125
2023-11-19 21:19:42,814 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 121450
2023-11-19 21:19:49,210 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=809673.3333333334, ans=0.125
2023-11-19 21:19:53,106 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=809673.3333333334, ans=0.125
2023-11-19 21:20:03,248 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.55 vs. limit=15.0
2023-11-19 21:20:09,726 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.40 vs. limit=15.0
2023-11-19 21:20:25,699 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 1250, loss[loss=0.09234, simple_loss=0.1209, pruned_loss=0.02464, audio_tagging_loss=0.007236, over 14312.00 frames. ], tot_loss[loss=0.08399, simple_loss=0.1034, pruned_loss=0.02207, audio_tagging_loss=0.01021, over 3035159.87 frames. ], batch size: 54, lr: 6.42e-03, grad_scale: 16.0
2023-11-19 21:20:38,046 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=809940.0, ans=0.125
2023-11-19 21:20:43,108 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=809940.0, ans=0.1
2023-11-19 21:20:46,397 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 121500
2023-11-19 21:20:54,005 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=810006.6666666666, ans=0.125
2023-11-19 21:21:07,465 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=810073.3333333334, ans=0.1
2023-11-19 21:21:16,032 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=12.99 vs. limit=15.0
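(The ScheduledFloat entries log hyperparameters such as dropout probabilities, skip rates, and balancer limits that are functions of batch_count rather than constants; each line reports the value, "ans", in effect at that point. A piecewise-linear schedule is enough to reproduce the behaviour; this stand-in is illustrative and much simpler than scaling.py's ScheduledFloat:)

    from bisect import bisect_right


    def scheduled_float(points: list[tuple[float, float]], batch_count: float) -> float:
        """Piecewise-linear interpolation through (batch_count, value) breakpoints."""
        xs = [x for x, _ in points]
        i = bisect_right(xs, batch_count)
        if i == 0:
            return points[0][1]
        if i == len(points):
            return points[-1][1]
        (x0, y0), (x1, y1) = points[i - 1], points[i]
        return y0 + (batch_count - x0) * (y1 - y0) / (x1 - x0)


    # A dropout_p decaying from 0.3 to 0.1 over the first 20k batches, then flat:
    # scheduled_float([(0.0, 0.3), (20000.0, 0.1)], 806540.0) -> 0.1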
2023-11-19 21:21:20,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=810140.0, ans=0.125
2023-11-19 21:21:24,964 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=810140.0, ans=0.035
2023-11-19 21:21:28,565 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 1300, loss[loss=0.07606, simple_loss=0.07885, pruned_loss=0.02253, audio_tagging_loss=0.0141, over 15350.00 frames. ], tot_loss[loss=0.08378, simple_loss=0.1033, pruned_loss=0.02195, audio_tagging_loss=0.01016, over 3047453.87 frames. ], batch size: 60, lr: 6.42e-03, grad_scale: 16.0
2023-11-19 21:21:45,107 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.756e+01 8.695e+01 9.311e+01 1.032e+02 1.222e+02, threshold=1.862e+02, percent-clipped=0.0
2023-11-19 21:21:50,138 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 121550
2023-11-19 21:21:57,199 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=810340.0, ans=0.0
2023-11-19 21:22:01,279 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=810340.0, ans=0.125
2023-11-19 21:22:01,413 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=810340.0, ans=0.125
2023-11-19 21:22:04,183 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.57 vs. limit=15.0
2023-11-19 21:22:04,877 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=810340.0, ans=0.0
2023-11-19 21:22:23,505 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=810473.3333333334, ans=0.125
2023-11-19 21:22:32,531 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 1350, loss[loss=0.1115, simple_loss=0.1476, pruned_loss=0.02977, audio_tagging_loss=0.007898, over 16390.00 frames. ], tot_loss[loss=0.08396, simple_loss=0.1035, pruned_loss=0.02206, audio_tagging_loss=0.01015, over 3045209.46 frames. ], batch size: 58, lr: 6.42e-03, grad_scale: 16.0
2023-11-19 21:22:34,107 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=810540.0, ans=0.0
2023-11-19 21:22:35,444 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=810540.0, ans=0.125
2023-11-19 21:22:35,777 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.01 vs. limit=15.0
2023-11-19 21:22:43,200 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.15 vs. limit=15.0
2023-11-19 21:22:54,951 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 121600
2023-11-19 21:22:55,269 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=810606.6666666666, ans=0.0
2023-11-19 21:23:06,626 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 21:23:06,850 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=810673.3333333334, ans=0.2
2023-11-19 21:23:18,711 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 21:23:24,669 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=810806.6666666666, ans=0.125
2023-11-19 21:23:27,087 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=810806.6666666666, ans=0.125
2023-11-19 21:23:29,456 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=810806.6666666666, ans=0.2
2023-11-19 21:23:33,124 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=810806.6666666666, ans=0.2
2023-11-19 21:23:37,682 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 1400, loss[loss=0.07426, simple_loss=0.08518, pruned_loss=0.0215, audio_tagging_loss=0.01017, over 15078.00 frames. ], tot_loss[loss=0.08335, simple_loss=0.1026, pruned_loss=0.02181, audio_tagging_loss=0.01024, over 3043617.75 frames. ], batch size: 59, lr: 6.42e-03, grad_scale: 16.0
2023-11-19 21:23:48,934 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=810940.0, ans=0.0
2023-11-19 21:23:53,612 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.902e+01 8.352e+01 8.972e+01 9.668e+01 1.251e+02, threshold=1.794e+02, percent-clipped=0.0
2023-11-19 21:23:57,633 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=810940.0, ans=0.0
2023-11-19 21:23:58,576 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 121650
2023-11-19 21:24:00,286 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.37 vs. limit=15.0
2023-11-19 21:24:32,492 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.57 vs. limit=15.0
2023-11-19 21:24:40,433 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 1450, loss[loss=0.1122, simple_loss=0.1326, pruned_loss=0.03704, audio_tagging_loss=0.008859, over 13656.00 frames. ], tot_loss[loss=0.08351, simple_loss=0.103, pruned_loss=0.02179, audio_tagging_loss=0.01021, over 3039926.71 frames. ], batch size: 53, lr: 6.42e-03, grad_scale: 16.0
2023-11-19 21:24:43,097 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=811206.6666666666, ans=0.125
2023-11-19 21:24:45,958 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.80 vs. limit=6.0
2023-11-19 21:24:48,116 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 21:24:56,660 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=811273.3333333334, ans=0.125
2023-11-19 21:24:58,577 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=811273.3333333334, ans=0.125
2023-11-19 21:25:01,972 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 121700
2023-11-19 21:25:03,467 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=811273.3333333334, ans=0.0
2023-11-19 21:25:33,355 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=811473.3333333334, ans=0.1
2023-11-19 21:25:44,109 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 1500, loss[loss=0.05434, simple_loss=0.06194, pruned_loss=0.009543, audio_tagging_loss=0.01382, over 14059.00 frames. ], tot_loss[loss=0.08316, simple_loss=0.1024, pruned_loss=0.02156, audio_tagging_loss=0.01041, over 3046510.68 frames. ], batch size: 55, lr: 6.41e-03, grad_scale: 16.0
2023-11-19 21:25:47,381 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=811540.0, ans=0.125
2023-11-19 21:25:49,211 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.22 vs. limit=15.0
2023-11-19 21:26:01,430 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.708e+01 8.227e+01 9.077e+01 1.029e+02 1.490e+02, threshold=1.815e+02, percent-clipped=0.0
2023-11-19 21:26:07,329 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 121750
2023-11-19 21:26:12,580 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=811673.3333333334, ans=0.2
2023-11-19 21:26:16,553 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.25 vs. limit=15.0
2023-11-19 21:26:36,895 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.46 vs. limit=15.0
2023-11-19 21:26:40,030 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=811806.6666666666, ans=0.0
2023-11-19 21:26:48,080 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 1550, loss[loss=0.09697, simple_loss=0.1189, pruned_loss=0.02687, audio_tagging_loss=0.01063, over 15507.00 frames. ], tot_loss[loss=0.08399, simple_loss=0.1033, pruned_loss=0.02185, audio_tagging_loss=0.01048, over 3045233.98 frames. ], batch size: 59, lr: 6.41e-03, grad_scale: 16.0
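(Each tot_loss entry decomposes the training objective into a simple, linear-interpolation transducer loss, the pruned transducer loss, and the audio-tagging loss. Assuming a 0.5 weight on the simple loss and a 1.0 weight on the audio-tagging loss, consistent with the hyperparameters this run logs at startup, the reported totals recombine as 0.5 * simple + pruned + tagging; checking against the batch 1550 entry just above:)

    # Recombining the batch 1550 averages (weights assumed from the startup config).
    simple, pruned, tagging = 0.1033, 0.02185, 0.01048
    total = 0.5 * simple + pruned + tagging
    print(f"{total:.5f}")  # 0.08398, matching the logged loss=0.08399 up to rounding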
2023-11-19 21:26:56,917 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.60 vs. limit=22.5
2023-11-19 21:27:11,406 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 121800
2023-11-19 21:27:16,909 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=812006.6666666666, ans=0.125
2023-11-19 21:27:45,362 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=812140.0, ans=0.0
2023-11-19 21:27:54,387 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 1600, loss[loss=0.07898, simple_loss=0.09997, pruned_loss=0.01938, audio_tagging_loss=0.009622, over 15435.00 frames. ], tot_loss[loss=0.08386, simple_loss=0.1031, pruned_loss=0.02176, audio_tagging_loss=0.01053, over 3050252.61 frames. ], batch size: 57, lr: 6.41e-03, grad_scale: 32.0
2023-11-19 21:27:55,772 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=812206.6666666666, ans=0.125
2023-11-19 21:28:10,251 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.741e+01 8.357e+01 8.989e+01 9.853e+01 1.199e+02, threshold=1.798e+02, percent-clipped=0.0
2023-11-19 21:28:15,892 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 121850
2023-11-19 21:28:36,945 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=812406.6666666666, ans=10.0
2023-11-19 21:28:57,471 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 1650, loss[loss=0.07954, simple_loss=0.1042, pruned_loss=0.01798, audio_tagging_loss=0.009478, over 14615.00 frames. ], tot_loss[loss=0.08387, simple_loss=0.103, pruned_loss=0.02177, audio_tagging_loss=0.01062, over 3047287.98 frames. ], batch size: 55, lr: 6.41e-03, grad_scale: 32.0
2023-11-19 21:29:00,461 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=812540.0, ans=0.125
2023-11-19 21:29:02,224 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=812540.0, ans=0.02
2023-11-19 21:29:19,519 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=812606.6666666666, ans=0.125
2023-11-19 21:29:20,480 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 121900
2023-11-19 21:29:35,908 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=812740.0, ans=0.0
2023-11-19 21:30:01,908 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 1700, loss[loss=0.05988, simple_loss=0.06617, pruned_loss=0.01718, audio_tagging_loss=0.009613, over 13943.00 frames. ], tot_loss[loss=0.08459, simple_loss=0.1039, pruned_loss=0.022, audio_tagging_loss=0.01062, over 3047635.52 frames. ], batch size: 55, lr: 6.41e-03, grad_scale: 32.0
2023-11-19 21:30:19,796 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.291e+01 8.451e+01 9.168e+01 1.014e+02 1.661e+02, threshold=1.834e+02, percent-clipped=0.0
2023-11-19 21:30:20,029 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=812940.0, ans=0.0
2023-11-19 21:30:20,354 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.03 vs. limit=6.0
2023-11-19 21:30:22,517 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=812940.0, ans=0.0
2023-11-19 21:30:24,742 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 121950
2023-11-19 21:30:45,208 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=813073.3333333334, ans=0.125
2023-11-19 21:30:49,070 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=813073.3333333334, ans=0.1
2023-11-19 21:31:01,202 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=813140.0, ans=0.0
2023-11-19 21:31:02,603 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.93 vs. limit=15.0
2023-11-19 21:31:07,572 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 1750, loss[loss=0.07607, simple_loss=0.09681, pruned_loss=0.01575, audio_tagging_loss=0.01192, over 15294.00 frames. ], tot_loss[loss=0.08406, simple_loss=0.1034, pruned_loss=0.02188, audio_tagging_loss=0.01047, over 3054742.21 frames. ], batch size: 57, lr: 6.41e-03, grad_scale: 32.0
2023-11-19 21:31:13,865 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=813206.6666666666, ans=0.2
2023-11-19 21:31:28,418 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 122000
2023-11-19 21:31:33,311 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=813340.0, ans=0.0
2023-11-19 21:31:56,786 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=813406.6666666666, ans=0.125
2023-11-19 21:32:09,184 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.13 vs. limit=15.0
2023-11-19 21:32:11,594 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.58 vs. limit=15.0
2023-11-19 21:32:12,130 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 1800, loss[loss=0.07834, simple_loss=0.09454, pruned_loss=0.01877, audio_tagging_loss=0.0123, over 14889.00 frames. ], tot_loss[loss=0.08332, simple_loss=0.1028, pruned_loss=0.02168, audio_tagging_loss=0.01026, over 3048444.51 frames. ], batch size: 56, lr: 6.41e-03, grad_scale: 32.0
2023-11-19 21:32:15,879 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=813540.0, ans=0.1
2023-11-19 21:32:26,740 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.48 vs. limit=22.5
2023-11-19 21:32:28,676 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.860e+01 8.061e+01 8.839e+01 9.659e+01 3.662e+02, threshold=1.768e+02, percent-clipped=1.0
2023-11-19 21:32:34,306 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 122050
2023-11-19 21:32:36,437 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=813606.6666666666, ans=0.125
2023-11-19 21:32:37,755 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=813673.3333333334, ans=0.125
2023-11-19 21:32:48,938 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=813673.3333333334, ans=0.0
2023-11-19 21:32:53,049 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=813740.0, ans=0.0
2023-11-19 21:32:54,547 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=813740.0, ans=0.2
2023-11-19 21:33:04,125 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=813806.6666666666, ans=0.125
2023-11-19 21:33:16,792 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 1850, loss[loss=0.06855, simple_loss=0.07858, pruned_loss=0.01736, audio_tagging_loss=0.0119, over 15482.00 frames. ], tot_loss[loss=0.0829, simple_loss=0.1024, pruned_loss=0.02146, audio_tagging_loss=0.01024, over 3045743.25 frames. ], batch size: 59, lr: 6.40e-03, grad_scale: 16.0
2023-11-19 21:33:38,884 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 122100
2023-11-19 21:33:45,936 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=814006.6666666666, ans=0.125
2023-11-19 21:34:08,835 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=814140.0, ans=0.0
2023-11-19 21:34:10,752 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.82 vs. limit=5.0
2023-11-19 21:34:21,905 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 1900, loss[loss=0.08964, simple_loss=0.1159, pruned_loss=0.02223, audio_tagging_loss=0.009469, over 15763.00 frames. ], tot_loss[loss=0.0826, simple_loss=0.1017, pruned_loss=0.02146, audio_tagging_loss=0.01027, over 3047688.61 frames. ], batch size: 58, lr: 6.40e-03, grad_scale: 16.0
2023-11-19 21:34:39,213 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.70 vs. limit=22.5
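(The Whitening entries compare a per-module statistic against a limit; the metric grows as the channel covariance of the activations becomes less isotropic, and the module only intervenes once the metric exceeds the limit, so values like 20.70 vs. 22.5 above are still within bounds. A rough proxy for such a metric, with the caveat that this mirrors the idea rather than scaling.py's exact formula:)

    import torch


    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        """x: (num_frames, num_channels). Returns >= 1.0; equals 1.0 for white features."""
        x = x - x.mean(dim=0)
        cov = (x.T @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        # Ratio of mean squared eigenvalue to squared mean eigenvalue:
        # 1.0 when all eigenvalues are equal, large when variance concentrates.
        return (eigs**2).mean() / eigs.mean() ** 2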
2023-11-19 21:34:39,679 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.976e+01 8.118e+01 8.678e+01 9.738e+01 1.673e+02, threshold=1.736e+02, percent-clipped=0.0
2023-11-19 21:34:42,436 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-19 21:34:43,614 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 122150
2023-11-19 21:34:59,265 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=814406.6666666666, ans=0.1
2023-11-19 21:34:59,377 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=814406.6666666666, ans=0.1
2023-11-19 21:35:04,046 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=814406.6666666666, ans=0.125
2023-11-19 21:35:08,889 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=814406.6666666666, ans=0.2
2023-11-19 21:35:26,497 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 1950, loss[loss=0.0727, simple_loss=0.08656, pruned_loss=0.01936, audio_tagging_loss=0.01006, over 15808.00 frames. ], tot_loss[loss=0.08314, simple_loss=0.1026, pruned_loss=0.02164, audio_tagging_loss=0.01022, over 3046831.01 frames. ], batch size: 60, lr: 6.40e-03, grad_scale: 16.0
2023-11-19 21:35:48,062 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 122200
2023-11-19 21:35:49,961 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=814606.6666666666, ans=0.1
2023-11-19 21:36:19,764 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=814806.6666666666, ans=0.0
2023-11-19 21:36:27,717 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=814806.6666666666, ans=0.2
2023-11-19 21:36:31,165 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 2000, loss[loss=0.08651, simple_loss=0.1041, pruned_loss=0.02534, audio_tagging_loss=0.0091, over 14398.00 frames. ], tot_loss[loss=0.08422, simple_loss=0.104, pruned_loss=0.02206, audio_tagging_loss=0.01017, over 3046800.90 frames. ], batch size: 57, lr: 6.40e-03, grad_scale: 32.0
2023-11-19 21:36:36,953 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.28 vs. limit=6.0
2023-11-19 21:36:49,638 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.135e+01 8.499e+01 9.542e+01 1.090e+02 1.717e+02, threshold=1.908e+02, percent-clipped=0.0
2023-11-19 21:36:53,413 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 122250
2023-11-19 21:37:02,416 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-19 21:37:15,044 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=815073.3333333334, ans=0.1
2023-11-19 21:37:18,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=815073.3333333334, ans=0.125
2023-11-19 21:37:29,200 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=815140.0, ans=0.125
2023-11-19 21:37:33,514 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=815140.0, ans=0.1
2023-11-19 21:37:36,813 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 2050, loss[loss=0.0826, simple_loss=0.1053, pruned_loss=0.02289, audio_tagging_loss=0.007048, over 15168.00 frames. ], tot_loss[loss=0.08477, simple_loss=0.1049, pruned_loss=0.0223, audio_tagging_loss=0.01003, over 3051700.27 frames. ], batch size: 57, lr: 6.40e-03, grad_scale: 32.0
2023-11-19 21:37:58,486 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 122300
2023-11-19 21:38:03,525 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=815340.0, ans=0.0
2023-11-19 21:38:19,376 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=815406.6666666666, ans=0.0
2023-11-19 21:38:23,667 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=815406.6666666666, ans=0.125
2023-11-19 21:38:38,722 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=815473.3333333334, ans=0.1
2023-11-19 21:38:38,780 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=815473.3333333334, ans=0.125
2023-11-19 21:38:39,998 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=815540.0, ans=0.0
2023-11-19 21:38:40,861 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 2100, loss[loss=0.1002, simple_loss=0.1175, pruned_loss=0.03182, audio_tagging_loss=0.009661, over 15092.00 frames. ], tot_loss[loss=0.08433, simple_loss=0.1046, pruned_loss=0.02206, audio_tagging_loss=0.009976, over 3056428.54 frames. ], batch size: 54, lr: 6.40e-03, grad_scale: 32.0
2023-11-19 21:38:56,108 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.78 vs. limit=15.0
2023-11-19 21:38:58,526 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.52 vs. limit=15.0
2023-11-19 21:38:59,063 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.154e+01 8.161e+01 9.375e+01 1.016e+02 1.346e+02, threshold=1.875e+02, percent-clipped=0.0
2023-11-19 21:38:59,329 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=815606.6666666666, ans=0.035
2023-11-19 21:39:02,977 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 122350
2023-11-19 21:39:05,648 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=815673.3333333334, ans=0.125
2023-11-19 21:39:11,573 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=815673.3333333334, ans=0.125
2023-11-19 21:39:14,209 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=815673.3333333334, ans=0.07
2023-11-19 21:39:20,528 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=815740.0, ans=22.5
2023-11-19 21:39:30,179 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=815740.0, ans=0.125
2023-11-19 21:39:36,786 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=815806.6666666666, ans=0.1
2023-11-19 21:39:42,638 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.17 vs. limit=15.0
2023-11-19 21:39:45,687 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 2150, loss[loss=0.09242, simple_loss=0.1253, pruned_loss=0.02264, audio_tagging_loss=0.007113, over 15315.00 frames. ], tot_loss[loss=0.0849, simple_loss=0.1054, pruned_loss=0.02217, audio_tagging_loss=0.01003, over 3052006.64 frames. ], batch size: 56, lr: 6.40e-03, grad_scale: 32.0
2023-11-19 21:39:45,859 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=815873.3333333334, ans=0.2
2023-11-19 21:40:03,599 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.44 vs. limit=22.5
2023-11-19 21:40:08,132 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 122400
2023-11-19 21:40:24,489 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 21:40:51,642 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 2200, loss[loss=0.06366, simple_loss=0.07645, pruned_loss=0.01484, audio_tagging_loss=0.01059, over 15922.00 frames. ], tot_loss[loss=0.08481, simple_loss=0.1052, pruned_loss=0.02214, audio_tagging_loss=0.01009, over 3057319.12 frames. ], batch size: 61, lr: 6.40e-03, grad_scale: 32.0
2023-11-19 21:40:57,934 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=816206.6666666666, ans=0.125
2023-11-19 21:41:08,859 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.683e+01 8.220e+01 9.086e+01 1.022e+02 1.678e+02, threshold=1.817e+02, percent-clipped=0.0
2023-11-19 21:41:10,491 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=816273.3333333334, ans=0.125
2023-11-19 21:41:12,636 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 122450
2023-11-19 21:41:16,345 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=816340.0, ans=0.1
2023-11-19 21:41:18,883 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.66 vs. limit=15.0
2023-11-19 21:41:19,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=816340.0, ans=10.0
2023-11-19 21:41:24,898 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 21:41:47,453 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=816473.3333333334, ans=0.125
2023-11-19 21:41:50,252 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.68 vs. limit=6.0
2023-11-19 21:41:55,715 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 2250, loss[loss=0.06389, simple_loss=0.07987, pruned_loss=0.0127, audio_tagging_loss=0.01125, over 13976.00 frames. ], tot_loss[loss=0.08488, simple_loss=0.1049, pruned_loss=0.02222, audio_tagging_loss=0.0102, over 3055250.25 frames. ], batch size: 54, lr: 6.39e-03, grad_scale: 32.0
2023-11-19 21:42:02,254 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=816540.0, ans=0.125
2023-11-19 21:42:16,877 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=816606.6666666666, ans=0.05
2023-11-19 21:42:17,209 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.43 vs. limit=12.0
2023-11-19 21:42:17,970 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 122500
2023-11-19 21:42:55,434 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=816806.6666666666, ans=0.125
2023-11-19 21:43:00,643 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 2300, loss[loss=0.08154, simple_loss=0.09783, pruned_loss=0.02347, audio_tagging_loss=0.009156, over 14942.00 frames. ], tot_loss[loss=0.08469, simple_loss=0.1043, pruned_loss=0.02221, audio_tagging_loss=0.01032, over 3051405.93 frames. ], batch size: 56, lr: 6.39e-03, grad_scale: 32.0
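(The WithLoss entries report the summed value of an auxiliary penalty attached to attention weights, zero throughout this stretch. The general trick is an identity op in the forward pass that routes gradient into the side loss during backward, so the penalty shapes the weights without entering the printed objective; a hedged sketch of that idea, where AttachLoss is an illustrative name rather than the code at scaling.py:1118:)

    import torch


    class AttachLoss(torch.autograd.Function):
        """Return x unchanged; in backward, feed gradient 1 into `aux_loss`."""

        @staticmethod
        def forward(ctx, x: torch.Tensor, aux_loss: torch.Tensor) -> torch.Tensor:
            ctx.shape = aux_loss.shape
            ctx.opts = (aux_loss.dtype, aux_loss.device)
            return x

        @staticmethod
        def backward(ctx, grad_out: torch.Tensor):
            dtype, device = ctx.opts
            return grad_out, torch.ones(ctx.shape, dtype=dtype, device=device)

    # usage: attn = AttachLoss.apply(attn, penalty(attn)); the penalty's gradient
    # then flows into attn even though it is never added to the main loss.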
2023-11-19 21:43:08,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=816873.3333333334, ans=0.125
2023-11-19 21:43:17,920 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=816940.0, ans=0.125
2023-11-19 21:43:19,986 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.000e+01 8.330e+01 8.975e+01 9.637e+01 1.370e+02, threshold=1.795e+02, percent-clipped=0.0
2023-11-19 21:43:23,785 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 122550
2023-11-19 21:43:25,335 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=816940.0, ans=0.125
2023-11-19 21:43:58,415 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 21:44:04,766 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.29 vs. limit=15.0
2023-11-19 21:44:07,015 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 2350, loss[loss=0.09849, simple_loss=0.1205, pruned_loss=0.03058, audio_tagging_loss=0.007641, over 14451.00 frames. ], tot_loss[loss=0.08545, simple_loss=0.1055, pruned_loss=0.02236, audio_tagging_loss=0.01031, over 3053827.83 frames. ], batch size: 56, lr: 6.39e-03, grad_scale: 16.0
2023-11-19 21:44:28,148 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 122600
2023-11-19 21:44:31,030 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=817340.0, ans=0.125
2023-11-19 21:44:38,233 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=817340.0, ans=0.125
2023-11-19 21:44:39,072 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=817340.0, ans=0.125
2023-11-19 21:45:00,829 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.95 vs. limit=15.0
2023-11-19 21:45:11,206 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 2400, loss[loss=0.07489, simple_loss=0.08787, pruned_loss=0.01823, audio_tagging_loss=0.01272, over 16575.00 frames. ], tot_loss[loss=0.08529, simple_loss=0.1052, pruned_loss=0.02228, audio_tagging_loss=0.01043, over 3052932.13 frames. ], batch size: 63, lr: 6.39e-03, grad_scale: 32.0
2023-11-19 21:45:30,243 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.655e+01 8.060e+01 8.978e+01 9.886e+01 1.686e+02, threshold=1.796e+02, percent-clipped=0.0
2023-11-19 21:45:32,848 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 122650
2023-11-19 21:45:42,873 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=817673.3333333334, ans=0.2
2023-11-19 21:45:44,312 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=817673.3333333334, ans=0.2
2023-11-19 21:45:48,563 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=817673.3333333334, ans=0.0
2023-11-19 21:45:58,303 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=817740.0, ans=0.2
2023-11-19 21:46:05,951 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=817806.6666666666, ans=0.125
2023-11-19 21:46:15,438 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 2450, loss[loss=0.0877, simple_loss=0.1137, pruned_loss=0.02153, audio_tagging_loss=0.009315, over 16950.00 frames. ], tot_loss[loss=0.08545, simple_loss=0.1053, pruned_loss=0.02225, audio_tagging_loss=0.01055, over 3056313.70 frames. ], batch size: 61, lr: 6.39e-03, grad_scale: 32.0
2023-11-19 21:46:15,859 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=817873.3333333334, ans=0.0
2023-11-19 21:46:36,454 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=817940.0, ans=0.125
2023-11-19 21:46:38,550 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 122700
2023-11-19 21:46:57,349 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=818073.3333333334, ans=0.0
2023-11-19 21:47:10,829 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.51 vs. limit=22.5
2023-11-19 21:47:17,109 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.81 vs. limit=15.0
2023-11-19 21:47:21,093 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 2500, loss[loss=0.06682, simple_loss=0.08424, pruned_loss=0.01365, audio_tagging_loss=0.01105, over 15995.00 frames. ], tot_loss[loss=0.08494, simple_loss=0.1046, pruned_loss=0.02204, audio_tagging_loss=0.01062, over 3053657.09 frames. ], batch size: 61, lr: 6.39e-03, grad_scale: 32.0
2023-11-19 21:47:40,052 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.459e+01 8.244e+01 8.954e+01 9.755e+01 1.221e+02, threshold=1.791e+02, percent-clipped=0.0
2023-11-19 21:47:42,578 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 122750
2023-11-19 21:47:58,999 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.67 vs. limit=15.0
2023-11-19 21:48:03,154 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.85 vs. limit=15.0
2023-11-19 21:48:21,028 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=818473.3333333334, ans=0.125
2023-11-19 21:48:25,695 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 2550, loss[loss=0.08219, simple_loss=0.1045, pruned_loss=0.02039, audio_tagging_loss=0.009541, over 14964.00 frames. ], tot_loss[loss=0.08481, simple_loss=0.1042, pruned_loss=0.02216, audio_tagging_loss=0.01054, over 3053876.32 frames. ], batch size: 56, lr: 6.39e-03, grad_scale: 32.0
2023-11-19 21:48:47,234 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 122800
2023-11-19 21:49:08,430 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=818740.0, ans=0.125
2023-11-19 21:49:27,971 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=818806.6666666666, ans=0.5
2023-11-19 21:49:30,321 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 2600, loss[loss=0.0937, simple_loss=0.1157, pruned_loss=0.02735, audio_tagging_loss=0.008509, over 13557.00 frames. ], tot_loss[loss=0.0841, simple_loss=0.1036, pruned_loss=0.02182, audio_tagging_loss=0.01049, over 3052769.83 frames. ], batch size: 53, lr: 6.39e-03, grad_scale: 16.0
2023-11-19 21:49:52,164 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.938e+01 8.517e+01 9.235e+01 1.002e+02 1.405e+02, threshold=1.847e+02, percent-clipped=0.0
2023-11-19 21:49:53,577 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 122850
2023-11-19 21:50:04,298 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.40 vs. limit=12.0
2023-11-19 21:50:26,021 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.36 vs. limit=15.0
2023-11-19 21:50:26,036 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.49 vs. limit=15.0
2023-11-19 21:50:35,310 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 2650, loss[loss=0.08592, simple_loss=0.1129, pruned_loss=0.02109, audio_tagging_loss=0.008369, over 15377.00 frames. ], tot_loss[loss=0.08385, simple_loss=0.1033, pruned_loss=0.02174, audio_tagging_loss=0.01044, over 3050748.16 frames. ], batch size: 56, lr: 6.38e-03, grad_scale: 16.0
2023-11-19 21:50:38,616 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=819206.6666666666, ans=0.0
2023-11-19 21:50:43,822 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.50 vs. limit=15.0
2023-11-19 21:50:58,156 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 122900
2023-11-19 21:51:04,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=819340.0, ans=0.125
2023-11-19 21:51:23,705 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-19 21:51:28,578 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.29 vs. limit=15.0
2023-11-19 21:51:32,348 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=819473.3333333334, ans=0.0
2023-11-19 21:51:36,753 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=819473.3333333334, ans=0.125
2023-11-19 21:51:41,507 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 2700, loss[loss=0.09705, simple_loss=0.1225, pruned_loss=0.02818, audio_tagging_loss=0.007633, over 15796.00 frames. ], tot_loss[loss=0.08414, simple_loss=0.1038, pruned_loss=0.0219, audio_tagging_loss=0.01033, over 3049836.51 frames. ], batch size: 57, lr: 6.38e-03, grad_scale: 16.0
2023-11-19 21:51:41,757 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=819540.0, ans=0.1
2023-11-19 21:51:49,296 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=819540.0, ans=0.125
2023-11-19 21:51:57,332 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.11 vs. limit=22.5
2023-11-19 21:52:01,449 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.391e+01 8.510e+01 9.186e+01 1.041e+02 2.301e+02, threshold=1.837e+02, percent-clipped=1.0
2023-11-19 21:52:03,533 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 122950
2023-11-19 21:52:18,188 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=819673.3333333334, ans=0.09899494936611666
2023-11-19 21:52:27,010 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.25 vs. limit=12.0
2023-11-19 21:52:34,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=819806.6666666666, ans=0.0
2023-11-19 21:52:35,785 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.78 vs. limit=22.5
2023-11-19 21:52:39,641 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.99 vs. limit=12.0
2023-11-19 21:52:46,249 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 2750, loss[loss=0.07887, simple_loss=0.08798, pruned_loss=0.02218, audio_tagging_loss=0.0127, over 14730.00 frames. ], tot_loss[loss=0.08386, simple_loss=0.1035, pruned_loss=0.02191, audio_tagging_loss=0.01021, over 3042904.60 frames. ], batch size: 57, lr: 6.38e-03, grad_scale: 16.0
2023-11-19 21:52:46,528 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=819873.3333333334, ans=0.1
2023-11-19 21:53:00,687 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=819940.0, ans=0.125
2023-11-19 21:53:08,473 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 123000
2023-11-19 21:53:20,303 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.33 vs. limit=22.5
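(Most Clipping_scale reports above show percent-clipped=0.0; the two exceptions in this stretch, at 21:32:28 and 21:52:01, report percent-clipped=1.0 where the grad-norm maxima, 3.662e+02 and 2.301e+02, exceed their thresholds of 1.768e+02 and 1.837e+02, meaning some batches in those reporting windows were actually clipped. A plausible reading of the bookkeeping, with illustrative values:)

    # Fraction of steps since the last report whose pre-clip norm beat the
    # threshold, reported as a percentage (history values are illustrative).
    history = [(91.9, 183.7)] * 99 + [(230.1, 183.7)]  # (grad_norm, threshold)
    percent_clipped = 100.0 * sum(n > t for n, t in history) / len(history)
    print(percent_clipped)  # 1.0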
limit=22.5 2023-11-19 21:53:22,555 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=820006.6666666666, ans=0.09899494936611666 2023-11-19 21:53:23,778 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=820006.6666666666, ans=0.2 2023-11-19 21:53:32,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=820073.3333333334, ans=0.1 2023-11-19 21:53:33,569 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=820073.3333333334, ans=0.125 2023-11-19 21:53:34,898 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=820073.3333333334, ans=0.0 2023-11-19 21:53:40,866 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 21:53:46,728 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=820140.0, ans=0.125 2023-11-19 21:53:51,418 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 2800, loss[loss=0.09891, simple_loss=0.123, pruned_loss=0.02767, audio_tagging_loss=0.009737, over 14474.00 frames. ], tot_loss[loss=0.08376, simple_loss=0.1035, pruned_loss=0.02184, audio_tagging_loss=0.01016, over 3041192.70 frames. ], batch size: 54, lr: 6.38e-03, grad_scale: 16.0 2023-11-19 21:53:52,945 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=820206.6666666666, ans=0.125 2023-11-19 21:54:02,864 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=820206.6666666666, ans=0.2 2023-11-19 21:54:05,265 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=820273.3333333334, ans=0.05 2023-11-19 21:54:14,044 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.993e+01 8.139e+01 8.815e+01 9.780e+01 1.679e+02, threshold=1.763e+02, percent-clipped=0.0 2023-11-19 21:54:14,191 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 123050 2023-11-19 21:54:18,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=820340.0, ans=0.125 2023-11-19 21:54:24,524 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=820340.0, ans=0.0 2023-11-19 21:54:47,949 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=820473.3333333334, ans=0.2 2023-11-19 21:54:52,601 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.55 vs. 
2023-11-19 21:54:53,739 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.76 vs. limit=22.5
2023-11-19 21:54:56,823 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 2850, loss[loss=0.09315, simple_loss=0.1194, pruned_loss=0.02427, audio_tagging_loss=0.009204, over 15319.00 frames. ], tot_loss[loss=0.08413, simple_loss=0.1039, pruned_loss=0.02201, audio_tagging_loss=0.01015, over 3041944.01 frames. ], batch size: 58, lr: 6.38e-03, grad_scale: 16.0
2023-11-19 21:54:59,543 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.19 vs. limit=22.5
2023-11-19 21:55:01,398 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=820540.0, ans=0.0
2023-11-19 21:55:08,872 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=820606.6666666666, ans=0.125
2023-11-19 21:55:13,022 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys.whitening_limit, batch_count=820606.6666666666, ans=6.0
2023-11-19 21:55:18,660 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 123100
2023-11-19 21:55:43,105 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=820740.0, ans=0.2
2023-11-19 21:56:01,659 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.13 vs. limit=6.0
2023-11-19 21:56:02,092 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 2900, loss[loss=0.07011, simple_loss=0.08937, pruned_loss=0.0162, audio_tagging_loss=0.009222, over 14634.00 frames. ], tot_loss[loss=0.08361, simple_loss=0.1032, pruned_loss=0.02182, audio_tagging_loss=0.01018, over 3037804.51 frames. ], batch size: 57, lr: 6.38e-03, grad_scale: 16.0
2023-11-19 21:56:13,168 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=820873.3333333334, ans=0.05
2023-11-19 21:56:23,812 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.958e+01 8.184e+01 8.854e+01 9.488e+01 1.292e+02, threshold=1.771e+02, percent-clipped=0.0
2023-11-19 21:56:23,952 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 123150
2023-11-19 21:56:26,040 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=820940.0, ans=0.125
2023-11-19 21:56:28,241 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=821006.6666666666, ans=0.1
2023-11-19 21:56:56,237 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=821140.0, ans=0.1
2023-11-19 21:57:06,590 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 2950, loss[loss=0.08008, simple_loss=0.09067, pruned_loss=0.02146, audio_tagging_loss=0.01328, over 13977.00 frames. ], tot_loss[loss=0.08439, simple_loss=0.1039, pruned_loss=0.02215, audio_tagging_loss=0.01027, over 3040610.91 frames. ], batch size: 53, lr: 6.38e-03, grad_scale: 16.0
2023-11-19 21:57:08,062 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=821206.6666666666, ans=0.125
2023-11-19 21:57:21,666 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=821273.3333333334, ans=0.125
2023-11-19 21:57:24,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=821273.3333333334, ans=0.125
2023-11-19 21:57:28,845 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 123200
2023-11-19 21:57:30,392 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=821273.3333333334, ans=0.125
2023-11-19 21:57:41,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=821340.0, ans=0.125
2023-11-19 21:57:49,503 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.98 vs. limit=10.0
2023-11-19 21:57:55,243 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=821406.6666666666, ans=0.09899494936611666
2023-11-19 21:57:58,919 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=821473.3333333334, ans=0.2
2023-11-19 21:58:02,583 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=821473.3333333334, ans=0.125
2023-11-19 21:58:12,083 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 3000, loss[loss=0.09011, simple_loss=0.1176, pruned_loss=0.02335, audio_tagging_loss=0.007949, over 15013.00 frames. ], tot_loss[loss=0.08481, simple_loss=0.1046, pruned_loss=0.02223, audio_tagging_loss=0.0103, over 3041534.23 frames. ], batch size: 57, lr: 6.37e-03, grad_scale: 16.0
2023-11-19 21:58:12,084 INFO [train_asr.py:1285] (3/4) Computing validation loss
2023-11-19 21:58:52,220 INFO [train_asr.py:1294] (3/4) Epoch 11, validation: loss=0.06441, simple_loss=0.05497, pruned_loss=0.006219, audio_tagging_loss=0.03071, over 4681554.00 frames.
2023-11-19 21:58:52,221 INFO [train_asr.py:1295] (3/4) Maximum memory allocated so far is 25886MB
2023-11-19 21:58:52,485 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=821540.0, ans=0.0
2023-11-19 21:58:58,216 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.76 vs. limit=22.5
2023-11-19 21:58:58,989 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=821540.0, ans=0.125
2023-11-19 21:59:00,359 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=821540.0, ans=0.1
2023-11-19 21:59:01,992 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.56 vs. limit=15.0
2023-11-19 21:59:06,629 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.84 vs. limit=15.0
2023-11-19 21:59:14,429 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.751e+01 8.445e+01 9.049e+01 1.018e+02 1.456e+02, threshold=1.810e+02, percent-clipped=0.0
2023-11-19 21:59:14,576 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 123250
2023-11-19 21:59:18,314 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=821673.3333333334, ans=0.125
2023-11-19 21:59:18,387 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=821673.3333333334, ans=0.125
2023-11-19 21:59:55,781 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 3050, loss[loss=0.07724, simple_loss=0.09521, pruned_loss=0.0202, audio_tagging_loss=0.009426, over 14692.00 frames. ], tot_loss[loss=0.08462, simple_loss=0.1043, pruned_loss=0.02212, audio_tagging_loss=0.01032, over 3037556.38 frames. ], batch size: 54, lr: 6.37e-03, grad_scale: 16.0
2023-11-19 22:00:00,965 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.11 vs. limit=15.0
2023-11-19 22:00:08,448 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=821940.0, ans=0.0
2023-11-19 22:00:18,120 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 123300
2023-11-19 22:00:22,361 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.51 vs. limit=15.0
2023-11-19 22:00:23,439 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=822006.6666666666, ans=0.1
2023-11-19 22:00:26,539 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.43 vs. limit=12.0
2023-11-19 22:00:33,957 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 22:00:39,073 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=822073.3333333334, ans=0.125
2023-11-19 22:01:01,024 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 3100, loss[loss=0.08047, simple_loss=0.09194, pruned_loss=0.02041, audio_tagging_loss=0.01409, over 15342.00 frames. ], tot_loss[loss=0.08483, simple_loss=0.1046, pruned_loss=0.02219, audio_tagging_loss=0.01034, over 3038103.68 frames. ], batch size: 59, lr: 6.37e-03, grad_scale: 16.0
2023-11-19 22:01:07,484 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=822206.6666666666, ans=0.1
2023-11-19 22:01:22,956 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.275e+01 8.148e+01 8.871e+01 9.460e+01 1.235e+02, threshold=1.774e+02, percent-clipped=0.0
2023-11-19 22:01:23,097 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 123350
2023-11-19 22:01:24,492 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=822273.3333333334, ans=0.0
2023-11-19 22:01:39,890 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=822406.6666666666, ans=0.1
2023-11-19 22:01:49,277 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.80 vs. limit=15.0
2023-11-19 22:01:52,124 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.27 vs. limit=6.0
2023-11-19 22:02:03,912 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=822540.0, ans=0.1
2023-11-19 22:02:05,591 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 3150, loss[loss=0.0696, simple_loss=0.0775, pruned_loss=0.01799, audio_tagging_loss=0.01286, over 14782.00 frames. ], tot_loss[loss=0.08528, simple_loss=0.105, pruned_loss=0.02243, audio_tagging_loss=0.01036, over 3042711.83 frames. ], batch size: 57, lr: 6.37e-03, grad_scale: 16.0
2023-11-19 22:02:11,796 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.67 vs. limit=15.0
2023-11-19 22:02:17,439 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=822606.6666666666, ans=0.2
2023-11-19 22:02:27,745 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 123400
2023-11-19 22:02:53,201 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=822740.0, ans=0.1
2023-11-19 22:03:10,316 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 3200, loss[loss=0.08993, simple_loss=0.1098, pruned_loss=0.02262, audio_tagging_loss=0.01238, over 16066.00 frames. ], tot_loss[loss=0.085, simple_loss=0.1044, pruned_loss=0.0223, audio_tagging_loss=0.01047, over 3043681.47 frames. ], batch size: 57, lr: 6.37e-03, grad_scale: 32.0
2023-11-19 22:03:21,545 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=822873.3333333334, ans=0.0
2023-11-19 22:03:22,782 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=822940.0, ans=0.125
2023-11-19 22:03:27,815 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=822940.0, ans=0.125
2023-11-19 22:03:32,360 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.734e+01 8.258e+01 8.832e+01 9.801e+01 1.591e+02, threshold=1.766e+02, percent-clipped=0.0
2023-11-19 22:03:32,510 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 123450
2023-11-19 22:03:32,768 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=822940.0, ans=0.2
2023-11-19 22:03:40,241 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=823006.6666666666, ans=0.125
2023-11-19 22:03:42,626 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=823006.6666666666, ans=0.125
2023-11-19 22:03:55,315 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=823073.3333333334, ans=0.2
2023-11-19 22:04:15,948 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 3250, loss[loss=0.0885, simple_loss=0.1081, pruned_loss=0.02608, audio_tagging_loss=0.008402, over 14837.00 frames. ], tot_loss[loss=0.08426, simple_loss=0.1032, pruned_loss=0.02207, audio_tagging_loss=0.01058, over 3045101.56 frames. ], batch size: 55, lr: 6.37e-03, grad_scale: 32.0
2023-11-19 22:04:18,575 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=823206.6666666666, ans=0.05
2023-11-19 22:04:31,106 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=823273.3333333334, ans=0.2
2023-11-19 22:04:36,901 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 123500
2023-11-19 22:05:03,010 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=823406.6666666666, ans=0.0
2023-11-19 22:05:06,723 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=823473.3333333334, ans=0.125
2023-11-19 22:05:15,334 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=823473.3333333334, ans=0.0
2023-11-19 22:05:18,909 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 3300, loss[loss=0.08299, simple_loss=0.1116, pruned_loss=0.01736, audio_tagging_loss=0.009829, over 14787.00 frames. ], tot_loss[loss=0.08404, simple_loss=0.103, pruned_loss=0.02194, audio_tagging_loss=0.01059, over 3043059.83 frames. ], batch size: 56, lr: 6.37e-03, grad_scale: 32.0
2023-11-19 22:05:24,377 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.19 vs. limit=15.0
2023-11-19 22:05:40,312 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.60 vs. limit=12.0
2023-11-19 22:05:40,812 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.210e+01 8.419e+01 8.972e+01 9.610e+01 1.838e+02, threshold=1.794e+02, percent-clipped=1.0
2023-11-19 22:05:40,962 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 123550
2023-11-19 22:05:45,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=823673.3333333334, ans=10.0
2023-11-19 22:06:00,874 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=823740.0, ans=0.0
2023-11-19 22:06:01,413 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.18 vs. limit=15.0
2023-11-19 22:06:04,607 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=823740.0, ans=0.125
2023-11-19 22:06:08,785 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.07 vs. limit=15.0
2023-11-19 22:06:20,473 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=823806.6666666666, ans=0.0
2023-11-19 22:06:23,948 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 3350, loss[loss=0.09658, simple_loss=0.1178, pruned_loss=0.02792, audio_tagging_loss=0.00976, over 14800.00 frames. ], tot_loss[loss=0.08444, simple_loss=0.1034, pruned_loss=0.02222, audio_tagging_loss=0.01052, over 3046688.06 frames. ], batch size: 57, lr: 6.37e-03, grad_scale: 32.0
2023-11-19 22:06:24,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=823873.3333333334, ans=0.1
2023-11-19 22:06:45,323 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=823940.0, ans=0.125
2023-11-19 22:06:46,362 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 123600
2023-11-19 22:07:18,162 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.36 vs. limit=12.0
2023-11-19 22:07:29,868 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 3400, loss[loss=0.06226, simple_loss=0.07374, pruned_loss=0.0125, audio_tagging_loss=0.01288, over 13969.00 frames. ], tot_loss[loss=0.08422, simple_loss=0.1034, pruned_loss=0.02212, audio_tagging_loss=0.01038, over 3042668.35 frames. ], batch size: 54, lr: 6.36e-03, grad_scale: 32.0
2023-11-19 22:07:41,230 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=824273.3333333334, ans=0.2
2023-11-19 22:07:48,617 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=824273.3333333334, ans=0.0
2023-11-19 22:07:50,886 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.806e+01 8.550e+01 9.235e+01 1.051e+02 1.197e+02, threshold=1.847e+02, percent-clipped=0.0
2023-11-19 22:07:51,038 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 123650
2023-11-19 22:08:07,274 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=824406.6666666666, ans=0.125
2023-11-19 22:08:11,639 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=824406.6666666666, ans=0.125
2023-11-19 22:08:30,665 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=824473.3333333334, ans=0.1
2023-11-19 22:08:33,940 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 3450, loss[loss=0.1026, simple_loss=0.131, pruned_loss=0.02815, audio_tagging_loss=0.008926, over 15879.00 frames. ], tot_loss[loss=0.08441, simple_loss=0.1037, pruned_loss=0.02233, audio_tagging_loss=0.01024, over 3041952.17 frames. ], batch size: 57, lr: 6.36e-03, grad_scale: 32.0
2023-11-19 22:08:36,681 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=824540.0, ans=0.125
2023-11-19 22:08:38,996 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=824540.0, ans=0.125
2023-11-19 22:08:56,258 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 123700
2023-11-19 22:08:59,453 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=824673.3333333334, ans=0.1
2023-11-19 22:09:02,076 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 22:09:07,530 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=824673.3333333334, ans=0.125
2023-11-19 22:09:07,542 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=824673.3333333334, ans=0.2
2023-11-19 22:09:14,827 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=824740.0, ans=0.0
2023-11-19 22:09:15,386 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=824740.0, ans=15.0
2023-11-19 22:09:38,706 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 3500, loss[loss=0.09515, simple_loss=0.1179, pruned_loss=0.02588, audio_tagging_loss=0.01034, over 15662.00 frames. ], tot_loss[loss=0.08505, simple_loss=0.1048, pruned_loss=0.02255, audio_tagging_loss=0.01009, over 3038920.53 frames. ], batch size: 58, lr: 6.36e-03, grad_scale: 32.0
2023-11-19 22:09:52,487 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=824940.0, ans=0.0
2023-11-19 22:09:53,055 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.02 vs. limit=22.5
2023-11-19 22:09:54,351 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=824940.0, ans=0.125
2023-11-19 22:10:01,200 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.383e+01 8.617e+01 9.360e+01 1.039e+02 1.365e+02, threshold=1.872e+02, percent-clipped=0.0
2023-11-19 22:10:01,342 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 123750
2023-11-19 22:10:02,605 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=824940.0, ans=0.035
2023-11-19 22:10:12,147 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 22:10:16,045 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=825073.3333333334, ans=0.0
2023-11-19 22:10:17,346 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=825073.3333333334, ans=0.1
2023-11-19 22:10:26,540 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=825073.3333333334, ans=0.1
2023-11-19 22:10:27,731 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=825073.3333333334, ans=0.125
2023-11-19 22:10:43,985 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 3550, loss[loss=0.06962, simple_loss=0.08682, pruned_loss=0.01264, audio_tagging_loss=0.01357, over 14596.00 frames. ], tot_loss[loss=0.08427, simple_loss=0.104, pruned_loss=0.02215, audio_tagging_loss=0.0101, over 3041685.38 frames. ], batch size: 56, lr: 6.36e-03, grad_scale: 32.0
2023-11-19 22:10:47,948 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=825206.6666666666, ans=0.125
2023-11-19 22:10:52,856 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=825206.6666666666, ans=0.0
2023-11-19 22:11:04,774 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 123800
2023-11-19 22:11:13,559 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.76 vs. limit=15.0
2023-11-19 22:11:17,947 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=825340.0, ans=0.125
2023-11-19 22:11:17,995 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=825340.0, ans=0.0
2023-11-19 22:11:20,381 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=825406.6666666666, ans=0.125
2023-11-19 22:11:42,749 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=825473.3333333334, ans=0.1
2023-11-19 22:11:47,383 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 3600, loss[loss=0.0677, simple_loss=0.08017, pruned_loss=0.01855, audio_tagging_loss=0.009066, over 14456.00 frames. ], tot_loss[loss=0.08341, simple_loss=0.1027, pruned_loss=0.02194, audio_tagging_loss=0.01013, over 3033081.66 frames. ], batch size: 56, lr: 6.36e-03, grad_scale: 32.0
2023-11-19 22:12:08,998 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.358e+01 8.243e+01 9.327e+01 1.037e+02 1.432e+02, threshold=1.865e+02, percent-clipped=0.0
2023-11-19 22:12:09,143 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 123850
2023-11-19 22:12:11,789 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=825673.3333333334, ans=0.0
2023-11-19 22:12:21,764 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=825673.3333333334, ans=0.025
2023-11-19 22:12:26,490 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=825740.0, ans=0.1
2023-11-19 22:12:34,663 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.01 vs. limit=6.0
2023-11-19 22:12:38,006 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=825806.6666666666, ans=0.125
2023-11-19 22:12:52,071 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 3650, loss[loss=0.07504, simple_loss=0.09217, pruned_loss=0.01907, audio_tagging_loss=0.009884, over 15185.00 frames. ], tot_loss[loss=0.08409, simple_loss=0.1038, pruned_loss=0.02216, audio_tagging_loss=0.01004, over 3041907.59 frames. ], batch size: 59, lr: 6.36e-03, grad_scale: 32.0
2023-11-19 22:12:54,999 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=825873.3333333334, ans=0.125
2023-11-19 22:13:15,355 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 123900
2023-11-19 22:13:19,313 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=826006.6666666666, ans=0.125
2023-11-19 22:13:21,847 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=826006.6666666666, ans=0.2
2023-11-19 22:13:44,145 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=826140.0, ans=0.0
2023-11-19 22:13:57,516 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 3700, loss[loss=0.08235, simple_loss=0.1019, pruned_loss=0.02299, audio_tagging_loss=0.008415, over 15155.00 frames. ], tot_loss[loss=0.08419, simple_loss=0.104, pruned_loss=0.02219, audio_tagging_loss=0.01001, over 3048058.42 frames. ], batch size: 58, lr: 6.36e-03, grad_scale: 32.0
2023-11-19 22:13:59,157 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=826206.6666666666, ans=0.125
2023-11-19 22:14:04,890 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.37 vs. limit=15.0
2023-11-19 22:14:09,585 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=826273.3333333334, ans=0.125
2023-11-19 22:14:13,027 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=826273.3333333334, ans=0.2
2023-11-19 22:14:15,366 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=826273.3333333334, ans=0.0
2023-11-19 22:14:16,559 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=826273.3333333334, ans=0.125
2023-11-19 22:14:17,203 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.35 vs. limit=15.0
2023-11-19 22:14:18,778 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.848e+01 8.345e+01 9.061e+01 9.917e+01 1.388e+02, threshold=1.812e+02, percent-clipped=0.0
2023-11-19 22:14:18,933 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 123950
2023-11-19 22:14:22,849 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=826340.0, ans=0.0
2023-11-19 22:14:41,249 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.76 vs. limit=15.0
2023-11-19 22:14:51,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=826473.3333333334, ans=0.125
2023-11-19 22:14:58,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=826473.3333333334, ans=0.125
2023-11-19 22:15:01,866 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 3750, loss[loss=0.08356, simple_loss=0.1013, pruned_loss=0.02557, audio_tagging_loss=0.007323, over 13945.00 frames. ], tot_loss[loss=0.08355, simple_loss=0.1031, pruned_loss=0.02203, audio_tagging_loss=0.01, over 3049194.94 frames. ], batch size: 56, lr: 6.36e-03, grad_scale: 32.0
2023-11-19 22:15:23,414 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 124000
2023-11-19 22:15:50,601 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 22:15:57,070 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=826806.6666666666, ans=0.025
2023-11-19 22:16:00,784 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=826806.6666666666, ans=0.125
2023-11-19 22:16:09,638 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 3800, loss[loss=0.09692, simple_loss=0.1293, pruned_loss=0.02453, audio_tagging_loss=0.00772, over 15540.00 frames. ], tot_loss[loss=0.08451, simple_loss=0.1042, pruned_loss=0.02229, audio_tagging_loss=0.01012, over 3052817.01 frames. ], batch size: 55, lr: 6.35e-03, grad_scale: 32.0
2023-11-19 22:16:32,417 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.048e+01 8.419e+01 9.148e+01 9.794e+01 1.250e+02, threshold=1.830e+02, percent-clipped=0.0
2023-11-19 22:16:32,568 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 124050
2023-11-19 22:16:32,831 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=826940.0, ans=0.0
2023-11-19 22:16:42,565 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=827006.6666666666, ans=0.125
2023-11-19 22:16:51,450 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.01 vs. limit=15.0
2023-11-19 22:16:51,453 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.82 vs. limit=22.5
2023-11-19 22:17:11,145 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=827140.0, ans=0.0
2023-11-19 22:17:13,887 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 3850, loss[loss=0.07169, simple_loss=0.07627, pruned_loss=0.0191, audio_tagging_loss=0.01446, over 15479.00 frames. ], tot_loss[loss=0.0841, simple_loss=0.1034, pruned_loss=0.02212, audio_tagging_loss=0.01029, over 3041623.64 frames. ], batch size: 60, lr: 6.35e-03, grad_scale: 32.0
2023-11-19 22:17:24,541 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=827206.6666666666, ans=0.0
2023-11-19 22:17:27,017 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=827273.3333333334, ans=0.125
2023-11-19 22:17:35,533 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 124100
2023-11-19 22:17:39,532 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=827340.0, ans=0.2
2023-11-19 22:17:58,195 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=827406.6666666666, ans=0.0
2023-11-19 22:18:00,159 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=827406.6666666666, ans=0.125
2023-11-19 22:18:09,345 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 22:18:18,129 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 3900, loss[loss=0.06917, simple_loss=0.07654, pruned_loss=0.0177, audio_tagging_loss=0.0132, over 15387.00 frames. ], tot_loss[loss=0.0842, simple_loss=0.1035, pruned_loss=0.02208, audio_tagging_loss=0.01035, over 3044229.26 frames. ], batch size: 58, lr: 6.35e-03, grad_scale: 32.0
2023-11-19 22:18:33,917 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=827606.6666666666, ans=0.125
2023-11-19 22:18:37,610 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=827606.6666666666, ans=0.0
2023-11-19 22:18:39,677 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.879e+01 8.442e+01 9.156e+01 1.005e+02 1.586e+02, threshold=1.831e+02, percent-clipped=0.0
2023-11-19 22:18:39,826 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 124150
2023-11-19 22:19:01,565 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=827740.0, ans=0.2
2023-11-19 22:19:19,931 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=827806.6666666666, ans=0.09899494936611666
2023-11-19 22:19:22,047 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 3950, loss[loss=0.08546, simple_loss=0.1129, pruned_loss=0.01666, audio_tagging_loss=0.01235, over 15531.00 frames. ], tot_loss[loss=0.08404, simple_loss=0.1031, pruned_loss=0.02205, audio_tagging_loss=0.01041, over 3037620.30 frames. ], batch size: 55, lr: 6.35e-03, grad_scale: 32.0
2023-11-19 22:19:26,732 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=827873.3333333334, ans=0.0
2023-11-19 22:19:32,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=827873.3333333334, ans=0.125
2023-11-19 22:19:34,140 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=827940.0, ans=0.125
2023-11-19 22:19:42,354 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.60 vs. limit=6.0
2023-11-19 22:19:44,113 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 124200
2023-11-19 22:19:47,174 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=828006.6666666666, ans=0.125
2023-11-19 22:20:18,671 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=828140.0, ans=0.1
2023-11-19 22:20:22,289 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=828140.0, ans=0.125
2023-11-19 22:20:22,307 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=828140.0, ans=0.1
2023-11-19 22:20:25,284 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.50 vs. limit=15.0
2023-11-19 22:20:27,564 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 4000, loss[loss=0.06945, simple_loss=0.08019, pruned_loss=0.02137, audio_tagging_loss=0.007989, over 14809.00 frames. ], tot_loss[loss=0.08434, simple_loss=0.1034, pruned_loss=0.02213, audio_tagging_loss=0.0105, over 3031640.80 frames. ], batch size: 57, lr: 6.35e-03, grad_scale: 32.0
2023-11-19 22:20:43,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=828273.3333333334, ans=0.125
2023-11-19 22:20:49,746 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.155e+01 8.189e+01 8.890e+01 9.727e+01 1.231e+02, threshold=1.778e+02, percent-clipped=0.0
2023-11-19 22:20:49,900 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 124250
2023-11-19 22:20:51,445 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=828273.3333333334, ans=0.125
2023-11-19 22:21:01,909 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=828340.0, ans=0.1
2023-11-19 22:21:31,905 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 4050, loss[loss=0.08115, simple_loss=0.103, pruned_loss=0.01894, audio_tagging_loss=0.01069, over 14751.00 frames. ], tot_loss[loss=0.08366, simple_loss=0.1024, pruned_loss=0.02189, audio_tagging_loss=0.01054, over 3038691.69 frames. ], batch size: 55, lr: 6.35e-03, grad_scale: 32.0
2023-11-19 22:21:33,316 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=828540.0, ans=0.125
2023-11-19 22:21:36,220 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 22:21:45,466 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.24 vs. limit=15.0
2023-11-19 22:21:48,850 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=828606.6666666666, ans=0.05
2023-11-19 22:21:53,952 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 124300
2023-11-19 22:22:01,487 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=828673.3333333334, ans=0.125
2023-11-19 22:22:05,711 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=828673.3333333334, ans=0.125
2023-11-19 22:22:16,259 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=828740.0, ans=0.125
2023-11-19 22:22:28,152 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=828806.6666666666, ans=0.0
2023-11-19 22:22:36,334 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 4100, loss[loss=0.1197, simple_loss=0.156, pruned_loss=0.03405, audio_tagging_loss=0.007629, over 14045.00 frames. ], tot_loss[loss=0.08435, simple_loss=0.1034, pruned_loss=0.02202, audio_tagging_loss=0.01062, over 3035498.00 frames. ], batch size: 52, lr: 6.35e-03, grad_scale: 32.0
2023-11-19 22:22:57,460 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=828940.0, ans=0.125
2023-11-19 22:22:58,276 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.257e+01 8.196e+01 8.855e+01 9.661e+01 1.383e+02, threshold=1.771e+02, percent-clipped=0.0
2023-11-19 22:22:58,496 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 124350
2023-11-19 22:23:10,603 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=829006.6666666666, ans=0.125
2023-11-19 22:23:18,409 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=829073.3333333334, ans=0.0
2023-11-19 22:23:18,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=829073.3333333334, ans=0.125
2023-11-19 22:23:38,975 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.53 vs. limit=22.5
2023-11-19 22:23:40,911 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 4150, loss[loss=0.08884, simple_loss=0.118, pruned_loss=0.02199, audio_tagging_loss=0.007868, over 15506.00 frames. ], tot_loss[loss=0.0837, simple_loss=0.1028, pruned_loss=0.02182, audio_tagging_loss=0.01046, over 3033176.93 frames. ], batch size: 56, lr: 6.35e-03, grad_scale: 32.0
2023-11-19 22:24:02,795 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 124400
2023-11-19 22:24:18,428 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.78 vs. limit=15.0
2023-11-19 22:24:20,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=829406.6666666666, ans=22.5
2023-11-19 22:24:22,857 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=829406.6666666666, ans=0.2
2023-11-19 22:24:27,482 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 22:24:34,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=829473.3333333334, ans=0.0
2023-11-19 22:24:36,758 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=829473.3333333334, ans=0.05
2023-11-19 22:24:45,660 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 4200, loss[loss=0.0651, simple_loss=0.0759, pruned_loss=0.01481, audio_tagging_loss=0.01234, over 14962.00 frames. ], tot_loss[loss=0.08351, simple_loss=0.1029, pruned_loss=0.0217, audio_tagging_loss=0.01034, over 3038369.09 frames. ], batch size: 56, lr: 6.34e-03, grad_scale: 32.0
2023-11-19 22:24:50,983 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 22:24:57,438 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.92 vs. limit=15.0
2023-11-19 22:25:07,642 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.364e+01 8.135e+01 8.898e+01 9.525e+01 1.896e+02, threshold=1.780e+02, percent-clipped=1.0
2023-11-19 22:25:07,791 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 124450
2023-11-19 22:25:10,425 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=829673.3333333334, ans=0.05
2023-11-19 22:25:14,016 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=829673.3333333334, ans=0.125
2023-11-19 22:25:19,134 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=829673.3333333334, ans=0.125
2023-11-19 22:25:20,422 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=829673.3333333334, ans=0.125
2023-11-19 22:25:50,210 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 4250, loss[loss=0.07711, simple_loss=0.09341, pruned_loss=0.02145, audio_tagging_loss=0.008956, over 15126.00 frames. ], tot_loss[loss=0.08344, simple_loss=0.1031, pruned_loss=0.02174, audio_tagging_loss=0.01016, over 3043078.89 frames. ], batch size: 55, lr: 6.34e-03, grad_scale: 16.0
2023-11-19 22:26:07,665 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.36 vs. limit=15.0
2023-11-19 22:26:12,428 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 124500
2023-11-19 22:26:34,415 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=830073.3333333334, ans=0.0
2023-11-19 22:26:35,603 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=830073.3333333334, ans=0.0
2023-11-19 22:26:37,829 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=830073.3333333334, ans=0.0
2023-11-19 22:26:55,227 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 4300, loss[loss=0.07837, simple_loss=0.09389, pruned_loss=0.02108, audio_tagging_loss=0.01035, over 15934.00 frames. ], tot_loss[loss=0.08474, simple_loss=0.1048, pruned_loss=0.02224, audio_tagging_loss=0.01007, over 3046212.73 frames. ], batch size: 61, lr: 6.34e-03, grad_scale: 16.0
2023-11-19 22:27:00,967 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=830206.6666666666, ans=0.0
2023-11-19 22:27:07,093 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=830273.3333333334, ans=0.0
2023-11-19 22:27:17,374 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 124550
2023-11-19 22:27:18,443 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.753e+01 8.093e+01 8.901e+01 9.811e+01 2.323e+02, threshold=1.780e+02, percent-clipped=1.0
2023-11-19 22:27:28,576 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=830340.0, ans=0.125
2023-11-19 22:27:59,301 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 4350, loss[loss=0.07196, simple_loss=0.09215, pruned_loss=0.01557, audio_tagging_loss=0.01032, over 14277.00 frames. ], tot_loss[loss=0.08434, simple_loss=0.1044, pruned_loss=0.02207, audio_tagging_loss=0.01005, over 3043724.86 frames. ], batch size: 56, lr: 6.34e-03, grad_scale: 16.0
2023-11-19 22:28:20,764 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 124600
2023-11-19 22:28:24,412 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=830673.3333333334, ans=0.125
2023-11-19 22:29:03,769 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 4400, loss[loss=0.1126, simple_loss=0.1538, pruned_loss=0.03021, audio_tagging_loss=0.005495, over 15483.00 frames. ], tot_loss[loss=0.08514, simple_loss=0.1053, pruned_loss=0.02239, audio_tagging_loss=0.01009, over 3043834.52 frames. ], batch size: 55, lr: 6.34e-03, grad_scale: 32.0
2023-11-19 22:29:05,337 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=830873.3333333334, ans=0.125
2023-11-19 22:29:26,141 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 124650
2023-11-19 22:29:27,897 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.389e+01 8.577e+01 9.158e+01 9.839e+01 1.465e+02, threshold=1.832e+02, percent-clipped=0.0
2023-11-19 22:30:09,324 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 4450, loss[loss=0.08778, simple_loss=0.106, pruned_loss=0.02387, audio_tagging_loss=0.01089, over 14895.00 frames. ], tot_loss[loss=0.08497, simple_loss=0.1051, pruned_loss=0.02233, audio_tagging_loss=0.01009, over 3046481.56 frames. ], batch size: 56, lr: 6.34e-03, grad_scale: 32.0
2023-11-19 22:30:09,742 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=831206.6666666666, ans=0.2
2023-11-19 22:30:31,334 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 124700
2023-11-19 22:30:45,490 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.36 vs. limit=22.5
2023-11-19 22:31:08,989 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=831473.3333333334, ans=0.125
2023-11-19 22:31:13,687 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 4500, loss[loss=0.09303, simple_loss=0.1088, pruned_loss=0.02601, audio_tagging_loss=0.0126, over 14916.00 frames. ], tot_loss[loss=0.08472, simple_loss=0.1051, pruned_loss=0.02218, audio_tagging_loss=0.009999, over 3048530.53 frames. ], batch size: 59, lr: 6.34e-03, grad_scale: 32.0
2023-11-19 22:31:27,299 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.23 vs. limit=15.0
2023-11-19 22:31:35,379 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 124750
2023-11-19 22:31:36,464 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.020e+01 8.327e+01 8.964e+01 9.884e+01 1.189e+02, threshold=1.793e+02, percent-clipped=0.0
2023-11-19 22:31:58,975 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.60 vs. limit=22.5
2023-11-19 22:32:18,451 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 4550, loss[loss=0.09018, simple_loss=0.1173, pruned_loss=0.01904, audio_tagging_loss=0.01248, over 16468.00 frames. ], tot_loss[loss=0.08455, simple_loss=0.1049, pruned_loss=0.02206, audio_tagging_loss=0.01003, over 3053631.19 frames. ], batch size: 61, lr: 6.34e-03, grad_scale: 32.0
2023-11-19 22:32:22,714 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.58 vs. limit=6.0
2023-11-19 22:32:30,453 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=831940.0, ans=0.125
2023-11-19 22:32:40,905 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 124800
2023-11-19 22:32:49,287 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=832006.6666666666, ans=0.2
2023-11-19 22:32:55,615 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=832006.6666666666, ans=0.07
2023-11-19 22:33:08,669 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 22:33:12,573 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=832140.0, ans=0.125
2023-11-19 22:33:24,164 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 4600, loss[loss=0.07092, simple_loss=0.09018, pruned_loss=0.01618, audio_tagging_loss=0.009648, over 14959.00 frames. ], tot_loss[loss=0.08395, simple_loss=0.1039, pruned_loss=0.0219, audio_tagging_loss=0.01008, over 3043286.58 frames. ], batch size: 54, lr: 6.33e-03, grad_scale: 32.0
2023-11-19 22:33:32,276 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=832206.6666666666, ans=0.125
2023-11-19 22:33:45,113 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=832273.3333333334, ans=0.2
2023-11-19 22:33:46,716 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 124850
2023-11-19 22:33:47,807 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.870e+01 8.306e+01 8.909e+01 9.778e+01 1.815e+02, threshold=1.782e+02, percent-clipped=2.0
2023-11-19 22:34:14,914 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=832473.3333333334, ans=0.2
2023-11-19 22:34:29,634 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 4650, loss[loss=0.08053, simple_loss=0.1004, pruned_loss=0.02004, audio_tagging_loss=0.01029, over 15031.00 frames. ], tot_loss[loss=0.08374, simple_loss=0.1038, pruned_loss=0.02167, audio_tagging_loss=0.01018, over 3046418.46 frames. ], batch size: 57, lr: 6.33e-03, grad_scale: 32.0
2023-11-19 22:34:31,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=832540.0, ans=0.0
2023-11-19 22:34:38,635 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.65 vs. limit=12.0
2023-11-19 22:34:51,403 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 124900
2023-11-19 22:34:58,600 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=832673.3333333334, ans=0.0
2023-11-19 22:34:59,592 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=832673.3333333334, ans=0.125
2023-11-19 22:35:34,362 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 4700, loss[loss=0.08781, simple_loss=0.1094, pruned_loss=0.02323, audio_tagging_loss=0.009896, over 15076.00 frames. ], tot_loss[loss=0.0832, simple_loss=0.1029, pruned_loss=0.02141, audio_tagging_loss=0.01034, over 3043367.44 frames. ], batch size: 56, lr: 6.33e-03, grad_scale: 32.0
], batch size: 56, lr: 6.33e-03, grad_scale: 32.0 2023-11-19 22:35:55,640 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 124950 2023-11-19 22:35:56,665 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.758e+01 8.340e+01 9.099e+01 9.695e+01 1.346e+02, threshold=1.820e+02, percent-clipped=0.0 2023-11-19 22:36:05,829 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=833006.6666666666, ans=0.0 2023-11-19 22:36:26,058 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=833140.0, ans=0.125 2023-11-19 22:36:28,529 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=833140.0, ans=0.125 2023-11-19 22:36:29,006 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.64 vs. limit=15.0 2023-11-19 22:36:36,612 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.95 vs. limit=15.0 2023-11-19 22:36:38,660 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 4750, loss[loss=0.07514, simple_loss=0.09899, pruned_loss=0.01664, audio_tagging_loss=0.009002, over 15315.00 frames. ], tot_loss[loss=0.08307, simple_loss=0.103, pruned_loss=0.02132, audio_tagging_loss=0.01027, over 3043197.32 frames. ], batch size: 57, lr: 6.33e-03, grad_scale: 32.0 2023-11-19 22:36:40,555 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.57 vs. limit=22.5 2023-11-19 22:36:41,350 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=833206.6666666666, ans=0.0 2023-11-19 22:36:59,283 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=833273.3333333334, ans=0.125 2023-11-19 22:37:00,307 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 125000 2023-11-19 22:37:09,517 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=833340.0, ans=0.2 2023-11-19 22:37:32,135 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=833473.3333333334, ans=0.2 2023-11-19 22:37:39,165 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=833473.3333333334, ans=0.015 2023-11-19 22:37:42,735 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 4800, loss[loss=0.09922, simple_loss=0.1338, pruned_loss=0.02437, audio_tagging_loss=0.007965, over 14676.00 frames. ], tot_loss[loss=0.08376, simple_loss=0.1036, pruned_loss=0.02157, audio_tagging_loss=0.01041, over 3040870.05 frames. 
], batch size: 56, lr: 6.33e-03, grad_scale: 32.0 2023-11-19 22:37:45,544 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=833540.0, ans=0.0 2023-11-19 22:37:58,900 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=833606.6666666666, ans=0.125 2023-11-19 22:38:04,963 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 125050 2023-11-19 22:38:07,378 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.959e+01 8.107e+01 9.125e+01 1.005e+02 1.234e+02, threshold=1.825e+02, percent-clipped=0.0 2023-11-19 22:38:08,246 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.40 vs. limit=15.0 2023-11-19 22:38:11,302 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=833673.3333333334, ans=0.125 2023-11-19 22:38:26,585 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 22:38:37,718 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=11.31 vs. limit=12.0 2023-11-19 22:38:41,730 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=833806.6666666666, ans=0.0 2023-11-19 22:38:44,126 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=833806.6666666666, ans=0.0 2023-11-19 22:38:47,588 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 4850, loss[loss=0.0855, simple_loss=0.1059, pruned_loss=0.02268, audio_tagging_loss=0.009882, over 14964.00 frames. ], tot_loss[loss=0.08353, simple_loss=0.1031, pruned_loss=0.02146, audio_tagging_loss=0.01051, over 3042022.92 frames. ], batch size: 56, lr: 6.33e-03, grad_scale: 32.0 2023-11-19 22:38:56,444 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=833873.3333333334, ans=0.125 2023-11-19 22:39:00,095 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=833940.0, ans=0.1 2023-11-19 22:39:09,127 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 125100 2023-11-19 22:39:18,172 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.24 vs. limit=22.5 2023-11-19 22:39:51,855 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 4900, loss[loss=0.06922, simple_loss=0.08319, pruned_loss=0.01624, audio_tagging_loss=0.01139, over 14105.00 frames. ], tot_loss[loss=0.08366, simple_loss=0.1033, pruned_loss=0.02147, audio_tagging_loss=0.01054, over 3039393.17 frames. 
], batch size: 55, lr: 6.33e-03, grad_scale: 32.0 2023-11-19 22:40:13,336 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 125150 2023-11-19 22:40:16,274 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.135e+01 8.432e+01 8.847e+01 9.512e+01 1.221e+02, threshold=1.769e+02, percent-clipped=0.0 2023-11-19 22:40:45,679 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=834473.3333333334, ans=0.125 2023-11-19 22:40:55,054 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 4950, loss[loss=0.09441, simple_loss=0.1166, pruned_loss=0.0271, audio_tagging_loss=0.009007, over 16174.00 frames. ], tot_loss[loss=0.08287, simple_loss=0.1025, pruned_loss=0.02116, audio_tagging_loss=0.01046, over 3035661.57 frames. ], batch size: 58, lr: 6.33e-03, grad_scale: 32.0 2023-11-19 22:41:10,875 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=834606.6666666666, ans=0.125 2023-11-19 22:41:17,915 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 125200 2023-11-19 22:41:28,767 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=834673.3333333334, ans=0.2 2023-11-19 22:41:46,212 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=834806.6666666666, ans=0.0 2023-11-19 22:42:00,394 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 5000, loss[loss=0.09181, simple_loss=0.1038, pruned_loss=0.02614, audio_tagging_loss=0.01378, over 14762.00 frames. ], tot_loss[loss=0.08246, simple_loss=0.1021, pruned_loss=0.02108, audio_tagging_loss=0.01034, over 3034144.43 frames. ], batch size: 54, lr: 6.32e-03, grad_scale: 32.0 2023-11-19 22:42:08,033 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=834873.3333333334, ans=0.0 2023-11-19 22:42:17,427 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=834940.0, ans=0.125 2023-11-19 22:42:22,128 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 125250 2023-11-19 22:42:24,393 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.897e+01 8.190e+01 8.915e+01 9.653e+01 1.690e+02, threshold=1.783e+02, percent-clipped=0.0 2023-11-19 22:42:24,620 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=835006.6666666666, ans=0.05 2023-11-19 22:42:50,130 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=835073.3333333334, ans=0.1 2023-11-19 22:43:04,425 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 5050, loss[loss=0.1087, simple_loss=0.1292, pruned_loss=0.03422, audio_tagging_loss=0.009899, over 15217.00 frames. ], tot_loss[loss=0.08368, simple_loss=0.1037, pruned_loss=0.0217, audio_tagging_loss=0.01012, over 3036562.97 frames. 
], batch size: 58, lr: 6.32e-03, grad_scale: 32.0 2023-11-19 22:43:04,871 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=835206.6666666666, ans=0.125 2023-11-19 22:43:18,893 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=835273.3333333334, ans=0.1 2023-11-19 22:43:26,326 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 125300 2023-11-19 22:43:41,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=835340.0, ans=0.0 2023-11-19 22:44:07,484 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=835540.0, ans=0.1 2023-11-19 22:44:08,444 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 5100, loss[loss=0.1277, simple_loss=0.1554, pruned_loss=0.03956, audio_tagging_loss=0.0104, over 16597.00 frames. ], tot_loss[loss=0.08352, simple_loss=0.1035, pruned_loss=0.02173, audio_tagging_loss=0.01002, over 3039144.46 frames. ], batch size: 59, lr: 6.32e-03, grad_scale: 32.0 2023-11-19 22:44:11,160 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=835540.0, ans=0.1 2023-11-19 22:44:14,266 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=835540.0, ans=0.1 2023-11-19 22:44:14,353 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=835540.0, ans=10.0 2023-11-19 22:44:17,851 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=835540.0, ans=0.125 2023-11-19 22:44:31,245 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 125350 2023-11-19 22:44:33,598 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.822e+01 8.119e+01 8.827e+01 9.463e+01 1.199e+02, threshold=1.765e+02, percent-clipped=0.0 2023-11-19 22:44:58,110 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=835740.0, ans=0.0 2023-11-19 22:44:59,430 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=835806.6666666666, ans=0.0 2023-11-19 22:45:01,746 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=835806.6666666666, ans=0.125 2023-11-19 22:45:14,094 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 5150, loss[loss=0.1012, simple_loss=0.1235, pruned_loss=0.02979, audio_tagging_loss=0.009639, over 16199.00 frames. ], tot_loss[loss=0.08359, simple_loss=0.1036, pruned_loss=0.02172, audio_tagging_loss=0.01006, over 3041953.66 frames. ], batch size: 61, lr: 6.32e-03, grad_scale: 32.0 2023-11-19 22:45:30,523 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.93 vs. 
limit=15.0 2023-11-19 22:45:36,400 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 125400 2023-11-19 22:45:48,153 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=836006.6666666666, ans=0.2 2023-11-19 22:45:49,293 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff2.min_abs, batch_count=836006.6666666666, ans=0.1 2023-11-19 22:45:59,797 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.473e-01 2023-11-19 22:46:02,555 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.89 vs. limit=6.0 2023-11-19 22:46:07,822 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=836140.0, ans=0.0 2023-11-19 22:46:19,178 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 5200, loss[loss=0.06939, simple_loss=0.08699, pruned_loss=0.01264, audio_tagging_loss=0.01325, over 15196.00 frames. ], tot_loss[loss=0.08327, simple_loss=0.1033, pruned_loss=0.02152, audio_tagging_loss=0.0101, over 3044025.21 frames. ], batch size: 56, lr: 6.32e-03, grad_scale: 32.0 2023-11-19 22:46:33,857 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.23 vs. limit=10.0 2023-11-19 22:46:40,479 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 125450 2023-11-19 22:46:43,469 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.938e+01 8.292e+01 9.057e+01 9.966e+01 1.254e+02, threshold=1.811e+02, percent-clipped=0.0 2023-11-19 22:46:47,488 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=836340.0, ans=0.0 2023-11-19 22:47:00,296 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=836406.6666666666, ans=10.0 2023-11-19 22:47:00,301 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=836406.6666666666, ans=0.025 2023-11-19 22:47:08,137 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=836406.6666666666, ans=0.125 2023-11-19 22:47:11,222 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.39 vs. limit=15.0 2023-11-19 22:47:19,917 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=836473.3333333334, ans=0.07 2023-11-19 22:47:21,088 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=836473.3333333334, ans=0.0 2023-11-19 22:47:23,273 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 5250, loss[loss=0.08563, simple_loss=0.1134, pruned_loss=0.02024, audio_tagging_loss=0.008677, over 14932.00 frames. ], tot_loss[loss=0.08324, simple_loss=0.1032, pruned_loss=0.02157, audio_tagging_loss=0.01005, over 3043607.51 frames. 
], batch size: 56, lr: 6.32e-03, grad_scale: 32.0 2023-11-19 22:47:38,981 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=836606.6666666666, ans=0.0 2023-11-19 22:47:45,077 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.25 vs. limit=15.0 2023-11-19 22:47:45,661 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 125500 2023-11-19 22:47:59,926 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=836673.3333333334, ans=0.0 2023-11-19 22:48:28,096 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 5300, loss[loss=0.08125, simple_loss=0.09766, pruned_loss=0.02142, audio_tagging_loss=0.011, over 16031.00 frames. ], tot_loss[loss=0.08385, simple_loss=0.1039, pruned_loss=0.02189, audio_tagging_loss=0.01002, over 3049904.06 frames. ], batch size: 60, lr: 6.32e-03, grad_scale: 32.0 2023-11-19 22:48:50,383 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 125550 2023-11-19 22:48:53,345 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.677e+01 8.460e+01 9.151e+01 1.090e+02 1.553e+02, threshold=1.830e+02, percent-clipped=0.0 2023-11-19 22:48:56,425 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.84 vs. limit=15.0 2023-11-19 22:49:14,102 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=837073.3333333334, ans=0.125 2023-11-19 22:49:16,428 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=837073.3333333334, ans=0.0 2023-11-19 22:49:33,677 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 5350, loss[loss=0.08635, simple_loss=0.1035, pruned_loss=0.02171, audio_tagging_loss=0.01291, over 15128.00 frames. ], tot_loss[loss=0.08365, simple_loss=0.1035, pruned_loss=0.02175, audio_tagging_loss=0.01015, over 3047465.62 frames. ], batch size: 58, lr: 6.32e-03, grad_scale: 32.0 2023-11-19 22:49:33,916 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=837206.6666666666, ans=0.125 2023-11-19 22:49:51,474 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=837273.3333333334, ans=0.2 2023-11-19 22:49:54,105 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=837273.3333333334, ans=0.125 2023-11-19 22:49:54,925 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 125600 2023-11-19 22:50:01,187 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.38 vs. 
limit=10.0 2023-11-19 22:50:03,260 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=837340.0, ans=0.125 2023-11-19 22:50:28,956 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=837473.3333333334, ans=0.1 2023-11-19 22:50:30,814 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=837473.3333333334, ans=0.0 2023-11-19 22:50:38,002 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 5400, loss[loss=0.08274, simple_loss=0.0952, pruned_loss=0.02557, audio_tagging_loss=0.009573, over 14711.00 frames. ], tot_loss[loss=0.084, simple_loss=0.1041, pruned_loss=0.02183, audio_tagging_loss=0.01014, over 3047709.29 frames. ], batch size: 56, lr: 6.31e-03, grad_scale: 16.0 2023-11-19 22:50:39,756 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.11 vs. limit=22.5 2023-11-19 22:50:45,741 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=837540.0, ans=0.125 2023-11-19 22:50:53,824 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=837606.6666666666, ans=0.0 2023-11-19 22:50:58,574 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=837606.6666666666, ans=0.0 2023-11-19 22:51:00,259 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 125650 2023-11-19 22:51:01,678 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=837606.6666666666, ans=0.05 2023-11-19 22:51:03,913 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.803e+01 8.135e+01 8.658e+01 9.508e+01 1.289e+02, threshold=1.732e+02, percent-clipped=0.0 2023-11-19 22:51:17,147 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=837740.0, ans=0.0 2023-11-19 22:51:42,663 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 5450, loss[loss=0.08808, simple_loss=0.09655, pruned_loss=0.02701, audio_tagging_loss=0.0128, over 16209.00 frames. ], tot_loss[loss=0.0845, simple_loss=0.1046, pruned_loss=0.02208, audio_tagging_loss=0.0101, over 3044483.87 frames. ], batch size: 63, lr: 6.31e-03, grad_scale: 16.0 2023-11-19 22:52:04,546 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 125700 2023-11-19 22:52:04,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=837940.0, ans=0.1 2023-11-19 22:52:18,541 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=838006.6666666666, ans=0.125 2023-11-19 22:52:34,328 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.48 vs. limit=12.0 2023-11-19 22:52:36,558 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.43 vs. 
limit=15.0 2023-11-19 22:52:38,253 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=838140.0, ans=0.2 2023-11-19 22:52:39,461 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=838140.0, ans=0.125 2023-11-19 22:52:40,920 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.22 vs. limit=22.5 2023-11-19 22:52:47,683 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 5500, loss[loss=0.08248, simple_loss=0.09985, pruned_loss=0.02218, audio_tagging_loss=0.01037, over 15343.00 frames. ], tot_loss[loss=0.08472, simple_loss=0.1049, pruned_loss=0.02214, audio_tagging_loss=0.01011, over 3050240.23 frames. ], batch size: 56, lr: 6.31e-03, grad_scale: 16.0 2023-11-19 22:52:54,001 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=838206.6666666666, ans=0.125 2023-11-19 22:53:00,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=838273.3333333334, ans=0.0 2023-11-19 22:53:03,207 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=838273.3333333334, ans=0.125 2023-11-19 22:53:09,221 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 125750 2023-11-19 22:53:12,836 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.838e+01 8.496e+01 8.990e+01 9.633e+01 1.229e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-19 22:53:19,055 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.34 vs. limit=15.0 2023-11-19 22:53:36,597 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=838406.6666666666, ans=0.125 2023-11-19 22:53:42,069 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.22 vs. limit=22.5 2023-11-19 22:53:52,461 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 5550, loss[loss=0.08195, simple_loss=0.1003, pruned_loss=0.02324, audio_tagging_loss=0.008549, over 14977.00 frames. ], tot_loss[loss=0.08395, simple_loss=0.1037, pruned_loss=0.02182, audio_tagging_loss=0.01029, over 3050101.75 frames. 
], batch size: 56, lr: 6.31e-03, grad_scale: 16.0 2023-11-19 22:53:56,473 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=838540.0, ans=0.0 2023-11-19 22:54:14,219 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 125800 2023-11-19 22:54:19,156 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=838673.3333333334, ans=0.0 2023-11-19 22:54:30,628 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=838740.0, ans=0.125 2023-11-19 22:54:41,566 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=838740.0, ans=0.1 2023-11-19 22:54:41,648 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=838740.0, ans=0.0 2023-11-19 22:54:47,557 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=838806.6666666666, ans=0.0 2023-11-19 22:54:57,826 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 5600, loss[loss=0.09339, simple_loss=0.1241, pruned_loss=0.01929, audio_tagging_loss=0.01206, over 16546.00 frames. ], tot_loss[loss=0.08412, simple_loss=0.104, pruned_loss=0.02175, audio_tagging_loss=0.01035, over 3056195.34 frames. ], batch size: 61, lr: 6.31e-03, grad_scale: 32.0 2023-11-19 22:55:19,542 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 125850 2023-11-19 22:55:23,384 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.434e+01 8.075e+01 8.698e+01 9.694e+01 1.274e+02, threshold=1.740e+02, percent-clipped=0.0 2023-11-19 22:55:25,639 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=839006.6666666666, ans=0.125 2023-11-19 22:55:28,350 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.05 vs. limit=15.0 2023-11-19 22:55:38,486 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=839073.3333333334, ans=0.5 2023-11-19 22:55:40,919 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=839073.3333333334, ans=0.125 2023-11-19 22:55:46,065 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 22:55:51,568 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=20.01 vs. limit=15.0 2023-11-19 22:55:52,443 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=839140.0, ans=0.0 2023-11-19 22:56:02,144 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.86 vs. 
limit=6.0 2023-11-19 22:56:02,503 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 5650, loss[loss=0.09039, simple_loss=0.1027, pruned_loss=0.02611, audio_tagging_loss=0.01293, over 14813.00 frames. ], tot_loss[loss=0.08438, simple_loss=0.1042, pruned_loss=0.02182, audio_tagging_loss=0.01044, over 3060520.93 frames. ], batch size: 57, lr: 6.31e-03, grad_scale: 32.0 2023-11-19 22:56:24,933 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 125900 2023-11-19 22:56:30,525 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.46 vs. limit=22.5 2023-11-19 22:56:31,059 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=839340.0, ans=0.0 2023-11-19 22:56:32,381 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=839340.0, ans=0.1 2023-11-19 22:56:48,461 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=839406.6666666666, ans=0.125 2023-11-19 22:57:06,845 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 5700, loss[loss=0.09421, simple_loss=0.1111, pruned_loss=0.03024, audio_tagging_loss=0.008433, over 15572.00 frames. ], tot_loss[loss=0.08419, simple_loss=0.1038, pruned_loss=0.02177, audio_tagging_loss=0.01051, over 3057067.95 frames. ], batch size: 59, lr: 6.31e-03, grad_scale: 32.0 2023-11-19 22:57:07,082 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=839540.0, ans=0.125 2023-11-19 22:57:26,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=839606.6666666666, ans=0.04949747468305833 2023-11-19 22:57:29,135 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 125950 2023-11-19 22:57:34,477 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.315e+01 7.815e+01 8.868e+01 9.889e+01 1.263e+02, threshold=1.774e+02, percent-clipped=0.0 2023-11-19 22:57:49,376 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=839740.0, ans=0.125 2023-11-19 22:58:11,323 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 5750, loss[loss=0.09937, simple_loss=0.1233, pruned_loss=0.02835, audio_tagging_loss=0.009392, over 17318.00 frames. ], tot_loss[loss=0.0837, simple_loss=0.1031, pruned_loss=0.02167, audio_tagging_loss=0.01046, over 3050842.18 frames. ], batch size: 62, lr: 6.31e-03, grad_scale: 16.0 2023-11-19 22:58:23,851 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=839940.0, ans=0.125 2023-11-19 22:58:33,774 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 126000 2023-11-19 22:59:02,890 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=840140.0, ans=0.125 2023-11-19 22:59:16,601 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=840206.6666666666, ans=0.125 2023-11-19 22:59:17,502 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 5800, loss[loss=0.08424, simple_loss=0.1089, pruned_loss=0.0212, audio_tagging_loss=0.008569, over 14462.00 frames. ], tot_loss[loss=0.08466, simple_loss=0.1044, pruned_loss=0.02219, audio_tagging_loss=0.01029, over 3054495.52 frames. 
], batch size: 54, lr: 6.30e-03, grad_scale: 16.0 2023-11-19 22:59:39,042 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 126050 2023-11-19 22:59:43,328 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.62 vs. limit=15.0 2023-11-19 22:59:43,825 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.582e+01 8.258e+01 8.850e+01 9.872e+01 1.564e+02, threshold=1.770e+02, percent-clipped=0.0 2023-11-19 22:59:46,560 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.33 vs. limit=15.0 2023-11-19 22:59:50,069 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.12 vs. limit=15.0 2023-11-19 22:59:52,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=840340.0, ans=0.125 2023-11-19 22:59:52,415 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=840340.0, ans=0.0 2023-11-19 23:00:22,529 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 5850, loss[loss=0.08397, simple_loss=0.0973, pruned_loss=0.02321, audio_tagging_loss=0.0121, over 14796.00 frames. ], tot_loss[loss=0.0842, simple_loss=0.104, pruned_loss=0.02195, audio_tagging_loss=0.01023, over 3044924.73 frames. ], batch size: 58, lr: 6.30e-03, grad_scale: 16.0 2023-11-19 23:00:44,895 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 126100 2023-11-19 23:00:47,957 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.06 vs. limit=15.0 2023-11-19 23:00:48,043 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.51 vs. limit=22.5 2023-11-19 23:01:14,577 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=840806.6666666666, ans=0.0 2023-11-19 23:01:25,662 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=840873.3333333334, ans=0.09899494936611666 2023-11-19 23:01:27,275 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 5900, loss[loss=0.0813, simple_loss=0.1051, pruned_loss=0.02195, audio_tagging_loss=0.00682, over 14906.00 frames. ], tot_loss[loss=0.08472, simple_loss=0.105, pruned_loss=0.02208, audio_tagging_loss=0.01016, over 3051012.59 frames. 
], batch size: 57, lr: 6.30e-03, grad_scale: 16.0 2023-11-19 23:01:31,439 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=840873.3333333334, ans=0.0 2023-11-19 23:01:49,500 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 126150 2023-11-19 23:01:50,789 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=840940.0, ans=0.125 2023-11-19 23:01:54,161 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.488e+01 8.354e+01 9.139e+01 9.974e+01 1.416e+02, threshold=1.828e+02, percent-clipped=0.0 2023-11-19 23:02:24,922 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=841140.0, ans=0.125 2023-11-19 23:02:32,589 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 5950, loss[loss=0.08971, simple_loss=0.1126, pruned_loss=0.02667, audio_tagging_loss=0.006746, over 15094.00 frames. ], tot_loss[loss=0.08496, simple_loss=0.1053, pruned_loss=0.02213, audio_tagging_loss=0.01019, over 3051991.63 frames. ], batch size: 55, lr: 6.30e-03, grad_scale: 16.0 2023-11-19 23:02:36,600 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=841206.6666666666, ans=0.125 2023-11-19 23:02:50,762 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.68 vs. limit=15.0 2023-11-19 23:02:53,888 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 126200 2023-11-19 23:03:12,185 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.15 vs. limit=12.0 2023-11-19 23:03:13,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=841406.6666666666, ans=0.125 2023-11-19 23:03:14,598 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.63 vs. limit=12.0 2023-11-19 23:03:19,627 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=841406.6666666666, ans=0.125 2023-11-19 23:03:36,317 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 6000, loss[loss=0.1082, simple_loss=0.123, pruned_loss=0.03812, audio_tagging_loss=0.008573, over 15532.00 frames. ], tot_loss[loss=0.08512, simple_loss=0.1056, pruned_loss=0.02225, audio_tagging_loss=0.01006, over 3054459.82 frames. ], batch size: 55, lr: 6.30e-03, grad_scale: 32.0 2023-11-19 23:03:36,318 INFO [train_asr.py:1285] (3/4) Computing validation loss 2023-11-19 23:04:02,350 INFO [zipformer.py:1873] (3/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.1066, 4.9456, 3.6366, 3.8643], device='cuda:3') 2023-11-19 23:04:18,345 INFO [train_asr.py:1294] (3/4) Epoch 11, validation: loss=0.06364, simple_loss=0.05477, pruned_loss=0.006179, audio_tagging_loss=0.03008, over 4681554.00 frames. 
2023-11-19 23:04:18,346 INFO [train_asr.py:1295] (3/4) Maximum memory allocated so far is 25886MB 2023-11-19 23:04:40,329 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 126250 2023-11-19 23:04:45,171 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.220e+01 8.140e+01 8.896e+01 9.801e+01 1.425e+02, threshold=1.779e+02, percent-clipped=0.0 2023-11-19 23:04:45,538 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=841673.3333333334, ans=0.0 2023-11-19 23:04:53,219 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=841673.3333333334, ans=0.0 2023-11-19 23:05:07,230 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 23:05:18,936 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=841806.6666666666, ans=0.125 2023-11-19 23:05:23,645 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 6050, loss[loss=0.1007, simple_loss=0.1265, pruned_loss=0.02917, audio_tagging_loss=0.008306, over 15307.00 frames. ], tot_loss[loss=0.08495, simple_loss=0.1052, pruned_loss=0.02225, audio_tagging_loss=0.01008, over 3055918.21 frames. ], batch size: 55, lr: 6.30e-03, grad_scale: 32.0 2023-11-19 23:05:45,488 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 126300 2023-11-19 23:05:46,897 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=841940.0, ans=0.09899494936611666 2023-11-19 23:05:54,430 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=842006.6666666666, ans=0.04949747468305833 2023-11-19 23:06:12,246 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=842073.3333333334, ans=0.125 2023-11-19 23:06:18,618 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=842140.0, ans=0.125 2023-11-19 23:06:27,878 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=842206.6666666666, ans=0.125 2023-11-19 23:06:28,646 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 6100, loss[loss=0.08261, simple_loss=0.1023, pruned_loss=0.02093, audio_tagging_loss=0.01054, over 13665.00 frames. ], tot_loss[loss=0.08483, simple_loss=0.105, pruned_loss=0.02214, audio_tagging_loss=0.0102, over 3053009.32 frames. 
], batch size: 52, lr: 6.30e-03, grad_scale: 32.0 2023-11-19 23:06:50,060 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 126350 2023-11-19 23:06:55,417 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.987e+01 8.655e+01 9.472e+01 1.043e+02 1.487e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-19 23:07:05,005 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=842340.0, ans=0.2 2023-11-19 23:07:08,712 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=842406.6666666666, ans=0.2 2023-11-19 23:07:11,098 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=842406.6666666666, ans=0.0 2023-11-19 23:07:27,835 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=842473.3333333334, ans=0.125 2023-11-19 23:07:32,562 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 6150, loss[loss=0.0702, simple_loss=0.08194, pruned_loss=0.0163, audio_tagging_loss=0.01293, over 14526.00 frames. ], tot_loss[loss=0.08384, simple_loss=0.1035, pruned_loss=0.02177, audio_tagging_loss=0.01034, over 3053029.07 frames. ], batch size: 56, lr: 6.30e-03, grad_scale: 32.0 2023-11-19 23:07:37,215 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=842540.0, ans=0.0 2023-11-19 23:07:39,492 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=842540.0, ans=0.1 2023-11-19 23:07:50,839 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=842606.6666666666, ans=0.1 2023-11-19 23:07:55,471 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 126400 2023-11-19 23:08:12,878 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=842740.0, ans=0.0 2023-11-19 23:08:26,174 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.36 vs. limit=22.5 2023-11-19 23:08:39,211 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 6200, loss[loss=0.08748, simple_loss=0.1079, pruned_loss=0.02184, audio_tagging_loss=0.01169, over 14723.00 frames. ], tot_loss[loss=0.08309, simple_loss=0.1025, pruned_loss=0.02148, audio_tagging_loss=0.01035, over 3048331.66 frames. ], batch size: 53, lr: 6.29e-03, grad_scale: 32.0 2023-11-19 23:08:56,083 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=842940.0, ans=0.09899494936611666 2023-11-19 23:08:57,181 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=842940.0, ans=0.125 2023-11-19 23:09:00,689 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 126450 2023-11-19 23:09:05,393 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.030e+01 8.350e+01 8.922e+01 9.859e+01 1.209e+02, threshold=1.784e+02, percent-clipped=0.0 2023-11-19 23:09:06,098 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.95 vs. 
limit=10.0 2023-11-19 23:09:42,308 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 6250, loss[loss=0.08478, simple_loss=0.09871, pruned_loss=0.02081, audio_tagging_loss=0.01462, over 15407.00 frames. ], tot_loss[loss=0.08284, simple_loss=0.1021, pruned_loss=0.02133, audio_tagging_loss=0.01046, over 3046843.56 frames. ], batch size: 59, lr: 6.29e-03, grad_scale: 32.0 2023-11-19 23:09:45,562 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=843206.6666666666, ans=0.125 2023-11-19 23:09:47,922 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=843206.6666666666, ans=0.015 2023-11-19 23:09:57,932 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=843273.3333333334, ans=0.125 2023-11-19 23:10:04,672 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 126500 2023-11-19 23:10:20,834 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=843406.6666666666, ans=0.2 2023-11-19 23:10:27,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=843406.6666666666, ans=0.125 2023-11-19 23:10:47,080 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 6300, loss[loss=0.1033, simple_loss=0.1267, pruned_loss=0.03177, audio_tagging_loss=0.008179, over 15074.00 frames. ], tot_loss[loss=0.08319, simple_loss=0.1025, pruned_loss=0.0215, audio_tagging_loss=0.01046, over 3044554.28 frames. ], batch size: 55, lr: 6.29e-03, grad_scale: 32.0 2023-11-19 23:10:48,962 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.59 vs. limit=15.0 2023-11-19 23:10:49,607 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=843540.0, ans=0.015 2023-11-19 23:10:54,111 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=843540.0, ans=0.0 2023-11-19 23:11:01,884 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.48 vs. limit=15.0 2023-11-19 23:11:09,959 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 126550 2023-11-19 23:11:14,970 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.441e+01 8.231e+01 8.988e+01 9.749e+01 1.273e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-19 23:11:27,147 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=843740.0, ans=0.0 2023-11-19 23:11:47,574 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=843806.6666666666, ans=0.125 2023-11-19 23:11:50,367 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.65 vs. limit=6.0 2023-11-19 23:11:52,586 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 6350, loss[loss=0.09685, simple_loss=0.1184, pruned_loss=0.02663, audio_tagging_loss=0.01103, over 15473.00 frames. ], tot_loss[loss=0.08325, simple_loss=0.1023, pruned_loss=0.0215, audio_tagging_loss=0.01058, over 3045985.20 frames. 
], batch size: 58, lr: 6.29e-03, grad_scale: 32.0 2023-11-19 23:12:04,644 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=843940.0, ans=0.125 2023-11-19 23:12:11,634 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=843940.0, ans=0.025 2023-11-19 23:12:14,904 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 126600 2023-11-19 23:12:19,402 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.62 vs. limit=6.0 2023-11-19 23:12:26,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=844006.6666666666, ans=0.0 2023-11-19 23:12:42,492 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=844073.3333333334, ans=0.125 2023-11-19 23:12:57,576 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 6400, loss[loss=0.09874, simple_loss=0.1188, pruned_loss=0.02928, audio_tagging_loss=0.01005, over 14902.00 frames. ], tot_loss[loss=0.08379, simple_loss=0.103, pruned_loss=0.02172, audio_tagging_loss=0.01058, over 3044140.40 frames. ], batch size: 57, lr: 6.29e-03, grad_scale: 32.0 2023-11-19 23:13:13,159 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=844273.3333333334, ans=0.1 2023-11-19 23:13:19,173 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 126650 2023-11-19 23:13:25,600 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.681e+01 8.306e+01 8.849e+01 9.605e+01 1.260e+02, threshold=1.770e+02, percent-clipped=0.0 2023-11-19 23:13:29,652 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=844340.0, ans=0.2 2023-11-19 23:13:31,405 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=844340.0, ans=0.0 2023-11-19 23:13:37,746 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=844406.6666666666, ans=0.0 2023-11-19 23:14:01,787 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 6450, loss[loss=0.0786, simple_loss=0.0978, pruned_loss=0.02101, audio_tagging_loss=0.008687, over 16255.00 frames. ], tot_loss[loss=0.08425, simple_loss=0.1034, pruned_loss=0.02195, audio_tagging_loss=0.01061, over 3048356.25 frames. 
], batch size: 60, lr: 6.29e-03, grad_scale: 32.0 2023-11-19 23:14:07,105 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=844540.0, ans=0.0 2023-11-19 23:14:07,222 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=844540.0, ans=0.0 2023-11-19 23:14:11,494 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=844540.0, ans=0.125 2023-11-19 23:14:15,098 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=844606.6666666666, ans=0.125 2023-11-19 23:14:24,168 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 126700 2023-11-19 23:15:06,437 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 6500, loss[loss=0.07801, simple_loss=0.09879, pruned_loss=0.01944, audio_tagging_loss=0.009172, over 15644.00 frames. ], tot_loss[loss=0.08367, simple_loss=0.1029, pruned_loss=0.02165, audio_tagging_loss=0.01057, over 3050193.42 frames. ], batch size: 58, lr: 6.29e-03, grad_scale: 32.0 2023-11-19 23:15:25,784 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=844940.0, ans=0.125 2023-11-19 23:15:29,788 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 126750 2023-11-19 23:15:35,915 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.834e+01 8.115e+01 8.787e+01 9.556e+01 1.431e+02, threshold=1.757e+02, percent-clipped=0.0 2023-11-19 23:15:37,974 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.24 vs. limit=15.0 2023-11-19 23:15:47,999 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=845073.3333333334, ans=0.125 2023-11-19 23:15:59,140 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=845140.0, ans=0.0 2023-11-19 23:16:02,335 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=845140.0, ans=0.0 2023-11-19 23:16:12,383 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 6550, loss[loss=0.08765, simple_loss=0.1108, pruned_loss=0.02279, audio_tagging_loss=0.009475, over 15461.00 frames. ], tot_loss[loss=0.08407, simple_loss=0.1039, pruned_loss=0.02183, audio_tagging_loss=0.01031, over 3053143.68 frames. 
], batch size: 56, lr: 6.29e-03, grad_scale: 32.0 2023-11-19 23:16:21,919 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=845206.6666666666, ans=0.0 2023-11-19 23:16:34,043 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 126800 2023-11-19 23:16:50,247 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=845406.6666666666, ans=0.125 2023-11-19 23:16:50,311 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=845406.6666666666, ans=0.125 2023-11-19 23:16:52,772 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=845406.6666666666, ans=0.125 2023-11-19 23:17:17,400 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 6600, loss[loss=0.08835, simple_loss=0.1232, pruned_loss=0.02064, audio_tagging_loss=0.006093, over 15615.00 frames. ], tot_loss[loss=0.0845, simple_loss=0.1047, pruned_loss=0.02197, audio_tagging_loss=0.0102, over 3054512.50 frames. ], batch size: 56, lr: 6.28e-03, grad_scale: 32.0 2023-11-19 23:17:20,127 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=845540.0, ans=0.2 2023-11-19 23:17:27,463 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.20 vs. limit=12.0 2023-11-19 23:17:30,871 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=845606.6666666666, ans=0.0 2023-11-19 23:17:40,140 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 126850 2023-11-19 23:17:46,110 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.644e+01 8.323e+01 8.980e+01 9.690e+01 1.359e+02, threshold=1.796e+02, percent-clipped=0.0 2023-11-19 23:17:58,747 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=845740.0, ans=0.05 2023-11-19 23:18:22,261 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 6650, loss[loss=0.08466, simple_loss=0.1052, pruned_loss=0.02292, audio_tagging_loss=0.009158, over 15559.00 frames. ], tot_loss[loss=0.08387, simple_loss=0.1037, pruned_loss=0.02184, audio_tagging_loss=0.01019, over 3052860.14 frames. ], batch size: 60, lr: 6.28e-03, grad_scale: 32.0 2023-11-19 23:18:31,802 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=845873.3333333334, ans=0.0 2023-11-19 23:18:40,352 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=845940.0, ans=0.1 2023-11-19 23:18:43,957 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 126900 2023-11-19 23:19:26,957 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 6700, loss[loss=0.0985, simple_loss=0.1158, pruned_loss=0.02812, audio_tagging_loss=0.01249, over 15147.00 frames. ], tot_loss[loss=0.08396, simple_loss=0.1036, pruned_loss=0.02201, audio_tagging_loss=0.01018, over 3057021.13 frames. 
], batch size: 57, lr: 6.28e-03, grad_scale: 16.0 2023-11-19 23:19:40,849 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=846273.3333333334, ans=0.5 2023-11-19 23:19:44,270 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=846273.3333333334, ans=0.0 2023-11-19 23:19:49,126 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 126950 2023-11-19 23:19:57,066 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.564e+01 8.084e+01 8.667e+01 9.126e+01 1.226e+02, threshold=1.733e+02, percent-clipped=0.0 2023-11-19 23:20:09,214 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=846406.6666666666, ans=0.05 2023-11-19 23:20:32,131 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 6750, loss[loss=0.08372, simple_loss=0.1062, pruned_loss=0.02132, audio_tagging_loss=0.009306, over 15525.00 frames. ], tot_loss[loss=0.08329, simple_loss=0.1026, pruned_loss=0.02169, audio_tagging_loss=0.01029, over 3054640.95 frames. ], batch size: 57, lr: 6.28e-03, grad_scale: 16.0 2023-11-19 23:20:37,563 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.61 vs. limit=15.0 2023-11-19 23:20:53,343 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 127000 2023-11-19 23:20:58,010 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=846673.3333333334, ans=0.0 2023-11-19 23:20:59,193 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=846673.3333333334, ans=0.125 2023-11-19 23:21:36,260 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 6800, loss[loss=0.07209, simple_loss=0.0823, pruned_loss=0.01739, audio_tagging_loss=0.01355, over 14473.00 frames. ], tot_loss[loss=0.08345, simple_loss=0.1032, pruned_loss=0.02164, audio_tagging_loss=0.01021, over 3053883.85 frames. ], batch size: 56, lr: 6.28e-03, grad_scale: 32.0 2023-11-19 23:21:39,101 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=846873.3333333334, ans=0.125 2023-11-19 23:21:44,644 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=846873.3333333334, ans=0.125 2023-11-19 23:21:57,803 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 127050 2023-11-19 23:22:05,920 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.069e+01 8.284e+01 8.995e+01 1.009e+02 1.556e+02, threshold=1.799e+02, percent-clipped=0.0 2023-11-19 23:22:16,795 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=847073.3333333334, ans=0.125 2023-11-19 23:22:40,285 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=847206.6666666666, ans=0.0 2023-11-19 23:22:41,150 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 6850, loss[loss=0.08814, simple_loss=0.1014, pruned_loss=0.02674, audio_tagging_loss=0.01068, over 15256.00 frames. ], tot_loss[loss=0.08331, simple_loss=0.1033, pruned_loss=0.02151, audio_tagging_loss=0.01017, over 3052933.69 frames. 
], batch size: 56, lr: 6.28e-03, grad_scale: 32.0 2023-11-19 23:22:43,786 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=847206.6666666666, ans=0.1 2023-11-19 23:22:56,748 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=847273.3333333334, ans=0.0 2023-11-19 23:22:57,251 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.75 vs. limit=15.0 2023-11-19 23:23:03,152 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 127100 2023-11-19 23:23:17,617 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=847340.0, ans=0.125 2023-11-19 23:23:20,143 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=847406.6666666666, ans=0.125 2023-11-19 23:23:44,797 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=847540.0, ans=0.125 2023-11-19 23:23:45,615 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 6900, loss[loss=0.07895, simple_loss=0.09911, pruned_loss=0.02075, audio_tagging_loss=0.008643, over 15206.00 frames. ], tot_loss[loss=0.08387, simple_loss=0.1042, pruned_loss=0.02157, audio_tagging_loss=0.0102, over 3054182.40 frames. ], batch size: 55, lr: 6.28e-03, grad_scale: 32.0 2023-11-19 23:24:04,888 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.07 vs. limit=15.0 2023-11-19 23:24:07,976 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 127150 2023-11-19 23:24:13,503 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=847673.3333333334, ans=0.0 2023-11-19 23:24:16,958 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.980e+01 8.238e+01 8.922e+01 9.730e+01 1.552e+02, threshold=1.784e+02, percent-clipped=0.0 2023-11-19 23:24:19,842 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=847673.3333333334, ans=0.0 2023-11-19 23:24:24,685 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=847740.0, ans=0.05 2023-11-19 23:24:26,338 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.00 vs. limit=15.0 2023-11-19 23:24:37,218 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 23:24:50,543 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 6950, loss[loss=0.0708, simple_loss=0.07964, pruned_loss=0.01575, audio_tagging_loss=0.01523, over 15183.00 frames. ], tot_loss[loss=0.08336, simple_loss=0.1034, pruned_loss=0.02132, audio_tagging_loss=0.01034, over 3053457.79 frames. 
], batch size: 59, lr: 6.28e-03, grad_scale: 16.0 2023-11-19 23:24:52,121 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=847873.3333333334, ans=0.125 2023-11-19 23:24:58,321 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=847873.3333333334, ans=0.125 2023-11-19 23:25:12,264 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 127200 2023-11-19 23:25:32,270 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.62 vs. limit=12.0 2023-11-19 23:25:44,258 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=848140.0, ans=0.125 2023-11-19 23:25:49,162 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=848140.0, ans=0.125 2023-11-19 23:25:54,900 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=848206.6666666666, ans=0.125 2023-11-19 23:25:55,819 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 7000, loss[loss=0.1074, simple_loss=0.1339, pruned_loss=0.03111, audio_tagging_loss=0.009385, over 14863.00 frames. ], tot_loss[loss=0.08364, simple_loss=0.1037, pruned_loss=0.02149, audio_tagging_loss=0.01031, over 3043804.12 frames. ], batch size: 56, lr: 6.27e-03, grad_scale: 16.0 2023-11-19 23:26:02,145 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=848206.6666666666, ans=0.09899494936611666 2023-11-19 23:26:17,199 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 127250 2023-11-19 23:26:23,443 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=848340.0, ans=0.125 2023-11-19 23:26:26,378 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.925e+01 8.193e+01 9.050e+01 1.000e+02 1.255e+02, threshold=1.810e+02, percent-clipped=0.0 2023-11-19 23:26:49,789 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=848473.3333333334, ans=10.0 2023-11-19 23:26:53,540 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=848473.3333333334, ans=0.125 2023-11-19 23:27:00,095 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 7050, loss[loss=0.0902, simple_loss=0.1135, pruned_loss=0.02486, audio_tagging_loss=0.008608, over 14721.00 frames. ], tot_loss[loss=0.08304, simple_loss=0.1026, pruned_loss=0.02137, audio_tagging_loss=0.01036, over 3045146.39 frames. 
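
In the Clipping_scale lines, the printed threshold is consistently the clipping scale times the median of the tracked gradient norms: in the record above, 2.0 * 9.050e+01 = 1.810e+02. A minimal sketch of median-based clipping under that assumption (not the exact optim.py implementation; the helper and its signature are hypothetical):

    import torch

    def clip_grads_by_scaled_median(grads, norm_history, clipping_scale=2.0):
        # Track the total grad norm and clip against a threshold derived
        # from the median of recent norms, as the quartile printout suggests.
        total_norm = torch.norm(torch.stack([g.norm() for g in grads]))
        norm_history.append(total_norm.item())
        quartiles = torch.quantile(torch.tensor(norm_history),
                                   torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * quartiles[2].item()  # scale * median
        if total_norm.item() > threshold:
            for g in grads:
                g.mul_(threshold / total_norm)
        return quartiles, threshold
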
], batch size: 56, lr: 6.27e-03, grad_scale: 16.0 2023-11-19 23:27:15,166 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=848606.6666666666, ans=0.0 2023-11-19 23:27:22,370 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 127300 2023-11-19 23:27:32,907 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=848673.3333333334, ans=0.1 2023-11-19 23:27:55,650 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=848806.6666666666, ans=0.0 2023-11-19 23:28:03,964 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 7100, loss[loss=0.07147, simple_loss=0.08324, pruned_loss=0.01782, audio_tagging_loss=0.01203, over 16166.00 frames. ], tot_loss[loss=0.08331, simple_loss=0.1027, pruned_loss=0.02151, audio_tagging_loss=0.01046, over 3053209.19 frames. ], batch size: 62, lr: 6.27e-03, grad_scale: 16.0 2023-11-19 23:28:11,416 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.78 vs. limit=15.0 2023-11-19 23:28:14,813 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 23:28:26,309 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 127350 2023-11-19 23:28:34,693 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.672e+01 8.042e+01 8.769e+01 9.719e+01 1.215e+02, threshold=1.754e+02, percent-clipped=0.0 2023-11-19 23:28:35,819 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=849006.6666666666, ans=0.125 2023-11-19 23:28:52,406 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.54 vs. limit=15.0 2023-11-19 23:28:53,414 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.52 vs. limit=15.0 2023-11-19 23:29:02,935 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=7.954e-03 2023-11-19 23:29:09,329 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 7150, loss[loss=0.07235, simple_loss=0.09449, pruned_loss=0.01526, audio_tagging_loss=0.009835, over 15975.00 frames. ], tot_loss[loss=0.08365, simple_loss=0.1031, pruned_loss=0.02167, audio_tagging_loss=0.01043, over 3053634.07 frames. ], batch size: 61, lr: 6.27e-03, grad_scale: 16.0 2023-11-19 23:29:13,316 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=849206.6666666666, ans=0.125 2023-11-19 23:29:14,045 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.78 vs. limit=6.0 2023-11-19 23:29:30,729 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 127400 2023-11-19 23:29:57,443 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=849406.6666666666, ans=0.125 2023-11-19 23:30:13,324 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 7200, loss[loss=0.07041, simple_loss=0.08542, pruned_loss=0.0169, audio_tagging_loss=0.0108, over 14762.00 frames. 
], tot_loss[loss=0.08382, simple_loss=0.1035, pruned_loss=0.02164, audio_tagging_loss=0.01045, over 3045680.17 frames. ], batch size: 54, lr: 6.27e-03, grad_scale: 32.0 2023-11-19 23:30:25,235 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=849606.6666666666, ans=0.125 2023-11-19 23:30:35,610 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 127450 2023-11-19 23:30:42,464 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 23:30:45,311 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.244e+01 8.591e+01 9.847e+01 1.111e+02 1.455e+02, threshold=1.969e+02, percent-clipped=0.0 2023-11-19 23:30:48,012 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=849673.3333333334, ans=0.125 2023-11-19 23:30:50,357 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=849673.3333333334, ans=0.125 2023-11-19 23:31:18,086 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 7250, loss[loss=0.07873, simple_loss=0.09725, pruned_loss=0.01874, audio_tagging_loss=0.01136, over 15529.00 frames. ], tot_loss[loss=0.08401, simple_loss=0.1037, pruned_loss=0.02161, audio_tagging_loss=0.01056, over 3051039.50 frames. ], batch size: 57, lr: 6.27e-03, grad_scale: 32.0 2023-11-19 23:31:28,676 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=849873.3333333334, ans=0.0 2023-11-19 23:31:36,560 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.74 vs. limit=15.0 2023-11-19 23:31:40,804 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 127500 2023-11-19 23:31:49,794 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=850006.6666666666, ans=0.05 2023-11-19 23:31:52,150 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=850006.6666666666, ans=0.125 2023-11-19 23:32:23,467 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 7300, loss[loss=0.08747, simple_loss=0.1079, pruned_loss=0.02439, audio_tagging_loss=0.009143, over 15621.00 frames. ], tot_loss[loss=0.08353, simple_loss=0.1032, pruned_loss=0.02139, audio_tagging_loss=0.01052, over 3051052.66 frames. ], batch size: 58, lr: 6.27e-03, grad_scale: 32.0 2023-11-19 23:32:44,983 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 127550 2023-11-19 23:32:45,514 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.10 vs. 
limit=12.0 2023-11-19 23:32:51,226 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=850340.0, ans=0.125 2023-11-19 23:32:53,436 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.382e+01 8.309e+01 8.798e+01 9.625e+01 1.232e+02, threshold=1.760e+02, percent-clipped=0.0 2023-11-19 23:32:53,814 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=850340.0, ans=0.0 2023-11-19 23:32:53,848 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=850340.0, ans=0.5 2023-11-19 23:32:53,884 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=850340.0, ans=0.1 2023-11-19 23:33:09,157 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=850406.6666666666, ans=0.125 2023-11-19 23:33:27,419 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 7350, loss[loss=0.04548, simple_loss=0.04644, pruned_loss=0.007594, audio_tagging_loss=0.01466, over 15259.00 frames. ], tot_loss[loss=0.08346, simple_loss=0.1029, pruned_loss=0.02164, audio_tagging_loss=0.01036, over 3049193.76 frames. ], batch size: 60, lr: 6.27e-03, grad_scale: 32.0 2023-11-19 23:33:47,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=850606.6666666666, ans=15.0 2023-11-19 23:33:48,897 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 127600 2023-11-19 23:33:52,125 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=850673.3333333334, ans=0.1 2023-11-19 23:34:26,686 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=850806.6666666666, ans=0.125 2023-11-19 23:34:26,827 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=850806.6666666666, ans=0.0 2023-11-19 23:34:31,286 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 7400, loss[loss=0.07893, simple_loss=0.1056, pruned_loss=0.02039, audio_tagging_loss=0.005733, over 15414.00 frames. ], tot_loss[loss=0.08329, simple_loss=0.1028, pruned_loss=0.02165, audio_tagging_loss=0.01023, over 3044722.38 frames. ], batch size: 57, lr: 6.26e-03, grad_scale: 32.0 2023-11-19 23:34:53,557 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 127650 2023-11-19 23:35:02,495 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.769e+01 8.438e+01 8.975e+01 9.970e+01 1.315e+02, threshold=1.795e+02, percent-clipped=0.0 2023-11-19 23:35:07,693 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 23:35:15,762 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=851073.3333333334, ans=0.0 2023-11-19 23:35:24,821 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.87 vs. limit=15.0 2023-11-19 23:35:35,677 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 7450, loss[loss=0.07142, simple_loss=0.08482, pruned_loss=0.01825, audio_tagging_loss=0.01075, over 14367.00 frames. 
], tot_loss[loss=0.08315, simple_loss=0.1027, pruned_loss=0.02157, audio_tagging_loss=0.01022, over 3047241.02 frames. ], batch size: 53, lr: 6.26e-03, grad_scale: 32.0 2023-11-19 23:35:53,290 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.46 vs. limit=22.5 2023-11-19 23:35:57,640 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 127700 2023-11-19 23:36:12,290 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.58 vs. limit=15.0 2023-11-19 23:36:40,535 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 7500, loss[loss=0.09951, simple_loss=0.1269, pruned_loss=0.02402, audio_tagging_loss=0.01202, over 15543.00 frames. ], tot_loss[loss=0.08322, simple_loss=0.1027, pruned_loss=0.02171, audio_tagging_loss=0.01017, over 3050514.67 frames. ], batch size: 57, lr: 6.26e-03, grad_scale: 32.0 2023-11-19 23:37:02,016 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 127750 2023-11-19 23:37:12,197 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.986e+01 8.268e+01 8.982e+01 9.702e+01 1.380e+02, threshold=1.796e+02, percent-clipped=0.0 2023-11-19 23:37:20,726 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=851740.0, ans=0.125 2023-11-19 23:37:24,339 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=851740.0, ans=0.09899494936611666 2023-11-19 23:37:33,607 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.14 vs. limit=15.0 2023-11-19 23:37:44,456 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 7550, loss[loss=0.04852, simple_loss=0.05847, pruned_loss=0.007631, audio_tagging_loss=0.01165, over 14992.00 frames. ], tot_loss[loss=0.08333, simple_loss=0.1035, pruned_loss=0.02161, audio_tagging_loss=0.009975, over 3055656.31 frames. ], batch size: 56, lr: 6.26e-03, grad_scale: 16.0 2023-11-19 23:37:51,941 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=851873.3333333334, ans=0.0 2023-11-19 23:37:53,444 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=851873.3333333334, ans=0.125 2023-11-19 23:38:06,329 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 127800 2023-11-19 23:38:11,854 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=852006.6666666666, ans=0.1 2023-11-19 23:38:30,024 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=852073.3333333334, ans=0.1 2023-11-19 23:38:34,958 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=852140.0, ans=0.125 2023-11-19 23:38:48,574 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 7600, loss[loss=0.09949, simple_loss=0.1339, pruned_loss=0.02473, audio_tagging_loss=0.007797, over 15267.00 frames. ], tot_loss[loss=0.0829, simple_loss=0.1027, pruned_loss=0.02147, audio_tagging_loss=0.01009, over 3053516.80 frames. 
], batch size: 56, lr: 6.26e-03, grad_scale: 32.0 2023-11-19 23:38:50,069 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=852206.6666666666, ans=0.0 2023-11-19 23:38:57,869 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=852206.6666666666, ans=0.0 2023-11-19 23:39:10,690 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 127850 2023-11-19 23:39:13,854 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=852340.0, ans=0.125 2023-11-19 23:39:20,827 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.143e+01 8.303e+01 8.868e+01 9.604e+01 1.243e+02, threshold=1.774e+02, percent-clipped=0.0 2023-11-19 23:39:21,205 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=852340.0, ans=0.2 2023-11-19 23:39:23,723 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.77 vs. limit=22.5 2023-11-19 23:39:38,893 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=852473.3333333334, ans=0.0 2023-11-19 23:39:51,580 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.272e-02 2023-11-19 23:39:52,480 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 7650, loss[loss=0.09374, simple_loss=0.1213, pruned_loss=0.02327, audio_tagging_loss=0.009824, over 15680.00 frames. ], tot_loss[loss=0.08341, simple_loss=0.1032, pruned_loss=0.0217, audio_tagging_loss=0.01008, over 3052898.44 frames. ], batch size: 56, lr: 6.26e-03, grad_scale: 32.0 2023-11-19 23:40:04,872 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.571e-03 2023-11-19 23:40:14,493 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 127900 2023-11-19 23:40:44,497 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.48 vs. limit=5.0 2023-11-19 23:40:57,030 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 7700, loss[loss=0.08619, simple_loss=0.1075, pruned_loss=0.02181, audio_tagging_loss=0.01065, over 15766.00 frames. ], tot_loss[loss=0.08382, simple_loss=0.1036, pruned_loss=0.0219, audio_tagging_loss=0.0101, over 3053778.04 frames. ], batch size: 58, lr: 6.26e-03, grad_scale: 16.0 2023-11-19 23:41:19,345 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 127950 2023-11-19 23:41:23,162 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=853006.6666666666, ans=0.0 2023-11-19 23:41:31,371 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.499e+01 8.407e+01 9.041e+01 9.727e+01 1.362e+02, threshold=1.808e+02, percent-clipped=0.0 2023-11-19 23:41:35,787 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.27 vs. limit=15.0 2023-11-19 23:41:49,363 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.88 vs. 
limit=10.0 2023-11-19 23:42:01,280 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 7750, loss[loss=0.08674, simple_loss=0.09805, pruned_loss=0.02617, audio_tagging_loss=0.01154, over 14312.00 frames. ], tot_loss[loss=0.08355, simple_loss=0.1034, pruned_loss=0.02171, audio_tagging_loss=0.01011, over 3050740.21 frames. ], batch size: 57, lr: 6.26e-03, grad_scale: 8.0 2023-11-19 23:42:18,103 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=853273.3333333334, ans=0.0 2023-11-19 23:42:22,870 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 128000 2023-11-19 23:42:24,201 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=853273.3333333334, ans=0.0 2023-11-19 23:42:29,942 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=853340.0, ans=0.0 2023-11-19 23:42:46,550 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=853406.6666666666, ans=0.1 2023-11-19 23:42:51,350 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=853406.6666666666, ans=0.2 2023-11-19 23:42:58,046 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=853473.3333333334, ans=0.125 2023-11-19 23:42:59,207 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=853473.3333333334, ans=0.125 2023-11-19 23:43:09,390 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 7800, loss[loss=0.09341, simple_loss=0.1206, pruned_loss=0.02591, audio_tagging_loss=0.007194, over 15970.00 frames. ], tot_loss[loss=0.08341, simple_loss=0.1031, pruned_loss=0.02171, audio_tagging_loss=0.01016, over 3049409.40 frames. ], batch size: 58, lr: 6.25e-03, grad_scale: 8.0 2023-11-19 23:43:10,815 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=853540.0, ans=0.125 2023-11-19 23:43:15,895 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=853540.0, ans=0.1 2023-11-19 23:43:31,567 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 128050 2023-11-19 23:43:32,418 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.19 vs. limit=10.0 2023-11-19 23:43:44,191 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.125e+01 8.217e+01 8.897e+01 9.655e+01 1.501e+02, threshold=1.779e+02, percent-clipped=0.0 2023-11-19 23:44:03,249 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=853806.6666666666, ans=0.1 2023-11-19 23:44:05,683 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=853806.6666666666, ans=0.1 2023-11-19 23:44:08,442 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.00 vs. limit=22.5 2023-11-19 23:44:14,101 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 7850, loss[loss=0.0904, simple_loss=0.1163, pruned_loss=0.02361, audio_tagging_loss=0.008643, over 14834.00 frames. 
], tot_loss[loss=0.08406, simple_loss=0.1041, pruned_loss=0.02196, audio_tagging_loss=0.01007, over 3048026.41 frames. ], batch size: 56, lr: 6.25e-03, grad_scale: 8.0 2023-11-19 23:44:14,505 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=853873.3333333334, ans=0.0 2023-11-19 23:44:32,167 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=853940.0, ans=0.125 2023-11-19 23:44:35,537 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 128100 2023-11-19 23:44:43,711 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=854006.6666666666, ans=0.125 2023-11-19 23:44:53,355 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=854073.3333333334, ans=0.125 2023-11-19 23:45:15,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=854140.0, ans=0.1 2023-11-19 23:45:17,516 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 7900, loss[loss=0.1064, simple_loss=0.1335, pruned_loss=0.03138, audio_tagging_loss=0.008318, over 16074.00 frames. ], tot_loss[loss=0.08426, simple_loss=0.1043, pruned_loss=0.02197, audio_tagging_loss=0.01016, over 3044216.85 frames. ], batch size: 59, lr: 6.25e-03, grad_scale: 8.0 2023-11-19 23:45:17,727 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=854206.6666666666, ans=0.0 2023-11-19 23:45:19,727 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=854206.6666666666, ans=0.125 2023-11-19 23:45:39,327 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 128150 2023-11-19 23:45:52,896 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.807e+01 8.209e+01 8.987e+01 9.607e+01 1.593e+02, threshold=1.797e+02, percent-clipped=0.0 2023-11-19 23:46:08,739 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.04 vs. limit=22.5 2023-11-19 23:46:20,023 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=854473.3333333334, ans=0.125 2023-11-19 23:46:22,314 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 7950, loss[loss=0.06738, simple_loss=0.09322, pruned_loss=0.01158, audio_tagging_loss=0.009189, over 14366.00 frames. ], tot_loss[loss=0.08437, simple_loss=0.1044, pruned_loss=0.02189, audio_tagging_loss=0.01029, over 3044622.24 frames. ], batch size: 55, lr: 6.25e-03, grad_scale: 8.0 2023-11-19 23:46:25,208 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=854540.0, ans=0.2 2023-11-19 23:46:38,902 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-19 23:46:44,291 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 128200 2023-11-19 23:46:45,582 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=854606.6666666666, ans=0.0 2023-11-19 23:46:46,109 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.43 vs. limit=15.0 2023-11-19 23:46:52,378 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=854673.3333333334, ans=0.125 2023-11-19 23:46:54,678 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=854673.3333333334, ans=0.125 2023-11-19 23:47:07,190 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=854740.0, ans=0.2 2023-11-19 23:47:13,347 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=854806.6666666666, ans=0.125 2023-11-19 23:47:20,646 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=854806.6666666666, ans=0.0 2023-11-19 23:47:20,688 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=854806.6666666666, ans=0.0 2023-11-19 23:47:22,869 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=854806.6666666666, ans=0.125 2023-11-19 23:47:26,496 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 8000, loss[loss=0.09504, simple_loss=0.1173, pruned_loss=0.0262, audio_tagging_loss=0.01021, over 16510.00 frames. ], tot_loss[loss=0.08288, simple_loss=0.1022, pruned_loss=0.0213, audio_tagging_loss=0.01047, over 3043224.68 frames. ], batch size: 59, lr: 6.25e-03, grad_scale: 16.0 2023-11-19 23:47:31,040 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=854873.3333333334, ans=0.2 2023-11-19 23:47:43,820 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=854940.0, ans=0.0 2023-11-19 23:47:49,104 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 128250 2023-11-19 23:47:49,245 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=854940.0, ans=0.125 2023-11-19 23:47:50,606 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=854940.0, ans=0.0 2023-11-19 23:47:58,163 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=855006.6666666666, ans=0.015 2023-11-19 23:47:59,407 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=855006.6666666666, ans=0.1 2023-11-19 23:48:01,614 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.413e+01 8.250e+01 9.015e+01 9.647e+01 1.325e+02, threshold=1.803e+02, percent-clipped=0.0 2023-11-19 23:48:27,449 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.72 vs. 
limit=10.0 2023-11-19 23:48:27,914 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=855140.0, ans=0.0 2023-11-19 23:48:29,842 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.61 vs. limit=22.5 2023-11-19 23:48:31,456 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 8050, loss[loss=0.07309, simple_loss=0.08983, pruned_loss=0.01916, audio_tagging_loss=0.009024, over 14681.00 frames. ], tot_loss[loss=0.08298, simple_loss=0.1022, pruned_loss=0.02138, audio_tagging_loss=0.01049, over 3035687.42 frames. ], batch size: 56, lr: 6.25e-03, grad_scale: 16.0 2023-11-19 23:48:35,930 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=855206.6666666666, ans=0.1 2023-11-19 23:48:39,550 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=855206.6666666666, ans=0.125 2023-11-19 23:48:39,613 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=855206.6666666666, ans=0.0 2023-11-19 23:48:40,765 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=855206.6666666666, ans=0.125 2023-11-19 23:48:53,375 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 128300 2023-11-19 23:49:24,705 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.14 vs. limit=22.5 2023-11-19 23:49:35,452 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 8100, loss[loss=0.06178, simple_loss=0.06709, pruned_loss=0.01536, audio_tagging_loss=0.01288, over 15878.00 frames. ], tot_loss[loss=0.08338, simple_loss=0.1028, pruned_loss=0.02157, audio_tagging_loss=0.0104, over 3041007.37 frames. ], batch size: 60, lr: 6.25e-03, grad_scale: 16.0 2023-11-19 23:49:51,498 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=855606.6666666666, ans=0.0 2023-11-19 23:49:56,756 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 128350 2023-11-19 23:50:01,666 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=855673.3333333334, ans=0.125 2023-11-19 23:50:09,347 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.585e+01 8.405e+01 9.063e+01 9.959e+01 1.355e+02, threshold=1.813e+02, percent-clipped=0.0 2023-11-19 23:50:16,139 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=855740.0, ans=0.1 2023-11-19 23:50:22,904 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=855740.0, ans=0.1 2023-11-19 23:50:38,121 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 8150, loss[loss=0.07849, simple_loss=0.09877, pruned_loss=0.02031, audio_tagging_loss=0.008792, over 14364.00 frames. ], tot_loss[loss=0.08255, simple_loss=0.1022, pruned_loss=0.02121, audio_tagging_loss=0.01026, over 3041712.99 frames. 
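
The Exclude-cut warnings in this section drop utterances whose encoder output would be too short for the transducer loss: 100 input frames shrink to 23 frames after the roughly 4x convolutional subsampling, which is fewer than the 24 BPE tokens of the placeholder transcript. A hedged sketch of that check, with the frame formula inferred from the logged 100 -> 23 mapping rather than quoted from train_asr.py:

    def is_valid_cut(num_frames: int, num_tokens: int) -> bool:
        # Frames surviving the convolutional front-end; assumed form that
        # reproduces the warning's numbers (100 frames -> 23 after subsampling).
        frames_after = ((num_frames - 7) // 2 + 1) // 2
        return frames_after >= num_tokens

    # is_valid_cut(100, 24) -> False, so the cut is excluded from training.
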
], batch size: 54, lr: 6.25e-03, grad_scale: 16.0 2023-11-19 23:50:45,383 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=855873.3333333334, ans=22.5 2023-11-19 23:51:00,911 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 128400 2023-11-19 23:51:02,292 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=855940.0, ans=0.0 2023-11-19 23:51:23,133 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.05 vs. limit=15.0 2023-11-19 23:51:30,655 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=856140.0, ans=0.025 2023-11-19 23:51:37,861 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=856140.0, ans=0.125 2023-11-19 23:51:42,601 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 8200, loss[loss=0.0718, simple_loss=0.09128, pruned_loss=0.01608, audio_tagging_loss=0.01009, over 14231.00 frames. ], tot_loss[loss=0.08233, simple_loss=0.102, pruned_loss=0.02119, audio_tagging_loss=0.01014, over 3038087.79 frames. ], batch size: 56, lr: 6.24e-03, grad_scale: 16.0 2023-11-19 23:51:42,991 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=856206.6666666666, ans=0.2 2023-11-19 23:51:45,066 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 23:51:57,201 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=856273.3333333334, ans=0.125 2023-11-19 23:52:05,204 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 128450 2023-11-19 23:52:17,396 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.991e+01 8.248e+01 8.899e+01 9.586e+01 1.451e+02, threshold=1.780e+02, percent-clipped=0.0 2023-11-19 23:52:19,658 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=856340.0, ans=0.1 2023-11-19 23:52:28,566 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=856406.6666666666, ans=0.2 2023-11-19 23:52:48,578 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 8250, loss[loss=0.06444, simple_loss=0.0797, pruned_loss=0.01151, audio_tagging_loss=0.01308, over 15792.00 frames. ], tot_loss[loss=0.08261, simple_loss=0.1021, pruned_loss=0.02136, audio_tagging_loss=0.01019, over 3039982.10 frames. 
], batch size: 60, lr: 6.24e-03, grad_scale: 16.0 2023-11-19 23:52:53,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=856540.0, ans=0.0 2023-11-19 23:53:06,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=856606.6666666666, ans=0.2 2023-11-19 23:53:09,868 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 128500 2023-11-19 23:53:12,819 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.98 vs. limit=6.0 2023-11-19 23:53:13,824 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=856673.3333333334, ans=0.125 2023-11-19 23:53:39,254 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=856806.6666666666, ans=0.125 2023-11-19 23:53:44,635 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.21 vs. limit=15.0 2023-11-19 23:53:47,826 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=856806.6666666666, ans=0.1 2023-11-19 23:53:51,364 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 8300, loss[loss=0.08523, simple_loss=0.113, pruned_loss=0.02116, audio_tagging_loss=0.007589, over 13501.00 frames. ], tot_loss[loss=0.08219, simple_loss=0.1016, pruned_loss=0.02122, audio_tagging_loss=0.01018, over 3040545.47 frames. ], batch size: 53, lr: 6.24e-03, grad_scale: 16.0 2023-11-19 23:54:07,308 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.41 vs. limit=15.0 2023-11-19 23:54:08,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=856940.0, ans=0.125 2023-11-19 23:54:12,770 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 128550 2023-11-19 23:54:26,976 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=857006.6666666666, ans=0.125 2023-11-19 23:54:27,919 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.671e+01 8.283e+01 8.806e+01 9.666e+01 1.225e+02, threshold=1.761e+02, percent-clipped=0.0 2023-11-19 23:54:45,302 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=857140.0, ans=0.0 2023-11-19 23:54:55,411 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 8350, loss[loss=0.06199, simple_loss=0.07373, pruned_loss=0.01613, audio_tagging_loss=0.008999, over 14839.00 frames. ], tot_loss[loss=0.08192, simple_loss=0.1012, pruned_loss=0.02122, audio_tagging_loss=0.0101, over 3037545.31 frames. ], batch size: 59, lr: 6.24e-03, grad_scale: 8.0 2023-11-19 23:55:18,364 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 128600 2023-11-19 23:55:24,103 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.86 vs. limit=22.5 2023-11-19 23:55:25,243 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.18 vs. 
limit=15.0 2023-11-19 23:55:33,324 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=857406.6666666666, ans=0.0 2023-11-19 23:55:36,626 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=857406.6666666666, ans=0.0 2023-11-19 23:56:00,795 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 8400, loss[loss=0.05843, simple_loss=0.07536, pruned_loss=0.01257, audio_tagging_loss=0.008174, over 14696.00 frames. ], tot_loss[loss=0.08211, simple_loss=0.1017, pruned_loss=0.02119, audio_tagging_loss=0.01008, over 3037406.28 frames. ], batch size: 58, lr: 6.24e-03, grad_scale: 16.0 2023-11-19 23:56:18,958 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=857606.6666666666, ans=0.125 2023-11-19 23:56:22,347 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 128650 2023-11-19 23:56:32,886 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=857673.3333333334, ans=0.0 2023-11-19 23:56:36,150 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.566e+01 8.256e+01 8.921e+01 9.764e+01 1.880e+02, threshold=1.784e+02, percent-clipped=1.0 2023-11-19 23:56:41,280 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=857740.0, ans=0.0 2023-11-19 23:57:04,617 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 8450, loss[loss=0.07936, simple_loss=0.09547, pruned_loss=0.01913, audio_tagging_loss=0.01249, over 14558.00 frames. ], tot_loss[loss=0.0825, simple_loss=0.1022, pruned_loss=0.0213, audio_tagging_loss=0.01009, over 3038096.30 frames. ], batch size: 56, lr: 6.24e-03, grad_scale: 16.0 2023-11-19 23:57:21,297 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=857940.0, ans=0.125 2023-11-19 23:57:25,102 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=857940.0, ans=0.0 2023-11-19 23:57:26,131 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 128700 2023-11-19 23:57:46,821 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=858073.3333333334, ans=0.0 2023-11-19 23:57:50,628 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=858073.3333333334, ans=0.1 2023-11-19 23:58:08,063 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 8500, loss[loss=0.0751, simple_loss=0.09265, pruned_loss=0.02111, audio_tagging_loss=0.007665, over 15036.00 frames. ], tot_loss[loss=0.08305, simple_loss=0.103, pruned_loss=0.02136, audio_tagging_loss=0.01017, over 3044771.48 frames. 
], batch size: 56, lr: 6.24e-03, grad_scale: 16.0 2023-11-19 23:58:30,613 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 128750 2023-11-19 23:58:32,110 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=858273.3333333334, ans=0.2 2023-11-19 23:58:43,804 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.768e+01 8.248e+01 9.044e+01 1.008e+02 1.243e+02, threshold=1.809e+02, percent-clipped=0.0 2023-11-19 23:59:12,509 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 8550, loss[loss=0.06567, simple_loss=0.07769, pruned_loss=0.01409, audio_tagging_loss=0.01273, over 15168.00 frames. ], tot_loss[loss=0.08335, simple_loss=0.1032, pruned_loss=0.02145, audio_tagging_loss=0.01032, over 3045720.76 frames. ], batch size: 58, lr: 6.24e-03, grad_scale: 16.0 2023-11-19 23:59:17,157 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=858540.0, ans=0.07 2023-11-19 23:59:32,198 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=858606.6666666666, ans=0.2 2023-11-19 23:59:34,434 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 128800 2023-11-19 23:59:35,887 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=858606.6666666666, ans=0.125 2023-11-19 23:59:37,277 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=858673.3333333334, ans=0.0 2023-11-19 23:59:41,645 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=858673.3333333334, ans=0.1 2023-11-19 23:59:50,027 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=858740.0, ans=0.0 2023-11-19 23:59:55,352 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=11.12 vs. limit=12.0 2023-11-20 00:00:09,692 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=858806.6666666666, ans=0.1 2023-11-20 00:00:10,904 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=858806.6666666666, ans=0.0 2023-11-20 00:00:15,109 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=858806.6666666666, ans=0.0 2023-11-20 00:00:17,221 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 8600, loss[loss=0.09035, simple_loss=0.113, pruned_loss=0.02545, audio_tagging_loss=0.008383, over 15691.00 frames. ], tot_loss[loss=0.08335, simple_loss=0.1033, pruned_loss=0.0215, audio_tagging_loss=0.01018, over 3041442.26 frames. 
], batch size: 56, lr: 6.24e-03, grad_scale: 16.0 2023-11-20 00:00:19,872 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=858873.3333333334, ans=0.125 2023-11-20 00:00:30,435 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=858940.0, ans=0.125 2023-11-20 00:00:38,577 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 128850 2023-11-20 00:00:52,616 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.469e+01 8.239e+01 8.842e+01 9.457e+01 1.153e+02, threshold=1.768e+02, percent-clipped=0.0 2023-11-20 00:00:54,793 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.481e-01 2023-11-20 00:00:54,815 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=859073.3333333334, ans=0.125 2023-11-20 00:01:01,221 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.64 vs. limit=6.0 2023-11-20 00:01:15,136 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=859140.0, ans=0.025 2023-11-20 00:01:20,579 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=859206.6666666666, ans=0.0 2023-11-20 00:01:21,499 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 8650, loss[loss=0.08728, simple_loss=0.1068, pruned_loss=0.02061, audio_tagging_loss=0.01327, over 15589.00 frames. ], tot_loss[loss=0.08396, simple_loss=0.1043, pruned_loss=0.02164, audio_tagging_loss=0.01017, over 3042062.98 frames. ], batch size: 57, lr: 6.23e-03, grad_scale: 16.0 2023-11-20 00:01:43,260 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 128900 2023-11-20 00:01:43,521 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=859273.3333333334, ans=0.0 2023-11-20 00:01:45,862 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=859340.0, ans=0.0 2023-11-20 00:02:01,067 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=859406.6666666666, ans=0.125 2023-11-20 00:02:15,210 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=859473.3333333334, ans=0.125 2023-11-20 00:02:24,885 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 8700, loss[loss=0.06215, simple_loss=0.06592, pruned_loss=0.01535, audio_tagging_loss=0.01384, over 15063.00 frames. ], tot_loss[loss=0.08453, simple_loss=0.1047, pruned_loss=0.02192, audio_tagging_loss=0.01025, over 3047249.11 frames. 
], batch size: 61, lr: 6.23e-03, grad_scale: 16.0 2023-11-20 00:02:29,581 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=859540.0, ans=0.09899494936611666 2023-11-20 00:02:47,775 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 128950 2023-11-20 00:03:01,589 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.745e+01 8.243e+01 8.970e+01 9.872e+01 1.298e+02, threshold=1.794e+02, percent-clipped=0.0 2023-11-20 00:03:06,680 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=859740.0, ans=0.0 2023-11-20 00:03:22,317 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=859806.6666666666, ans=0.95 2023-11-20 00:03:28,202 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=859873.3333333334, ans=0.125 2023-11-20 00:03:29,215 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 8750, loss[loss=0.05889, simple_loss=0.06567, pruned_loss=0.01326, audio_tagging_loss=0.01279, over 16528.00 frames. ], tot_loss[loss=0.08465, simple_loss=0.1047, pruned_loss=0.02197, audio_tagging_loss=0.01035, over 3047676.85 frames. ], batch size: 64, lr: 6.23e-03, grad_scale: 16.0 2023-11-20 00:03:29,550 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=859873.3333333334, ans=0.125 2023-11-20 00:03:40,058 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.22 vs. limit=22.5 2023-11-20 00:03:49,973 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=859940.0, ans=0.0 2023-11-20 00:03:51,126 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 129000 2023-11-20 00:03:55,222 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=860006.6666666666, ans=10.0 2023-11-20 00:04:02,761 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=860006.6666666666, ans=0.1 2023-11-20 00:04:25,520 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.72 vs. limit=6.0 2023-11-20 00:04:26,738 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.36 vs. limit=15.0 2023-11-20 00:04:29,889 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=860140.0, ans=0.0 2023-11-20 00:04:32,468 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=860206.6666666666, ans=0.125 2023-11-20 00:04:33,957 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 8800, loss[loss=0.09324, simple_loss=0.1054, pruned_loss=0.02948, audio_tagging_loss=0.01106, over 15256.00 frames. ], tot_loss[loss=0.08421, simple_loss=0.1043, pruned_loss=0.02177, audio_tagging_loss=0.01031, over 3046046.00 frames. 
], batch size: 57, lr: 6.23e-03, grad_scale: 32.0 2023-11-20 00:04:46,918 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.58 vs. limit=10.0 2023-11-20 00:04:52,477 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=860273.3333333334, ans=0.125 2023-11-20 00:04:55,365 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 129050 2023-11-20 00:05:05,645 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=860340.0, ans=0.125 2023-11-20 00:05:09,090 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.191e+01 8.422e+01 9.194e+01 1.008e+02 1.237e+02, threshold=1.839e+02, percent-clipped=0.0 2023-11-20 00:05:10,639 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=860406.6666666666, ans=0.0 2023-11-20 00:05:21,470 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.58 vs. limit=15.0 2023-11-20 00:05:22,989 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=860406.6666666666, ans=0.125 2023-11-20 00:05:32,872 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=860473.3333333334, ans=0.125 2023-11-20 00:05:37,388 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 8850, loss[loss=0.09108, simple_loss=0.1164, pruned_loss=0.02412, audio_tagging_loss=0.008737, over 16066.00 frames. ], tot_loss[loss=0.08418, simple_loss=0.1043, pruned_loss=0.0217, audio_tagging_loss=0.01033, over 3048323.30 frames. ], batch size: 59, lr: 6.23e-03, grad_scale: 32.0 2023-11-20 00:05:52,308 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 00:05:59,810 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 129100 2023-11-20 00:06:20,205 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.57 vs. limit=15.0 2023-11-20 00:06:24,779 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=860740.0, ans=0.0 2023-11-20 00:06:25,881 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=860740.0, ans=0.1 2023-11-20 00:06:43,078 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 8900, loss[loss=0.05923, simple_loss=0.0735, pruned_loss=0.01263, audio_tagging_loss=0.009852, over 15002.00 frames. ], tot_loss[loss=0.08401, simple_loss=0.1042, pruned_loss=0.02169, audio_tagging_loss=0.0102, over 3044206.64 frames. 
], batch size: 55, lr: 6.23e-03, grad_scale: 32.0 2023-11-20 00:07:05,286 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 129150 2023-11-20 00:07:18,685 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.416e+01 8.249e+01 8.942e+01 1.024e+02 1.298e+02, threshold=1.788e+02, percent-clipped=0.0 2023-11-20 00:07:20,255 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 00:07:26,432 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=861073.3333333334, ans=0.0 2023-11-20 00:07:43,831 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=861140.0, ans=0.2 2023-11-20 00:07:47,621 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 8950, loss[loss=0.06977, simple_loss=0.07887, pruned_loss=0.01787, audio_tagging_loss=0.01246, over 15039.00 frames. ], tot_loss[loss=0.08321, simple_loss=0.1035, pruned_loss=0.02141, audio_tagging_loss=0.01007, over 3041306.45 frames. ], batch size: 58, lr: 6.23e-03, grad_scale: 32.0 2023-11-20 00:08:09,163 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 129200 2023-11-20 00:08:16,634 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=861340.0, ans=0.125 2023-11-20 00:08:26,855 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.77 vs. limit=6.0 2023-11-20 00:08:52,274 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 9000, loss[loss=0.06839, simple_loss=0.07818, pruned_loss=0.01791, audio_tagging_loss=0.01139, over 14910.00 frames. ], tot_loss[loss=0.08325, simple_loss=0.1036, pruned_loss=0.02143, audio_tagging_loss=0.01001, over 3040990.63 frames. ], batch size: 58, lr: 6.23e-03, grad_scale: 32.0 2023-11-20 00:08:52,275 INFO [train_asr.py:1285] (3/4) Computing validation loss 2023-11-20 00:09:17,294 INFO [zipformer.py:1873] (3/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.9922, 5.9054, 5.7775, 5.5732], device='cuda:3') 2023-11-20 00:09:31,828 INFO [train_asr.py:1294] (3/4) Epoch 11, validation: loss=0.06425, simple_loss=0.05461, pruned_loss=0.006061, audio_tagging_loss=0.03088, over 4681554.00 frames. 2023-11-20 00:09:31,829 INFO [train_asr.py:1295] (3/4) Maximum memory allocated so far is 25886MB 2023-11-20 00:09:34,591 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=861540.0, ans=0.2 2023-11-20 00:09:52,247 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.62 vs. limit=15.0 2023-11-20 00:09:53,908 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 129250 2023-11-20 00:09:57,757 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=861673.3333333334, ans=0.2 2023-11-20 00:10:08,225 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.624e+01 8.204e+01 8.877e+01 9.469e+01 1.301e+02, threshold=1.775e+02, percent-clipped=0.0 2023-11-20 00:10:20,811 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.18 vs. 
limit=10.0 2023-11-20 00:10:22,845 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=861806.6666666666, ans=0.0 2023-11-20 00:10:27,657 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=861806.6666666666, ans=0.125 2023-11-20 00:10:35,647 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 9050, loss[loss=0.07782, simple_loss=0.09678, pruned_loss=0.02086, audio_tagging_loss=0.008575, over 13793.00 frames. ], tot_loss[loss=0.08304, simple_loss=0.1034, pruned_loss=0.02144, audio_tagging_loss=0.009886, over 3039974.69 frames. ], batch size: 57, lr: 6.22e-03, grad_scale: 32.0 2023-11-20 00:10:42,940 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.02 vs. limit=10.0 2023-11-20 00:10:45,131 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=861873.3333333334, ans=0.125 2023-11-20 00:10:47,643 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=861940.0, ans=0.125 2023-11-20 00:10:57,860 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 129300 2023-11-20 00:11:09,459 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=862006.6666666666, ans=0.1 2023-11-20 00:11:14,420 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=862073.3333333334, ans=0.0 2023-11-20 00:11:24,903 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 00:11:34,212 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.74 vs. limit=15.0 2023-11-20 00:11:39,800 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 9100, loss[loss=0.05469, simple_loss=0.06955, pruned_loss=0.01096, audio_tagging_loss=0.008958, over 16986.00 frames. ], tot_loss[loss=0.08228, simple_loss=0.1026, pruned_loss=0.02111, audio_tagging_loss=0.009859, over 3040470.84 frames. ], batch size: 65, lr: 6.22e-03, grad_scale: 16.0 2023-11-20 00:11:43,742 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=862206.6666666666, ans=0.125 2023-11-20 00:12:02,061 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 129350 2023-11-20 00:12:17,841 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.021e+01 8.243e+01 9.006e+01 9.571e+01 1.391e+02, threshold=1.801e+02, percent-clipped=0.0 2023-11-20 00:12:33,533 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.85 vs. limit=22.5 2023-11-20 00:12:44,995 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 9150, loss[loss=0.1023, simple_loss=0.1281, pruned_loss=0.02997, audio_tagging_loss=0.008263, over 14314.00 frames. ], tot_loss[loss=0.08275, simple_loss=0.1029, pruned_loss=0.02138, audio_tagging_loss=0.009931, over 3040990.66 frames. 
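
A few records back, at batch 9000, the trainer paused to compute the validation loss and then reported the CUDA memory high-water mark (25886MB). A minimal sketch of that pattern, with compute_loss standing in as a placeholder for the recipe's own loss function:

    import torch

    def validate(model, valid_loader, device):
        model.eval()
        tot_loss, tot_frames = 0.0, 0
        with torch.no_grad():  # no gradients needed for validation
            for batch in valid_loader:
                loss, num_frames = compute_loss(model, batch)  # placeholder
                tot_loss += loss.item() * num_frames
                tot_frames += num_frames
        model.train()
        max_mem = torch.cuda.max_memory_allocated(device) // 2**20
        print(f"Maximum memory allocated so far is {max_mem}MB")
        return tot_loss / max(tot_frames, 1)
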
], batch size: 54, lr: 6.22e-03, grad_scale: 16.0 2023-11-20 00:13:06,331 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 129400 2023-11-20 00:13:14,083 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.33 vs. limit=22.5 2023-11-20 00:13:19,179 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.19 vs. limit=22.5 2023-11-20 00:13:22,931 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.07 vs. limit=6.0 2023-11-20 00:13:33,505 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.437e-01 2023-11-20 00:13:49,202 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 9200, loss[loss=0.07664, simple_loss=0.09151, pruned_loss=0.01922, audio_tagging_loss=0.01167, over 16633.00 frames. ], tot_loss[loss=0.08282, simple_loss=0.1031, pruned_loss=0.02134, audio_tagging_loss=0.009914, over 3050059.14 frames. ], batch size: 60, lr: 6.22e-03, grad_scale: 32.0 2023-11-20 00:13:54,948 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=862873.3333333334, ans=0.125 2023-11-20 00:13:58,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=862873.3333333334, ans=0.0 2023-11-20 00:14:01,813 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=862940.0, ans=0.125 2023-11-20 00:14:03,278 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=862940.0, ans=0.0 2023-11-20 00:14:06,863 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=862940.0, ans=0.0 2023-11-20 00:14:09,262 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=862940.0, ans=0.0 2023-11-20 00:14:11,531 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 129450 2023-11-20 00:14:27,354 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.764e+01 8.355e+01 9.081e+01 9.853e+01 1.317e+02, threshold=1.816e+02, percent-clipped=0.0 2023-11-20 00:14:29,093 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=863073.3333333334, ans=0.04949747468305833 2023-11-20 00:14:53,835 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=863206.6666666666, ans=0.09899494936611666 2023-11-20 00:14:54,723 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 9250, loss[loss=0.1091, simple_loss=0.1401, pruned_loss=0.03093, audio_tagging_loss=0.008111, over 14342.00 frames. ], tot_loss[loss=0.08376, simple_loss=0.1043, pruned_loss=0.0217, audio_tagging_loss=0.009911, over 3054685.18 frames. 
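
The Clipping_scale records throughout this section follow one pattern: the five numbers are the minimum, 25th, 50th, 75th percentile and maximum of recent per-batch gradient norms, and the printed threshold is clipping_scale times the median (in the record above, 2.0 x 9.081e+01 = 1.816e+02). Gradients whose norm exceeds the threshold are scaled down, and percent-clipped reports how often that happened. A sketch of that bookkeeping (the rolling-window length is an assumed detail):

    import torch

    CLIPPING_SCALE = 2.0
    recent_norms: list[torch.Tensor] = []

    def clip_by_median(parameters, grad_norm: torch.Tensor) -> bool:
        recent_norms.append(grad_norm.detach())
        del recent_norms[:-128]  # assumed window length
        norms = torch.stack(recent_norms)
        q = torch.quantile(norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = CLIPPING_SCALE * q[2]  # 2.0 * median, as in the log
        if grad_norm > threshold:
            for p in parameters:
                if p.grad is not None:
                    p.grad.mul_(threshold / grad_norm)
            return True   # counted into percent-clipped
        return False
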
], batch size: 55, lr: 6.22e-03, grad_scale: 32.0 2023-11-20 00:15:02,942 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=863206.6666666666, ans=0.2 2023-11-20 00:15:04,211 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=863206.6666666666, ans=0.125 2023-11-20 00:15:16,856 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 129500 2023-11-20 00:15:17,134 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=863273.3333333334, ans=0.5 2023-11-20 00:15:20,879 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=863340.0, ans=0.125 2023-11-20 00:15:42,642 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=863406.6666666666, ans=0.125 2023-11-20 00:15:50,530 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.87 vs. limit=6.0 2023-11-20 00:15:52,755 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.91 vs. limit=22.5 2023-11-20 00:15:59,591 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 9300, loss[loss=0.06862, simple_loss=0.08331, pruned_loss=0.01595, audio_tagging_loss=0.01102, over 15023.00 frames. ], tot_loss[loss=0.08363, simple_loss=0.1042, pruned_loss=0.02163, audio_tagging_loss=0.009894, over 3058542.11 frames. ], batch size: 57, lr: 6.22e-03, grad_scale: 32.0 2023-11-20 00:16:04,963 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=863540.0, ans=0.2 2023-11-20 00:16:14,697 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=863606.6666666666, ans=0.04949747468305833 2023-11-20 00:16:15,337 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.28 vs. 
limit=15.0 2023-11-20 00:16:15,937 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=863606.6666666666, ans=0.125 2023-11-20 00:16:17,292 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=863606.6666666666, ans=0.125 2023-11-20 00:16:18,499 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=863606.6666666666, ans=0.0 2023-11-20 00:16:20,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=863606.6666666666, ans=0.05 2023-11-20 00:16:21,308 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 129550 2023-11-20 00:16:36,562 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.891e+01 8.083e+01 9.100e+01 9.829e+01 1.304e+02, threshold=1.820e+02, percent-clipped=0.0 2023-11-20 00:16:50,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=863806.6666666666, ans=0.0 2023-11-20 00:16:56,690 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=863806.6666666666, ans=0.025 2023-11-20 00:17:03,669 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 9350, loss[loss=0.108, simple_loss=0.1409, pruned_loss=0.02752, audio_tagging_loss=0.01004, over 15783.00 frames. ], tot_loss[loss=0.08378, simple_loss=0.1041, pruned_loss=0.02164, audio_tagging_loss=0.01007, over 3051714.79 frames. ], batch size: 57, lr: 6.22e-03, grad_scale: 32.0 2023-11-20 00:17:12,205 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=863873.3333333334, ans=0.09899494936611666 2023-11-20 00:17:26,556 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 129600 2023-11-20 00:17:26,733 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=863940.0, ans=0.1 2023-11-20 00:17:57,796 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=864140.0, ans=0.125 2023-11-20 00:18:05,610 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=864140.0, ans=0.1 2023-11-20 00:18:09,201 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 9400, loss[loss=0.1042, simple_loss=0.1373, pruned_loss=0.0272, audio_tagging_loss=0.008384, over 15241.00 frames. ], tot_loss[loss=0.08378, simple_loss=0.1041, pruned_loss=0.02162, audio_tagging_loss=0.0101, over 3047291.79 frames. ], batch size: 55, lr: 6.22e-03, grad_scale: 32.0 2023-11-20 00:18:09,504 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=864206.6666666666, ans=0.125 2023-11-20 00:18:32,221 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 129650 2023-11-20 00:18:34,155 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.71 vs. 
limit=15.0 2023-11-20 00:18:41,008 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=864340.0, ans=0.125 2023-11-20 00:18:46,740 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.773e+01 8.451e+01 9.430e+01 1.021e+02 1.598e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-20 00:18:54,945 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=864406.6666666666, ans=0.2 2023-11-20 00:19:03,611 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=864473.3333333334, ans=0.125 2023-11-20 00:19:04,618 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=864473.3333333334, ans=0.1 2023-11-20 00:19:14,289 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 9450, loss[loss=0.08559, simple_loss=0.09635, pruned_loss=0.02589, audio_tagging_loss=0.01153, over 15723.00 frames. ], tot_loss[loss=0.0832, simple_loss=0.103, pruned_loss=0.02146, audio_tagging_loss=0.01022, over 3046844.07 frames. ], batch size: 59, lr: 6.21e-03, grad_scale: 32.0 2023-11-20 00:19:14,335 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 00:19:14,576 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=864540.0, ans=0.0 2023-11-20 00:19:16,342 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.49 vs. limit=10.0 2023-11-20 00:19:25,048 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=864540.0, ans=0.0 2023-11-20 00:19:32,494 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=864606.6666666666, ans=0.2 2023-11-20 00:19:35,887 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 129700 2023-11-20 00:20:18,803 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 9500, loss[loss=0.07639, simple_loss=0.08465, pruned_loss=0.02203, audio_tagging_loss=0.01204, over 14464.00 frames. ], tot_loss[loss=0.08334, simple_loss=0.103, pruned_loss=0.0215, audio_tagging_loss=0.01033, over 3039723.79 frames. ], batch size: 56, lr: 6.21e-03, grad_scale: 32.0 2023-11-20 00:20:24,402 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.24 vs. 
limit=22.5 2023-11-20 00:20:33,184 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=864940.0, ans=0.035 2023-11-20 00:20:40,554 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 129750 2023-11-20 00:20:41,930 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=864940.0, ans=0.125 2023-11-20 00:20:57,184 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.873e+01 8.143e+01 9.051e+01 9.932e+01 1.802e+02, threshold=1.810e+02, percent-clipped=0.0 2023-11-20 00:21:05,018 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=865073.3333333334, ans=0.0 2023-11-20 00:21:17,839 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 00:21:23,869 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 9550, loss[loss=0.0937, simple_loss=0.1122, pruned_loss=0.02797, audio_tagging_loss=0.00963, over 15352.00 frames. ], tot_loss[loss=0.08391, simple_loss=0.1037, pruned_loss=0.02164, audio_tagging_loss=0.0104, over 3044647.34 frames. ], batch size: 57, lr: 6.21e-03, grad_scale: 32.0 2023-11-20 00:21:25,244 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=865206.6666666666, ans=0.1 2023-11-20 00:21:32,909 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=865206.6666666666, ans=0.125 2023-11-20 00:21:35,429 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=865273.3333333334, ans=0.07 2023-11-20 00:21:44,629 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.74 vs. limit=12.0 2023-11-20 00:21:46,571 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 129800 2023-11-20 00:21:46,710 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=865273.3333333334, ans=0.0 2023-11-20 00:22:09,854 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=865406.6666666666, ans=0.0 2023-11-20 00:22:19,934 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.22 vs. limit=12.0 2023-11-20 00:22:20,713 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=865473.3333333334, ans=0.125 2023-11-20 00:22:23,331 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=865473.3333333334, ans=0.125 2023-11-20 00:22:29,122 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 9600, loss[loss=0.09475, simple_loss=0.1145, pruned_loss=0.0281, audio_tagging_loss=0.009384, over 15058.00 frames. ], tot_loss[loss=0.08406, simple_loss=0.1036, pruned_loss=0.02179, audio_tagging_loss=0.01047, over 3043864.89 frames. ], batch size: 59, lr: 6.21e-03, grad_scale: 32.0 2023-11-20 00:22:50,664 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 129850 2023-11-20 00:22:53,583 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.30 vs. 
limit=15.0 2023-11-20 00:23:01,435 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=865673.3333333334, ans=0.2 2023-11-20 00:23:05,963 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.678e+01 8.225e+01 8.966e+01 9.703e+01 1.238e+02, threshold=1.793e+02, percent-clipped=0.0 2023-11-20 00:23:13,882 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 00:23:15,193 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=865740.0, ans=0.1 2023-11-20 00:23:21,119 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=865806.6666666666, ans=0.1 2023-11-20 00:23:33,789 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 9650, loss[loss=0.09474, simple_loss=0.1132, pruned_loss=0.02993, audio_tagging_loss=0.008215, over 15605.00 frames. ], tot_loss[loss=0.08382, simple_loss=0.1034, pruned_loss=0.02168, audio_tagging_loss=0.01044, over 3044033.80 frames. ], batch size: 61, lr: 6.21e-03, grad_scale: 32.0 2023-11-20 00:23:41,689 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=865873.3333333334, ans=0.125 2023-11-20 00:23:54,789 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.30 vs. limit=15.0 2023-11-20 00:23:55,423 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 129900 2023-11-20 00:24:32,734 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.34 vs. limit=15.0 2023-11-20 00:24:37,651 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 9700, loss[loss=0.08119, simple_loss=0.09127, pruned_loss=0.02184, audio_tagging_loss=0.01372, over 15354.00 frames. ], tot_loss[loss=0.08463, simple_loss=0.1046, pruned_loss=0.02201, audio_tagging_loss=0.0103, over 3043695.89 frames. ], batch size: 57, lr: 6.21e-03, grad_scale: 32.0 2023-11-20 00:24:47,567 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=866206.6666666666, ans=0.0 2023-11-20 00:24:48,802 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=866273.3333333334, ans=0.125 2023-11-20 00:24:51,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=866273.3333333334, ans=0.125 2023-11-20 00:24:59,557 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 129950 2023-11-20 00:25:10,235 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=866340.0, ans=0.0 2023-11-20 00:25:14,933 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.210e+01 8.274e+01 8.934e+01 1.009e+02 1.297e+02, threshold=1.787e+02, percent-clipped=0.0 2023-11-20 00:25:25,864 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=866406.6666666666, ans=0.125 2023-11-20 00:25:41,621 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 9750, loss[loss=0.06273, simple_loss=0.07127, pruned_loss=0.01345, audio_tagging_loss=0.01364, over 16221.00 frames. 
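
All of the ScheduledFloat records show hyperparameters (dropout probabilities, skip rates, balancer limits) that are functions of batch_count rather than constants: each value is linearly interpolated between (batch_count, value) anchor points and held flat outside them. A minimal re-implementation of that idea (the anchor points in the example are made up for illustration):

    def scheduled_float(batch_count: float,
                        schedule: list[tuple[float, float]]) -> float:
        # Piecewise-linear in batch_count, clamped at both ends.
        x0, y0 = schedule[0]
        if batch_count <= x0:
            return y0
        for x1, y1 in schedule[1:]:
            if batch_count <= x1:
                return y0 + (batch_count - x0) * (y1 - y0) / (x1 - x0)
            x0, y0 = x1, y1
        return y0

    # e.g. a skip rate warmed down from 0.5 to 0.025 over the first 20k
    # batches, then constant, so at batch_count=866k it reads 0.025:
    print(scheduled_float(866_000, [(0.0, 0.5), (20_000.0, 0.025)]))
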
], tot_loss[loss=0.0839, simple_loss=0.104, pruned_loss=0.02171, audio_tagging_loss=0.0102, over 3046862.29 frames. ], batch size: 65, lr: 6.21e-03, grad_scale: 32.0 2023-11-20 00:25:46,596 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.36 vs. limit=15.0 2023-11-20 00:25:47,439 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=866540.0, ans=0.125 2023-11-20 00:25:54,834 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=866606.6666666666, ans=0.2 2023-11-20 00:26:04,589 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 130000 2023-11-20 00:26:37,440 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=866806.6666666666, ans=0.1 2023-11-20 00:26:41,635 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=866806.6666666666, ans=0.125 2023-11-20 00:26:47,954 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 9800, loss[loss=0.07764, simple_loss=0.101, pruned_loss=0.01772, audio_tagging_loss=0.009429, over 15246.00 frames. ], tot_loss[loss=0.08413, simple_loss=0.1042, pruned_loss=0.02188, audio_tagging_loss=0.01012, over 3050695.77 frames. ], batch size: 59, lr: 6.21e-03, grad_scale: 16.0 2023-11-20 00:26:57,479 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.04 vs. limit=15.0 2023-11-20 00:27:09,803 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 130050 2023-11-20 00:27:26,363 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.248e+01 8.368e+01 8.974e+01 9.697e+01 1.703e+02, threshold=1.795e+02, percent-clipped=0.0 2023-11-20 00:27:27,093 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.29 vs. limit=15.0 2023-11-20 00:27:30,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=867073.3333333334, ans=0.05 2023-11-20 00:27:32,864 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.66 vs. limit=6.0 2023-11-20 00:27:37,621 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=867073.3333333334, ans=0.0 2023-11-20 00:27:44,278 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.02 vs. limit=6.0 2023-11-20 00:27:47,316 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-20 00:27:52,253 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 9850, loss[loss=0.1126, simple_loss=0.1469, pruned_loss=0.03007, audio_tagging_loss=0.009042, over 14835.00 frames. ], tot_loss[loss=0.08418, simple_loss=0.1044, pruned_loss=0.02187, audio_tagging_loss=0.01009, over 3045033.72 frames. ], batch size: 55, lr: 6.21e-03, grad_scale: 16.0 2023-11-20 00:27:56,920 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=867206.6666666666, ans=0.125 2023-11-20 00:28:00,478 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 00:28:04,220 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=867273.3333333334, ans=0.1 2023-11-20 00:28:06,802 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=867273.3333333334, ans=0.1 2023-11-20 00:28:14,592 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 130100 2023-11-20 00:28:29,455 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 00:28:34,437 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=867406.6666666666, ans=0.125 2023-11-20 00:28:34,935 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.83 vs. limit=6.0 2023-11-20 00:28:38,215 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=867406.6666666666, ans=0.2 2023-11-20 00:28:47,605 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=867473.3333333334, ans=0.125 2023-11-20 00:28:57,220 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 9900, loss[loss=0.04477, simple_loss=0.05099, pruned_loss=0.007888, audio_tagging_loss=0.01139, over 14462.00 frames. ], tot_loss[loss=0.08346, simple_loss=0.1036, pruned_loss=0.02162, audio_tagging_loss=0.01003, over 3046324.79 frames. ], batch size: 60, lr: 6.20e-03, grad_scale: 16.0 2023-11-20 00:28:58,928 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=867540.0, ans=0.125 2023-11-20 00:29:04,359 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=867540.0, ans=0.09899494936611666 2023-11-20 00:29:20,365 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 130150 2023-11-20 00:29:36,776 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.726e+01 8.221e+01 8.937e+01 9.593e+01 1.338e+02, threshold=1.787e+02, percent-clipped=0.0 2023-11-20 00:30:02,317 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 9950, loss[loss=0.05943, simple_loss=0.07676, pruned_loss=0.01253, audio_tagging_loss=0.00852, over 15158.00 frames. ], tot_loss[loss=0.0834, simple_loss=0.1035, pruned_loss=0.02164, audio_tagging_loss=0.01003, over 3044658.36 frames. 
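
The lr field decays very slowly across this section (6.23e-03 near batch index 129000 down to 6.18e-03 near 131000), which matches a polynomial decay in both batch count and epoch, the form used by icefall's Eden scheduler. Plugging the run's configured base_lr=0.045, lr_batches=7500 and lr_epochs=3.5 into that form reproduces the logged values, with epoch taken as 10, the number of completed epochs (treat the exact expression as an assumption verified against the log):

    # Eden-style learning rate, written from the general form in optim.py.
    def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
        return (base_lr
                * ((batch / lr_batches) ** 2 + 1) ** -0.25
                * ((epoch / lr_epochs) ** 2 + 1) ** -0.25)

    print(f"{eden_lr(0.045, 128_950, 10):.2e}")  # 6.23e-03, as logged
    print(f"{eden_lr(0.045, 131_000, 10):.2e}")  # 6.18e-03, as logged
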
], batch size: 56, lr: 6.20e-03, grad_scale: 16.0 2023-11-20 00:30:03,853 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=867873.3333333334, ans=0.125 2023-11-20 00:30:24,650 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 130200 2023-11-20 00:30:55,645 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=868140.0, ans=0.125 2023-11-20 00:30:59,077 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.57 vs. limit=10.0 2023-11-20 00:31:07,020 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 10000, loss[loss=0.08909, simple_loss=0.1106, pruned_loss=0.02181, audio_tagging_loss=0.012, over 14956.00 frames. ], tot_loss[loss=0.08407, simple_loss=0.1041, pruned_loss=0.02189, audio_tagging_loss=0.01012, over 3050301.44 frames. ], batch size: 56, lr: 6.20e-03, grad_scale: 32.0 2023-11-20 00:31:11,724 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=868206.6666666666, ans=0.125 2023-11-20 00:31:28,543 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 130250 2023-11-20 00:31:31,862 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=868340.0, ans=0.2 2023-11-20 00:31:43,377 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=868340.0, ans=0.125 2023-11-20 00:31:45,674 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.042e+01 8.141e+01 8.733e+01 9.527e+01 1.222e+02, threshold=1.747e+02, percent-clipped=0.0 2023-11-20 00:32:11,986 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 10050, loss[loss=0.08735, simple_loss=0.111, pruned_loss=0.02236, audio_tagging_loss=0.00947, over 14307.00 frames. ], tot_loss[loss=0.08381, simple_loss=0.1038, pruned_loss=0.02178, audio_tagging_loss=0.01012, over 3043954.92 frames. ], batch size: 58, lr: 6.20e-03, grad_scale: 32.0 2023-11-20 00:32:22,616 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=868540.0, ans=0.125 2023-11-20 00:32:25,646 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=868606.6666666666, ans=0.09899494936611666 2023-11-20 00:32:32,849 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=868606.6666666666, ans=0.0 2023-11-20 00:32:33,219 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.46 vs. limit=22.5 2023-11-20 00:32:33,896 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 130300 2023-11-20 00:33:07,454 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=868806.6666666666, ans=0.0 2023-11-20 00:33:13,814 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=868806.6666666666, ans=0.125 2023-11-20 00:33:17,458 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 10100, loss[loss=0.1143, simple_loss=0.1412, pruned_loss=0.03548, audio_tagging_loss=0.008193, over 15772.00 frames. 
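
The grad_scale field flips between 16.0 and 32.0 through these records: with fp16 training, the loss is multiplied by a dynamic scale before backward so that small gradients survive in half precision, and the scale is halved whenever inf/nan gradients appear and grown again after a stretch of stable steps. The standard torch.cuda.amp loop captures the mechanism (this is the generic PyTorch API, not the recipe's own training step; compute_loss is a placeholder):

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=16.0, growth_interval=2000)

    def fp16_step(model, optimizer, batch):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = compute_loss(model, batch)     # placeholder forward pass
        scaler.scale(loss).backward()             # backward on the scaled loss
        scaler.step(optimizer)                    # skipped if grads are inf/nan
        scaler.update()                           # halves or doubles the scale
        return loss.detach(), scaler.get_scale()  # cf. grad_scale in the log
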
], tot_loss[loss=0.0839, simple_loss=0.1038, pruned_loss=0.02189, audio_tagging_loss=0.01012, over 3040662.91 frames. ], batch size: 56, lr: 6.20e-03, grad_scale: 32.0 2023-11-20 00:33:25,015 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=868873.3333333334, ans=0.125 2023-11-20 00:33:34,412 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=868940.0, ans=0.1 2023-11-20 00:33:39,571 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 130350 2023-11-20 00:33:42,345 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=869006.6666666666, ans=0.0 2023-11-20 00:33:53,063 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=869006.6666666666, ans=0.5 2023-11-20 00:33:53,190 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=869006.6666666666, ans=0.125 2023-11-20 00:33:55,277 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.691e+01 8.143e+01 8.992e+01 9.764e+01 1.668e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-20 00:34:11,700 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 00:34:14,301 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=869140.0, ans=0.0 2023-11-20 00:34:21,580 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 10150, loss[loss=0.0933, simple_loss=0.1204, pruned_loss=0.02542, audio_tagging_loss=0.007701, over 16137.00 frames. ], tot_loss[loss=0.08402, simple_loss=0.1039, pruned_loss=0.02179, audio_tagging_loss=0.01028, over 3036819.82 frames. ], batch size: 58, lr: 6.20e-03, grad_scale: 32.0 2023-11-20 00:34:26,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=869206.6666666666, ans=10.0 2023-11-20 00:34:29,124 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=869206.6666666666, ans=0.5 2023-11-20 00:34:29,321 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=869206.6666666666, ans=0.2 2023-11-20 00:34:35,571 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.74 vs. limit=15.0 2023-11-20 00:34:36,428 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=869273.3333333334, ans=0.0 2023-11-20 00:34:43,456 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 130400 2023-11-20 00:34:54,938 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. 
Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 00:34:56,391 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=869340.0, ans=0.125 2023-11-20 00:35:03,909 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=869406.6666666666, ans=0.09899494936611666 2023-11-20 00:35:18,154 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.12 vs. limit=15.0 2023-11-20 00:35:23,927 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=869473.3333333334, ans=0.125 2023-11-20 00:35:27,269 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 10200, loss[loss=0.06064, simple_loss=0.06341, pruned_loss=0.01419, audio_tagging_loss=0.01475, over 14895.00 frames. ], tot_loss[loss=0.0844, simple_loss=0.104, pruned_loss=0.02206, audio_tagging_loss=0.01033, over 3043307.68 frames. ], batch size: 57, lr: 6.20e-03, grad_scale: 32.0 2023-11-20 00:35:33,831 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=869540.0, ans=0.125 2023-11-20 00:35:34,965 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=869540.0, ans=10.0 2023-11-20 00:35:43,435 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=869606.6666666666, ans=0.0 2023-11-20 00:35:49,323 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 130450 2023-11-20 00:35:54,270 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 00:35:58,087 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=869673.3333333334, ans=0.05 2023-11-20 00:36:01,920 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=869673.3333333334, ans=0.2 2023-11-20 00:36:02,121 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=869673.3333333334, ans=0.05 2023-11-20 00:36:06,677 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.786e+01 8.250e+01 8.852e+01 1.003e+02 1.443e+02, threshold=1.770e+02, percent-clipped=0.0 2023-11-20 00:36:07,619 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.95 vs. 
limit=10.0 2023-11-20 00:36:18,093 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=869806.6666666666, ans=0.2 2023-11-20 00:36:32,644 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 10250, loss[loss=0.07607, simple_loss=0.09186, pruned_loss=0.01889, audio_tagging_loss=0.01125, over 14039.00 frames. ], tot_loss[loss=0.08525, simple_loss=0.1051, pruned_loss=0.02241, audio_tagging_loss=0.01027, over 3045893.35 frames. ], batch size: 57, lr: 6.20e-03, grad_scale: 32.0 2023-11-20 00:36:50,085 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=869940.0, ans=0.2 2023-11-20 00:36:54,148 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 130500 2023-11-20 00:37:09,346 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=870073.3333333334, ans=0.0 2023-11-20 00:37:12,947 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=870073.3333333334, ans=0.04949747468305833 2023-11-20 00:37:17,210 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=870073.3333333334, ans=0.125 2023-11-20 00:37:26,758 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=870140.0, ans=0.0 2023-11-20 00:37:36,719 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 10300, loss[loss=0.09625, simple_loss=0.1346, pruned_loss=0.02206, audio_tagging_loss=0.006911, over 15735.00 frames. ], tot_loss[loss=0.08534, simple_loss=0.1056, pruned_loss=0.02239, audio_tagging_loss=0.01017, over 3049084.05 frames. ], batch size: 58, lr: 6.19e-03, grad_scale: 32.0 2023-11-20 00:37:55,799 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=870273.3333333334, ans=0.125 2023-11-20 00:37:59,505 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 130550 2023-11-20 00:38:16,962 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.312e+01 8.334e+01 9.071e+01 9.729e+01 1.396e+02, threshold=1.814e+02, percent-clipped=0.0 2023-11-20 00:38:32,192 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=870473.3333333334, ans=0.0 2023-11-20 00:38:41,838 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=870540.0, ans=0.0 2023-11-20 00:38:42,754 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 10350, loss[loss=0.07506, simple_loss=0.08572, pruned_loss=0.02031, audio_tagging_loss=0.01189, over 13945.00 frames. ], tot_loss[loss=0.08407, simple_loss=0.1036, pruned_loss=0.02201, audio_tagging_loss=0.01029, over 3046841.32 frames. ], batch size: 57, lr: 6.19e-03, grad_scale: 32.0 2023-11-20 00:39:05,161 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 130600 2023-11-20 00:39:13,184 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=870673.3333333334, ans=0.125 2023-11-20 00:39:47,659 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 10400, loss[loss=0.08261, simple_loss=0.1014, pruned_loss=0.01989, audio_tagging_loss=0.012, over 15482.00 frames. ], tot_loss[loss=0.08437, simple_loss=0.1039, pruned_loss=0.02205, audio_tagging_loss=0.01037, over 3048160.07 frames. 
], batch size: 58, lr: 6.19e-03, grad_scale: 32.0 2023-11-20 00:40:01,039 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=870940.0, ans=0.95 2023-11-20 00:40:02,245 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=870940.0, ans=0.0 2023-11-20 00:40:09,377 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 130650 2023-11-20 00:40:15,198 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=871006.6666666666, ans=0.0 2023-11-20 00:40:16,399 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=871006.6666666666, ans=0.0 2023-11-20 00:40:17,728 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.82 vs. limit=15.0 2023-11-20 00:40:26,387 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.869e+01 8.319e+01 9.012e+01 9.651e+01 1.388e+02, threshold=1.802e+02, percent-clipped=0.0 2023-11-20 00:40:29,045 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=871073.3333333334, ans=0.125 2023-11-20 00:40:30,311 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=871073.3333333334, ans=0.125 2023-11-20 00:40:52,081 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 10450, loss[loss=0.07221, simple_loss=0.09049, pruned_loss=0.01492, audio_tagging_loss=0.01204, over 14672.00 frames. ], tot_loss[loss=0.08363, simple_loss=0.1029, pruned_loss=0.02176, audio_tagging_loss=0.01043, over 3050169.87 frames. ], batch size: 54, lr: 6.19e-03, grad_scale: 32.0 2023-11-20 00:41:01,981 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.72 vs. limit=22.5 2023-11-20 00:41:13,141 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=871273.3333333334, ans=0.0 2023-11-20 00:41:14,158 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 130700 2023-11-20 00:41:35,761 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=871406.6666666666, ans=0.125 2023-11-20 00:41:38,207 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=871406.6666666666, ans=0.125 2023-11-20 00:41:56,751 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 10500, loss[loss=0.1137, simple_loss=0.1405, pruned_loss=0.03603, audio_tagging_loss=0.007452, over 16138.00 frames. ], tot_loss[loss=0.0837, simple_loss=0.103, pruned_loss=0.02188, audio_tagging_loss=0.01031, over 3052092.24 frames. ], batch size: 57, lr: 6.19e-03, grad_scale: 32.0 2023-11-20 00:41:59,611 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=871540.0, ans=0.1 2023-11-20 00:42:02,601 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 00:42:05,368 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.02 vs. 
limit=15.0 2023-11-20 00:42:11,691 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=871606.6666666666, ans=0.1 2023-11-20 00:42:18,490 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=871606.6666666666, ans=0.125 2023-11-20 00:42:19,542 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 130750 2023-11-20 00:42:31,033 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=871673.3333333334, ans=0.125 2023-11-20 00:42:35,578 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.944e+01 8.359e+01 9.035e+01 1.000e+02 1.181e+02, threshold=1.807e+02, percent-clipped=0.0 2023-11-20 00:42:45,186 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=871740.0, ans=0.125 2023-11-20 00:42:45,260 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=871740.0, ans=0.125 2023-11-20 00:42:48,744 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=871806.6666666666, ans=0.125 2023-11-20 00:43:01,871 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 10550, loss[loss=0.07409, simple_loss=0.089, pruned_loss=0.01825, audio_tagging_loss=0.01134, over 15976.00 frames. ], tot_loss[loss=0.08372, simple_loss=0.1034, pruned_loss=0.02185, audio_tagging_loss=0.01016, over 3051612.20 frames. ], batch size: 61, lr: 6.19e-03, grad_scale: 32.0 2023-11-20 00:43:04,479 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=871873.3333333334, ans=0.2 2023-11-20 00:43:23,348 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 130800 2023-11-20 00:43:44,587 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=872073.3333333334, ans=0.1 2023-11-20 00:43:47,197 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=872073.3333333334, ans=0.1 2023-11-20 00:43:56,892 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=872140.0, ans=0.0 2023-11-20 00:44:06,380 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 10600, loss[loss=0.09025, simple_loss=0.1049, pruned_loss=0.02154, audio_tagging_loss=0.01624, over 14835.00 frames. ], tot_loss[loss=0.08396, simple_loss=0.1038, pruned_loss=0.02184, audio_tagging_loss=0.01021, over 3049419.59 frames. ], batch size: 58, lr: 6.19e-03, grad_scale: 16.0 2023-11-20 00:44:10,439 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=872206.6666666666, ans=0.125 2023-11-20 00:44:12,152 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.83 vs. limit=15.0 2023-11-20 00:44:22,398 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.96 vs. 
limit=15.0 2023-11-20 00:44:27,745 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 130850 2023-11-20 00:44:27,993 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=872273.3333333334, ans=0.1 2023-11-20 00:44:46,875 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.153e+01 8.252e+01 9.029e+01 9.863e+01 1.267e+02, threshold=1.806e+02, percent-clipped=0.0 2023-11-20 00:44:47,148 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=872406.6666666666, ans=0.125 2023-11-20 00:45:10,263 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.92 vs. limit=12.0 2023-11-20 00:45:10,726 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 10650, loss[loss=0.1022, simple_loss=0.1241, pruned_loss=0.03127, audio_tagging_loss=0.008901, over 15635.00 frames. ], tot_loss[loss=0.08296, simple_loss=0.1027, pruned_loss=0.02146, audio_tagging_loss=0.01016, over 3049280.54 frames. ], batch size: 55, lr: 6.19e-03, grad_scale: 16.0 2023-11-20 00:45:32,912 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 130900 2023-11-20 00:45:34,194 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=872606.6666666666, ans=0.0 2023-11-20 00:45:46,201 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.59 vs. limit=15.0 2023-11-20 00:46:02,490 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.31 vs. limit=15.0 2023-11-20 00:46:14,909 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 10700, loss[loss=0.1158, simple_loss=0.1412, pruned_loss=0.03811, audio_tagging_loss=0.007094, over 14978.00 frames. ], tot_loss[loss=0.08326, simple_loss=0.1032, pruned_loss=0.02158, audio_tagging_loss=0.01008, over 3040927.39 frames. ], batch size: 58, lr: 6.19e-03, grad_scale: 16.0 2023-11-20 00:46:37,377 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 130950 2023-11-20 00:46:44,168 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.52 vs. limit=15.0 2023-11-20 00:46:55,067 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.497e+01 8.315e+01 9.053e+01 9.865e+01 1.273e+02, threshold=1.811e+02, percent-clipped=0.0 2023-11-20 00:47:20,605 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 10750, loss[loss=0.08279, simple_loss=0.1045, pruned_loss=0.02002, audio_tagging_loss=0.01051, over 15159.00 frames. ], tot_loss[loss=0.0836, simple_loss=0.1036, pruned_loss=0.02168, audio_tagging_loss=0.01013, over 3050745.34 frames. ], batch size: 59, lr: 6.18e-03, grad_scale: 16.0 2023-11-20 00:47:23,225 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=873206.6666666666, ans=0.1 2023-11-20 00:47:40,236 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.94 vs. 
limit=6.0 2023-11-20 00:47:41,956 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 131000 2023-11-20 00:48:24,193 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 10800, loss[loss=0.08404, simple_loss=0.1062, pruned_loss=0.0191, audio_tagging_loss=0.01185, over 15738.00 frames. ], tot_loss[loss=0.08345, simple_loss=0.1037, pruned_loss=0.02151, audio_tagging_loss=0.01008, over 3055355.92 frames. ], batch size: 58, lr: 6.18e-03, grad_scale: 16.0 2023-11-20 00:48:37,143 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=873606.6666666666, ans=0.125 2023-11-20 00:48:45,303 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.57 vs. limit=12.0 2023-11-20 00:48:46,049 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 131050 2023-11-20 00:49:05,230 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.697e+01 8.209e+01 8.933e+01 9.655e+01 1.364e+02, threshold=1.787e+02, percent-clipped=0.0 2023-11-20 00:49:07,999 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=873740.0, ans=0.0 2023-11-20 00:49:14,641 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=873806.6666666666, ans=0.125 2023-11-20 00:49:27,987 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 10850, loss[loss=0.1098, simple_loss=0.1381, pruned_loss=0.03188, audio_tagging_loss=0.008935, over 15363.00 frames. ], tot_loss[loss=0.08348, simple_loss=0.1033, pruned_loss=0.02164, audio_tagging_loss=0.01019, over 3057509.84 frames. ], batch size: 57, lr: 6.18e-03, grad_scale: 16.0 2023-11-20 00:49:33,970 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.44 vs. limit=15.0 2023-11-20 00:49:50,573 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 131100 2023-11-20 00:49:54,554 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=874006.6666666666, ans=0.1 2023-11-20 00:49:54,616 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=874006.6666666666, ans=0.0 2023-11-20 00:49:57,195 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=874006.6666666666, ans=0.125 2023-11-20 00:50:18,397 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=874140.0, ans=0.0 2023-11-20 00:50:18,746 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.22 vs. limit=15.0 2023-11-20 00:50:27,701 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=874140.0, ans=0.0 2023-11-20 00:50:30,822 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=874140.0, ans=0.125 2023-11-20 00:50:32,924 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 10900, loss[loss=0.05275, simple_loss=0.05345, pruned_loss=0.0118, audio_tagging_loss=0.01423, over 15158.00 frames. 
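
The Whitening records compare a covariance statistic of a layer's activations against a limit: the metric equals 1.0 when the per-group covariance is proportional to the identity (fully "white") and grows with the spread of its eigenvalues, and a corrective gradient is applied only when it exceeds the limit, which is why entries like "metric=4.57 vs. limit=12.0" above are purely informational. A sketch of one such metric, written from the general idea rather than verbatim from scaling.py:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int) -> torch.Tensor:
        # x: (N, num_channels). Returns >= 1.0; equals 1.0 when each group's
        # covariance is a multiple of the identity.
        n, num_channels = x.shape
        c = num_channels // num_groups
        xg = x.reshape(n, num_groups, c).transpose(0, 1)   # (groups, N, c)
        covar = xg.transpose(1, 2) @ xg / n                # (groups, c, c)
        mean_diag = covar.diagonal(dim1=1, dim2=2).mean()
        return (covar ** 2).sum() / (mean_diag ** 2 * num_groups * c)

    x = torch.randn(10_000, 256)               # near-white input ...
    print(whitening_metric(x, num_groups=1))   # ... gives a metric near 1.0
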
], tot_loss[loss=0.08366, simple_loss=0.1034, pruned_loss=0.02165, audio_tagging_loss=0.0103, over 3055247.75 frames. ], batch size: 59, lr: 6.18e-03, grad_scale: 16.0 2023-11-20 00:50:32,960 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 00:50:48,387 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=874273.3333333334, ans=0.125 2023-11-20 00:50:49,585 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=874273.3333333334, ans=0.125 2023-11-20 00:50:54,860 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 131150 2023-11-20 00:50:57,483 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=874340.0, ans=0.1 2023-11-20 00:51:13,346 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.101e+01 8.251e+01 8.753e+01 9.767e+01 1.236e+02, threshold=1.751e+02, percent-clipped=0.0 2023-11-20 00:51:18,931 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=874406.6666666666, ans=0.125 2023-11-20 00:51:36,264 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 10950, loss[loss=0.07934, simple_loss=0.09125, pruned_loss=0.02139, audio_tagging_loss=0.01233, over 15628.00 frames. ], tot_loss[loss=0.08366, simple_loss=0.1036, pruned_loss=0.02157, audio_tagging_loss=0.01027, over 3053799.46 frames. ], batch size: 61, lr: 6.18e-03, grad_scale: 16.0 2023-11-20 00:51:50,589 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=874606.6666666666, ans=0.2 2023-11-20 00:51:54,188 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=874606.6666666666, ans=0.0 2023-11-20 00:51:58,458 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 131200 2023-11-20 00:52:01,335 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=874673.3333333334, ans=0.1 2023-11-20 00:52:09,531 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=874673.3333333334, ans=0.0 2023-11-20 00:52:34,603 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.48 vs. limit=22.5 2023-11-20 00:52:41,622 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 11000, loss[loss=0.07837, simple_loss=0.09178, pruned_loss=0.02317, audio_tagging_loss=0.009311, over 15537.00 frames. ], tot_loss[loss=0.08355, simple_loss=0.1035, pruned_loss=0.02147, audio_tagging_loss=0.01031, over 3047962.59 frames. 
], batch size: 60, lr: 6.18e-03, grad_scale: 16.0 2023-11-20 00:52:44,302 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=874873.3333333334, ans=0.2 2023-11-20 00:52:56,466 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 00:52:59,167 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=874940.0, ans=0.0 2023-11-20 00:53:04,417 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 131250 2023-11-20 00:53:04,748 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=874940.0, ans=0.125 2023-11-20 00:53:18,215 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=875006.6666666666, ans=0.0 2023-11-20 00:53:20,176 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=875073.3333333334, ans=0.125 2023-11-20 00:53:23,523 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.815e+01 8.049e+01 8.869e+01 9.421e+01 1.178e+02, threshold=1.774e+02, percent-clipped=0.0 2023-11-20 00:53:25,273 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.78 vs. limit=15.0 2023-11-20 00:53:25,975 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=875073.3333333334, ans=0.125 2023-11-20 00:53:29,793 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=875073.3333333334, ans=0.0 2023-11-20 00:53:33,209 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=875140.0, ans=0.125 2023-11-20 00:53:46,585 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 11050, loss[loss=0.07259, simple_loss=0.07715, pruned_loss=0.01994, audio_tagging_loss=0.01407, over 14757.00 frames. ], tot_loss[loss=0.08345, simple_loss=0.1033, pruned_loss=0.02151, audio_tagging_loss=0.0103, over 3042614.52 frames. 
], batch size: 58, lr: 6.18e-03, grad_scale: 8.0 2023-11-20 00:53:54,699 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=875206.6666666666, ans=0.125 2023-11-20 00:54:08,900 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 131300 2023-11-20 00:54:25,121 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=875406.6666666666, ans=0.09899494936611666 2023-11-20 00:54:28,002 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=875406.6666666666, ans=0.95 2023-11-20 00:54:32,951 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=875406.6666666666, ans=0.0 2023-11-20 00:54:50,109 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=875540.0, ans=0.2 2023-11-20 00:54:51,017 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 11100, loss[loss=0.1052, simple_loss=0.127, pruned_loss=0.03055, audio_tagging_loss=0.01115, over 14711.00 frames. ], tot_loss[loss=0.0842, simple_loss=0.104, pruned_loss=0.02184, audio_tagging_loss=0.01036, over 3044397.47 frames. ], batch size: 54, lr: 6.18e-03, grad_scale: 8.0 2023-11-20 00:54:55,036 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=875540.0, ans=0.0 2023-11-20 00:55:12,449 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 131350 2023-11-20 00:55:27,301 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=875673.3333333334, ans=0.0 2023-11-20 00:55:32,209 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=875740.0, ans=0.125 2023-11-20 00:55:33,849 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.519e+01 8.391e+01 9.002e+01 9.833e+01 1.655e+02, threshold=1.800e+02, percent-clipped=0.0 2023-11-20 00:55:45,856 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=875806.6666666666, ans=0.0 2023-11-20 00:55:48,210 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=875806.6666666666, ans=0.1 2023-11-20 00:55:55,293 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 11150, loss[loss=0.1049, simple_loss=0.1284, pruned_loss=0.03072, audio_tagging_loss=0.009948, over 14518.00 frames. ], tot_loss[loss=0.08421, simple_loss=0.1037, pruned_loss=0.02189, audio_tagging_loss=0.01049, over 3047462.83 frames. 
], batch size: 53, lr: 6.17e-03, grad_scale: 8.0 2023-11-20 00:56:17,346 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 131400 2023-11-20 00:56:17,469 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=875940.0, ans=0.04949747468305833 2023-11-20 00:56:42,682 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=876073.3333333334, ans=0.0 2023-11-20 00:56:45,082 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=876073.3333333334, ans=0.0 2023-11-20 00:57:00,006 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 11200, loss[loss=0.07156, simple_loss=0.08558, pruned_loss=0.01909, audio_tagging_loss=0.009681, over 14563.00 frames. ], tot_loss[loss=0.08444, simple_loss=0.1041, pruned_loss=0.02185, audio_tagging_loss=0.01055, over 3045280.25 frames. ], batch size: 54, lr: 6.17e-03, grad_scale: 16.0 2023-11-20 00:57:22,030 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 131450 2023-11-20 00:57:36,944 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.79 vs. limit=8.0 2023-11-20 00:57:42,717 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.046e+01 8.062e+01 8.697e+01 9.573e+01 1.140e+02, threshold=1.739e+02, percent-clipped=0.0 2023-11-20 00:58:04,924 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 11250, loss[loss=0.08427, simple_loss=0.107, pruned_loss=0.0227, audio_tagging_loss=0.008049, over 16210.00 frames. ], tot_loss[loss=0.0833, simple_loss=0.1025, pruned_loss=0.0214, audio_tagging_loss=0.01064, over 3051936.76 frames. ], batch size: 59, lr: 6.17e-03, grad_scale: 16.0 2023-11-20 00:58:13,216 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=876540.0, ans=0.125 2023-11-20 00:58:14,273 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=876540.0, ans=0.0 2023-11-20 00:58:26,633 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 131500 2023-11-20 00:58:29,566 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.55 vs. limit=15.0 2023-11-20 00:58:45,749 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.62 vs. limit=15.0 2023-11-20 00:58:51,597 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.63 vs. limit=6.0 2023-11-20 00:59:08,785 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=876873.3333333334, ans=0.0 2023-11-20 00:59:09,620 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 11300, loss[loss=0.08556, simple_loss=0.107, pruned_loss=0.02357, audio_tagging_loss=0.008507, over 14852.00 frames. ], tot_loss[loss=0.0825, simple_loss=0.102, pruned_loss=0.02116, audio_tagging_loss=0.01036, over 3049225.72 frames. 
], batch size: 56, lr: 6.17e-03, grad_scale: 16.0 2023-11-20 00:59:31,656 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 131550 2023-11-20 00:59:34,376 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=877006.6666666666, ans=0.04949747468305833 2023-11-20 00:59:52,673 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=877073.3333333334, ans=0.05 2023-11-20 00:59:53,569 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.175e+01 8.098e+01 8.659e+01 9.613e+01 1.705e+02, threshold=1.732e+02, percent-clipped=0.0 2023-11-20 00:59:55,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=877073.3333333334, ans=0.0 2023-11-20 01:00:14,029 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 11350, loss[loss=0.06, simple_loss=0.06909, pruned_loss=0.01434, audio_tagging_loss=0.01111, over 15465.00 frames. ], tot_loss[loss=0.08255, simple_loss=0.1022, pruned_loss=0.02127, audio_tagging_loss=0.0102, over 3045182.43 frames. ], batch size: 61, lr: 6.17e-03, grad_scale: 8.0 2023-11-20 01:00:24,897 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=877206.6666666666, ans=0.125 2023-11-20 01:00:32,391 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=877273.3333333334, ans=0.125 2023-11-20 01:00:35,822 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 131600 2023-11-20 01:00:41,350 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=877340.0, ans=0.0 2023-11-20 01:00:54,732 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=877406.6666666666, ans=0.0 2023-11-20 01:01:02,804 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=877406.6666666666, ans=0.125 2023-11-20 01:01:10,787 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=877473.3333333334, ans=0.125 2023-11-20 01:01:18,982 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 11400, loss[loss=0.1008, simple_loss=0.1231, pruned_loss=0.02938, audio_tagging_loss=0.009938, over 15422.00 frames. ], tot_loss[loss=0.0834, simple_loss=0.1034, pruned_loss=0.02158, audio_tagging_loss=0.01011, over 3042415.69 frames. ], batch size: 58, lr: 6.17e-03, grad_scale: 8.0 2023-11-20 01:01:23,161 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=877540.0, ans=0.2 2023-11-20 01:01:40,763 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 131650 2023-11-20 01:01:54,045 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=877673.3333333334, ans=0.0 2023-11-20 01:02:02,814 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.051e+01 8.241e+01 9.056e+01 1.011e+02 3.989e+02, threshold=1.811e+02, percent-clipped=1.0 2023-11-20 01:02:15,616 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.29 vs. 
limit=15.0 2023-11-20 01:02:23,685 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 11450, loss[loss=0.1133, simple_loss=0.1452, pruned_loss=0.03342, audio_tagging_loss=0.007335, over 15378.00 frames. ], tot_loss[loss=0.0843, simple_loss=0.1046, pruned_loss=0.02188, audio_tagging_loss=0.01011, over 3041033.54 frames. ], batch size: 53, lr: 6.17e-03, grad_scale: 8.0 2023-11-20 01:02:33,757 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=877873.3333333334, ans=0.1 2023-11-20 01:02:45,833 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 131700 2023-11-20 01:03:08,039 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.70 vs. limit=22.5 2023-11-20 01:03:16,547 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=878140.0, ans=0.0 2023-11-20 01:03:21,470 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=878140.0, ans=0.125 2023-11-20 01:03:27,756 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 11500, loss[loss=0.1115, simple_loss=0.1361, pruned_loss=0.0346, audio_tagging_loss=0.00883, over 15046.00 frames. ], tot_loss[loss=0.0834, simple_loss=0.1034, pruned_loss=0.02154, audio_tagging_loss=0.01018, over 3041771.41 frames. ], batch size: 54, lr: 6.17e-03, grad_scale: 8.0 2023-11-20 01:03:49,217 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 131750 2023-11-20 01:04:10,256 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=878406.6666666666, ans=10.0 2023-11-20 01:04:11,077 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.409e+01 8.478e+01 9.308e+01 1.005e+02 1.725e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-20 01:04:31,792 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 11550, loss[loss=0.09046, simple_loss=0.1052, pruned_loss=0.02231, audio_tagging_loss=0.01553, over 15281.00 frames. ], tot_loss[loss=0.08309, simple_loss=0.1028, pruned_loss=0.02144, audio_tagging_loss=0.01024, over 3039837.24 frames. ], batch size: 57, lr: 6.17e-03, grad_scale: 8.0 2023-11-20 01:04:34,599 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=878540.0, ans=0.125 2023-11-20 01:04:34,836 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.86 vs. limit=15.0 2023-11-20 01:04:51,822 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=878606.6666666666, ans=0.125 2023-11-20 01:04:54,043 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 131800 2023-11-20 01:04:57,161 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=878673.3333333334, ans=0.125 2023-11-20 01:05:04,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=878673.3333333334, ans=0.125 2023-11-20 01:05:15,158 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. 
Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 01:05:26,676 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.36 vs. limit=15.0 2023-11-20 01:05:36,269 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 11600, loss[loss=0.08945, simple_loss=0.09832, pruned_loss=0.02403, audio_tagging_loss=0.01626, over 17330.00 frames. ], tot_loss[loss=0.0828, simple_loss=0.1024, pruned_loss=0.02134, audio_tagging_loss=0.01028, over 3040161.97 frames. ], batch size: 65, lr: 6.16e-03, grad_scale: 16.0 2023-11-20 01:05:37,871 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=878873.3333333334, ans=0.2 2023-11-20 01:05:41,550 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=878873.3333333334, ans=0.2 2023-11-20 01:05:58,512 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 131850 2023-11-20 01:06:00,312 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=878940.0, ans=0.125 2023-11-20 01:06:06,435 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 01:06:19,902 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.853e+01 8.090e+01 8.633e+01 9.438e+01 1.426e+02, threshold=1.727e+02, percent-clipped=0.0 2023-11-20 01:06:27,007 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=879140.0, ans=0.2 2023-11-20 01:06:34,965 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=879140.0, ans=0.125 2023-11-20 01:06:40,887 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 11650, loss[loss=0.08289, simple_loss=0.1124, pruned_loss=0.01994, audio_tagging_loss=0.006748, over 14936.00 frames. ], tot_loss[loss=0.0828, simple_loss=0.1024, pruned_loss=0.02132, audio_tagging_loss=0.01029, over 3042131.07 frames. ], batch size: 56, lr: 6.16e-03, grad_scale: 16.0 2023-11-20 01:06:53,358 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=879273.3333333334, ans=0.125 2023-11-20 01:07:02,992 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 131900 2023-11-20 01:07:06,896 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=879340.0, ans=0.1 2023-11-20 01:07:09,433 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=879340.0, ans=0.2 2023-11-20 01:07:09,496 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=879340.0, ans=0.125 2023-11-20 01:07:44,210 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.63 vs. 
limit=6.0 2023-11-20 01:07:45,866 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 11700, loss[loss=0.07422, simple_loss=0.09409, pruned_loss=0.01779, audio_tagging_loss=0.009382, over 15938.00 frames. ], tot_loss[loss=0.08206, simple_loss=0.1012, pruned_loss=0.02105, audio_tagging_loss=0.01039, over 3044943.83 frames. ], batch size: 57, lr: 6.16e-03, grad_scale: 16.0 2023-11-20 01:07:54,898 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.26 vs. limit=22.5 2023-11-20 01:08:03,884 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 01:08:05,558 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.69 vs. limit=22.5 2023-11-20 01:08:07,445 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 131950 2023-11-20 01:08:13,778 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=879673.3333333334, ans=0.0 2023-11-20 01:08:29,056 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=879740.0, ans=0.125 2023-11-20 01:08:30,046 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.705e+01 8.385e+01 9.019e+01 1.002e+02 1.324e+02, threshold=1.804e+02, percent-clipped=0.0 2023-11-20 01:08:37,886 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=879806.6666666666, ans=0.2 2023-11-20 01:08:41,267 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=879806.6666666666, ans=0.0 2023-11-20 01:08:49,739 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 11750, loss[loss=0.09231, simple_loss=0.1149, pruned_loss=0.02535, audio_tagging_loss=0.009522, over 16487.00 frames. ], tot_loss[loss=0.08255, simple_loss=0.1016, pruned_loss=0.02131, audio_tagging_loss=0.01045, over 3045648.43 frames. ], batch size: 58, lr: 6.16e-03, grad_scale: 16.0 2023-11-20 01:09:12,302 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 132000 2023-11-20 01:09:12,990 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.89 vs. limit=6.0 2023-11-20 01:09:21,569 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.98 vs. limit=12.0 2023-11-20 01:09:23,714 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=880006.6666666666, ans=0.0 2023-11-20 01:09:25,582 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.78 vs. limit=22.5 2023-11-20 01:09:39,864 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=880073.3333333334, ans=0.04949747468305833 2023-11-20 01:09:46,749 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=880140.0, ans=0.125 2023-11-20 01:09:58,253 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 11800, loss[loss=0.08858, simple_loss=0.1071, pruned_loss=0.02404, audio_tagging_loss=0.01098, over 15409.00 frames. 
], tot_loss[loss=0.08203, simple_loss=0.1005, pruned_loss=0.02115, audio_tagging_loss=0.01061, over 3042922.79 frames. ], batch size: 58, lr: 6.16e-03, grad_scale: 16.0 2023-11-20 01:09:59,131 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.41 vs. limit=15.0 2023-11-20 01:10:15,678 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=880273.3333333334, ans=0.125 2023-11-20 01:10:20,406 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 132050 2023-11-20 01:10:41,790 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.087e+01 8.587e+01 9.267e+01 9.920e+01 1.513e+02, threshold=1.853e+02, percent-clipped=0.0 2023-11-20 01:10:42,060 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=880406.6666666666, ans=0.0 2023-11-20 01:10:42,150 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=880406.6666666666, ans=0.125 2023-11-20 01:10:43,252 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=880406.6666666666, ans=0.025 2023-11-20 01:10:56,999 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=880473.3333333334, ans=0.125 2023-11-20 01:11:03,512 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 11850, loss[loss=0.07949, simple_loss=0.0947, pruned_loss=0.01922, audio_tagging_loss=0.01292, over 15580.00 frames. ], tot_loss[loss=0.08238, simple_loss=0.1012, pruned_loss=0.02121, audio_tagging_loss=0.01054, over 3045481.25 frames. ], batch size: 60, lr: 6.16e-03, grad_scale: 16.0 2023-11-20 01:11:24,957 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 132100 2023-11-20 01:11:43,238 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=880740.0, ans=0.125 2023-11-20 01:11:54,758 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=7.725e-03 2023-11-20 01:11:55,963 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 01:12:04,232 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=880806.6666666666, ans=0.95 2023-11-20 01:12:05,579 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=880873.3333333334, ans=0.125 2023-11-20 01:12:06,493 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 11900, loss[loss=0.1033, simple_loss=0.1217, pruned_loss=0.0319, audio_tagging_loss=0.01052, over 14903.00 frames. ], tot_loss[loss=0.0828, simple_loss=0.1019, pruned_loss=0.02124, audio_tagging_loss=0.01058, over 3047171.60 frames. ], batch size: 58, lr: 6.16e-03, grad_scale: 16.0 2023-11-20 01:12:12,820 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.12 vs. 
limit=15.0 2023-11-20 01:12:26,840 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=880940.0, ans=0.0 2023-11-20 01:12:28,376 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 132150 2023-11-20 01:12:28,378 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=880940.0, ans=0.125 2023-11-20 01:12:41,318 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.13 vs. limit=22.5 2023-11-20 01:12:43,350 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.01 vs. limit=15.0 2023-11-20 01:12:50,267 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.260e+01 8.333e+01 8.992e+01 9.854e+01 1.328e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-20 01:12:52,226 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.88 vs. limit=15.0 2023-11-20 01:13:01,428 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.59 vs. limit=6.0 2023-11-20 01:13:10,637 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 11950, loss[loss=0.1097, simple_loss=0.1382, pruned_loss=0.0331, audio_tagging_loss=0.007521, over 15863.00 frames. ], tot_loss[loss=0.0831, simple_loss=0.1022, pruned_loss=0.02139, audio_tagging_loss=0.01062, over 3050417.41 frames. ], batch size: 57, lr: 6.16e-03, grad_scale: 16.0 2023-11-20 01:13:24,892 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=881273.3333333334, ans=0.125 2023-11-20 01:13:33,296 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 132200 2023-11-20 01:13:33,426 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=881273.3333333334, ans=0.1 2023-11-20 01:13:38,793 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=881340.0, ans=0.1 2023-11-20 01:13:59,097 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.17 vs. limit=15.0 2023-11-20 01:14:05,553 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=881473.3333333334, ans=0.0 2023-11-20 01:14:06,886 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=881473.3333333334, ans=0.0 2023-11-20 01:14:12,686 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=881540.0, ans=0.2 2023-11-20 01:14:13,604 INFO [train_asr.py:1262] (3/4) Epoch 11, batch 12000, loss[loss=0.1123, simple_loss=0.1464, pruned_loss=0.03086, audio_tagging_loss=0.008265, over 15573.00 frames. ], tot_loss[loss=0.08329, simple_loss=0.1025, pruned_loss=0.02136, audio_tagging_loss=0.01067, over 3049388.72 frames. 
], batch size: 57, lr: 6.15e-03, grad_scale: 32.0 2023-11-20 01:14:13,605 INFO [train_asr.py:1285] (3/4) Computing validation loss 2023-11-20 01:14:57,665 INFO [train_asr.py:1294] (3/4) Epoch 11, validation: loss=0.06362, simple_loss=0.05468, pruned_loss=0.006127, audio_tagging_loss=0.03015, over 4681554.00 frames. 2023-11-20 01:14:57,666 INFO [train_asr.py:1295] (3/4) Maximum memory allocated so far is 25886MB 2023-11-20 01:15:17,678 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 132250 2023-11-20 01:15:18,898 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=881606.6666666666, ans=0.125 2023-11-20 01:15:27,305 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=881673.3333333334, ans=0.05 2023-11-20 01:16:05,093 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 0, loss[loss=0.1065, simple_loss=0.1204, pruned_loss=0.0253, audio_tagging_loss=0.02099, over 15082.00 frames. ], tot_loss[loss=0.1065, simple_loss=0.1204, pruned_loss=0.0253, audio_tagging_loss=0.02099, over 15082.00 frames. ], batch size: 58, lr: 5.90e-03, grad_scale: 32.0 2023-11-20 01:16:05,094 INFO [train_asr.py:1285] (3/4) Computing validation loss 2023-11-20 01:16:22,873 INFO [zipformer.py:1873] (3/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.5700, 3.0779, 3.1224, 2.9165], device='cuda:3') 2023-11-20 01:16:34,406 INFO [zipformer.py:1873] (3/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.7842, 0.3020, 3.2392, 3.0022, 2.4805, 2.8962, 2.9757, 3.1274], device='cuda:3') 2023-11-20 01:16:42,299 INFO [train_asr.py:1294] (3/4) Epoch 12, validation: loss=0.06246, simple_loss=0.05467, pruned_loss=0.006079, audio_tagging_loss=0.02904, over 4681554.00 frames. 
2023-11-20 01:16:42,300 INFO [train_asr.py:1295] (3/4) Maximum memory allocated so far is 25886MB 2023-11-20 01:16:42,659 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=881720.0, ans=0.09899494936611666 2023-11-20 01:16:51,495 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.342e+01 8.202e+01 8.941e+01 9.888e+01 1.289e+02, threshold=1.788e+02, percent-clipped=0.0 2023-11-20 01:16:55,624 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=881786.6666666666, ans=0.5 2023-11-20 01:17:16,207 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=881853.3333333334, ans=0.2 2023-11-20 01:17:22,289 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=881920.0, ans=0.125 2023-11-20 01:17:31,971 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=881920.0, ans=0.125 2023-11-20 01:17:34,460 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 132300 2023-11-20 01:17:41,312 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=881986.6666666666, ans=0.125 2023-11-20 01:17:43,840 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=881986.6666666666, ans=0.125 2023-11-20 01:17:47,213 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 50, loss[loss=0.08475, simple_loss=0.1001, pruned_loss=0.01789, audio_tagging_loss=0.0168, over 16488.00 frames. ], tot_loss[loss=0.09074, simple_loss=0.1005, pruned_loss=0.02056, audio_tagging_loss=0.01995, over 688367.92 frames. ], batch size: 62, lr: 5.89e-03, grad_scale: 32.0 2023-11-20 01:17:47,411 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=882053.3333333334, ans=0.0 2023-11-20 01:18:30,529 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_na.min_abs, batch_count=882253.3333333334, ans=0.02 2023-11-20 01:18:39,758 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 132350 2023-11-20 01:18:40,041 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=882320.0, ans=0.95 2023-11-20 01:18:47,327 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=882320.0, ans=0.0 2023-11-20 01:18:52,567 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 100, loss[loss=0.09549, simple_loss=0.1123, pruned_loss=0.02208, audio_tagging_loss=0.01728, over 15629.00 frames. ], tot_loss[loss=0.09207, simple_loss=0.1032, pruned_loss=0.02146, audio_tagging_loss=0.01899, over 1220613.39 frames. 
], batch size: 57, lr: 5.89e-03, grad_scale: 32.0 2023-11-20 01:18:54,771 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=882386.6666666666, ans=0.125 2023-11-20 01:19:01,652 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.733e+01 8.794e+01 9.349e+01 1.020e+02 1.692e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-20 01:19:28,570 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=882520.0, ans=0.2 2023-11-20 01:19:41,228 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.06 vs. limit=22.5 2023-11-20 01:19:44,808 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 132400 2023-11-20 01:19:57,296 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 150, loss[loss=0.0996, simple_loss=0.1089, pruned_loss=0.02856, audio_tagging_loss=0.01662, over 15135.00 frames. ], tot_loss[loss=0.0901, simple_loss=0.1033, pruned_loss=0.02136, audio_tagging_loss=0.01709, over 1622833.98 frames. ], batch size: 57, lr: 5.89e-03, grad_scale: 32.0 2023-11-20 01:20:18,835 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.84 vs. limit=15.0 2023-11-20 01:20:27,305 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=882853.3333333334, ans=0.125 2023-11-20 01:20:39,630 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=882920.0, ans=0.07 2023-11-20 01:20:49,266 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 132450 2023-11-20 01:20:53,189 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=882986.6666666666, ans=0.125 2023-11-20 01:21:02,279 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 200, loss[loss=0.1072, simple_loss=0.1413, pruned_loss=0.02652, audio_tagging_loss=0.01003, over 15991.00 frames. ], tot_loss[loss=0.08864, simple_loss=0.1039, pruned_loss=0.02168, audio_tagging_loss=0.01502, over 1930326.56 frames. ], batch size: 57, lr: 5.89e-03, grad_scale: 32.0 2023-11-20 01:21:11,566 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.067e+01 8.201e+01 8.761e+01 9.540e+01 1.328e+02, threshold=1.752e+02, percent-clipped=0.0 2023-11-20 01:21:32,898 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=883186.6666666666, ans=0.2 2023-11-20 01:21:48,049 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=883253.3333333334, ans=0.0 2023-11-20 01:21:50,537 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=883253.3333333334, ans=0.125 2023-11-20 01:21:54,062 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 132500 2023-11-20 01:22:06,769 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 250, loss[loss=0.08581, simple_loss=0.1078, pruned_loss=0.02181, audio_tagging_loss=0.01009, over 15087.00 frames. ], tot_loss[loss=0.08651, simple_loss=0.1033, pruned_loss=0.02139, audio_tagging_loss=0.0135, over 2183253.45 frames. 
], batch size: 56, lr: 5.89e-03, grad_scale: 32.0 2023-11-20 01:22:26,293 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=883453.3333333334, ans=0.125 2023-11-20 01:22:49,873 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=883586.6666666666, ans=0.125 2023-11-20 01:22:53,654 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.91 vs. limit=15.0 2023-11-20 01:22:54,755 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=883586.6666666666, ans=0.125 2023-11-20 01:22:58,731 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 132550 2023-11-20 01:23:11,514 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 300, loss[loss=0.06358, simple_loss=0.08336, pruned_loss=0.01367, audio_tagging_loss=0.008223, over 15978.00 frames. ], tot_loss[loss=0.08533, simple_loss=0.1032, pruned_loss=0.02113, audio_tagging_loss=0.01259, over 2378445.68 frames. ], batch size: 61, lr: 5.89e-03, grad_scale: 32.0 2023-11-20 01:23:20,190 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.079e+01 8.204e+01 9.028e+01 9.850e+01 1.789e+02, threshold=1.806e+02, percent-clipped=1.0 2023-11-20 01:23:20,887 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.56 vs. limit=6.0 2023-11-20 01:23:37,873 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=883853.3333333334, ans=0.5 2023-11-20 01:23:46,440 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=883853.3333333334, ans=0.125 2023-11-20 01:24:03,120 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 132600 2023-11-20 01:24:11,430 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.57 vs. limit=22.5 2023-11-20 01:24:14,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=883986.6666666666, ans=0.0 2023-11-20 01:24:16,353 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 350, loss[loss=0.06831, simple_loss=0.07272, pruned_loss=0.01791, audio_tagging_loss=0.01404, over 15696.00 frames. ], tot_loss[loss=0.08575, simple_loss=0.1042, pruned_loss=0.02168, audio_tagging_loss=0.01195, over 2534260.54 frames. 
], batch size: 60, lr: 5.89e-03, grad_scale: 32.0 2023-11-20 01:24:57,798 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=884253.3333333334, ans=0.0 2023-11-20 01:24:59,042 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=884253.3333333334, ans=0.125 2023-11-20 01:25:00,134 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=884253.3333333334, ans=0.125 2023-11-20 01:25:08,214 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 132650 2023-11-20 01:25:21,016 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 400, loss[loss=0.05244, simple_loss=0.06148, pruned_loss=0.009639, audio_tagging_loss=0.01206, over 14913.00 frames. ], tot_loss[loss=0.08509, simple_loss=0.1041, pruned_loss=0.02163, audio_tagging_loss=0.01142, over 2652508.78 frames. ], batch size: 58, lr: 5.89e-03, grad_scale: 32.0 2023-11-20 01:25:30,272 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.352e+01 8.151e+01 8.736e+01 9.522e+01 1.340e+02, threshold=1.747e+02, percent-clipped=0.0 2023-11-20 01:25:54,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=884520.0, ans=0.125 2023-11-20 01:26:04,302 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=884586.6666666666, ans=0.125 2023-11-20 01:26:13,042 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 132700 2023-11-20 01:26:16,273 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=884653.3333333334, ans=0.0 2023-11-20 01:26:26,626 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 450, loss[loss=0.07354, simple_loss=0.09462, pruned_loss=0.01606, audio_tagging_loss=0.01017, over 15351.00 frames. ], tot_loss[loss=0.08347, simple_loss=0.1023, pruned_loss=0.0211, audio_tagging_loss=0.01124, over 2734037.86 frames. ], batch size: 58, lr: 5.89e-03, grad_scale: 32.0 2023-11-20 01:26:34,086 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=884720.0, ans=0.125 2023-11-20 01:26:44,123 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.45 vs. limit=15.0 2023-11-20 01:27:18,848 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 132750 2023-11-20 01:27:24,054 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=884986.6666666666, ans=0.0 2023-11-20 01:27:31,593 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 500, loss[loss=0.08306, simple_loss=0.1023, pruned_loss=0.02199, audio_tagging_loss=0.009946, over 14898.00 frames. ], tot_loss[loss=0.08381, simple_loss=0.1029, pruned_loss=0.02141, audio_tagging_loss=0.01097, over 2798999.71 frames. 
], batch size: 57, lr: 5.88e-03, grad_scale: 32.0 2023-11-20 01:27:33,061 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=885053.3333333334, ans=0.0 2023-11-20 01:27:36,665 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 01:27:40,022 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.508e+01 8.200e+01 8.678e+01 9.366e+01 1.155e+02, threshold=1.736e+02, percent-clipped=0.0 2023-11-20 01:28:06,893 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.41 vs. limit=12.0 2023-11-20 01:28:22,416 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=885320.0, ans=0.125 2023-11-20 01:28:23,585 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 132800 2023-11-20 01:28:36,881 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 550, loss[loss=0.06606, simple_loss=0.07494, pruned_loss=0.01519, audio_tagging_loss=0.01341, over 15237.00 frames. ], tot_loss[loss=0.08341, simple_loss=0.1024, pruned_loss=0.02137, audio_tagging_loss=0.01083, over 2860476.04 frames. ], batch size: 60, lr: 5.88e-03, grad_scale: 32.0 2023-11-20 01:28:40,674 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=885386.6666666666, ans=0.2 2023-11-20 01:28:48,758 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=885453.3333333334, ans=0.125 2023-11-20 01:29:01,583 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=885520.0, ans=0.125 2023-11-20 01:29:25,176 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.98 vs. limit=15.0 2023-11-20 01:29:28,464 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 132850 2023-11-20 01:29:39,209 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=885653.3333333334, ans=0.125 2023-11-20 01:29:41,406 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 600, loss[loss=0.07056, simple_loss=0.08919, pruned_loss=0.01701, audio_tagging_loss=0.008952, over 13920.00 frames. ], tot_loss[loss=0.0833, simple_loss=0.1025, pruned_loss=0.02132, audio_tagging_loss=0.01071, over 2901281.17 frames. 
], batch size: 56, lr: 5.88e-03, grad_scale: 32.0 2023-11-20 01:29:48,925 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 01:29:50,451 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.996e+01 8.266e+01 9.047e+01 9.748e+01 1.324e+02, threshold=1.809e+02, percent-clipped=0.0 2023-11-20 01:30:01,117 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=885786.6666666666, ans=0.125 2023-11-20 01:30:32,875 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 132900 2023-11-20 01:30:44,588 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=886053.3333333334, ans=0.2 2023-11-20 01:30:45,507 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 650, loss[loss=0.09469, simple_loss=0.1192, pruned_loss=0.02878, audio_tagging_loss=0.006315, over 14401.00 frames. ], tot_loss[loss=0.08284, simple_loss=0.1021, pruned_loss=0.02112, audio_tagging_loss=0.01068, over 2923362.30 frames. ], batch size: 53, lr: 5.88e-03, grad_scale: 32.0 2023-11-20 01:30:47,725 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=886053.3333333334, ans=0.025 2023-11-20 01:31:04,644 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=886120.0, ans=0.125 2023-11-20 01:31:15,384 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=886186.6666666666, ans=0.125 2023-11-20 01:31:37,856 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.40 vs. limit=15.0 2023-11-20 01:31:38,622 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 132950 2023-11-20 01:31:42,571 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=886320.0, ans=0.0 2023-11-20 01:31:44,831 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=886320.0, ans=0.1 2023-11-20 01:31:47,403 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=886320.0, ans=0.125 2023-11-20 01:31:51,419 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 700, loss[loss=0.09049, simple_loss=0.1172, pruned_loss=0.02295, audio_tagging_loss=0.00895, over 15319.00 frames. ], tot_loss[loss=0.08288, simple_loss=0.1028, pruned_loss=0.02094, audio_tagging_loss=0.01054, over 2953565.14 frames. ], batch size: 57, lr: 5.88e-03, grad_scale: 32.0 2023-11-20 01:32:00,267 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.620e+01 8.108e+01 8.721e+01 9.361e+01 1.160e+02, threshold=1.744e+02, percent-clipped=0.0 2023-11-20 01:32:25,338 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=886520.0, ans=0.2 2023-11-20 01:32:38,501 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.13 vs. 
limit=12.0 2023-11-20 01:32:43,718 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 133000 2023-11-20 01:32:43,935 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=886653.3333333334, ans=0.125 2023-11-20 01:32:47,736 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=886653.3333333334, ans=0.125 2023-11-20 01:32:51,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=886653.3333333334, ans=0.125 2023-11-20 01:32:56,793 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 750, loss[loss=0.08364, simple_loss=0.1011, pruned_loss=0.02239, audio_tagging_loss=0.01068, over 14388.00 frames. ], tot_loss[loss=0.08259, simple_loss=0.1024, pruned_loss=0.02087, audio_tagging_loss=0.01051, over 2978648.41 frames. ], batch size: 54, lr: 5.88e-03, grad_scale: 32.0 2023-11-20 01:33:01,280 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.13 vs. limit=15.0 2023-11-20 01:33:02,067 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=886720.0, ans=0.1 2023-11-20 01:33:04,887 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.25 vs. limit=15.0 2023-11-20 01:33:05,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=886720.0, ans=0.1 2023-11-20 01:33:08,402 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.22 vs. limit=15.0 2023-11-20 01:33:09,523 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=886786.6666666666, ans=0.0 2023-11-20 01:33:09,553 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=886786.6666666666, ans=0.0 2023-11-20 01:33:19,429 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=886786.6666666666, ans=0.0 2023-11-20 01:33:26,489 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=886853.3333333334, ans=0.125 2023-11-20 01:33:34,786 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=886920.0, ans=0.1 2023-11-20 01:33:48,611 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 133050 2023-11-20 01:34:00,837 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 800, loss[loss=0.06942, simple_loss=0.08433, pruned_loss=0.01536, audio_tagging_loss=0.01189, over 16412.00 frames. ], tot_loss[loss=0.0832, simple_loss=0.1033, pruned_loss=0.02097, audio_tagging_loss=0.01058, over 3001338.25 frames. 
], batch size: 64, lr: 5.88e-03, grad_scale: 32.0 2023-11-20 01:34:08,104 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=887053.3333333334, ans=0.2 2023-11-20 01:34:10,151 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.686e+01 8.461e+01 9.039e+01 1.027e+02 1.682e+02, threshold=1.808e+02, percent-clipped=0.0 2023-11-20 01:34:10,558 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=887053.3333333334, ans=0.125 2023-11-20 01:34:10,590 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=887053.3333333334, ans=0.0 2023-11-20 01:34:15,396 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=887120.0, ans=0.125 2023-11-20 01:34:28,195 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=887186.6666666666, ans=0.07 2023-11-20 01:34:52,120 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 133100 2023-11-20 01:35:05,336 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 850, loss[loss=0.09927, simple_loss=0.1229, pruned_loss=0.02736, audio_tagging_loss=0.01045, over 15188.00 frames. ], tot_loss[loss=0.08332, simple_loss=0.1031, pruned_loss=0.02117, audio_tagging_loss=0.01061, over 3017228.31 frames. ], batch size: 55, lr: 5.88e-03, grad_scale: 32.0 2023-11-20 01:35:08,664 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=887386.6666666666, ans=0.1 2023-11-20 01:35:13,903 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.16 vs. limit=15.0 2023-11-20 01:35:15,177 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.74 vs. limit=15.0 2023-11-20 01:35:35,146 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=887520.0, ans=0.0 2023-11-20 01:35:38,224 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.71 vs. limit=15.0 2023-11-20 01:35:41,277 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=887520.0, ans=0.1 2023-11-20 01:35:45,695 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=887586.6666666666, ans=0.125 2023-11-20 01:35:57,667 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 133150 2023-11-20 01:35:57,906 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 01:36:01,535 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=887653.3333333334, ans=0.125 2023-11-20 01:36:10,437 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 900, loss[loss=0.09999, simple_loss=0.1168, pruned_loss=0.02812, audio_tagging_loss=0.01348, over 14700.00 frames. ], tot_loss[loss=0.08369, simple_loss=0.1035, pruned_loss=0.02127, audio_tagging_loss=0.01068, over 3017338.44 frames. 
2023-11-20 01:36:11,941 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=887720.0, ans=10.0
2023-11-20 01:36:19,051 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.555e+01 8.224e+01 8.986e+01 9.963e+01 2.180e+02, threshold=1.797e+02, percent-clipped=1.0
2023-11-20 01:36:39,818 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-20 01:36:43,922 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.08 vs. limit=15.0
2023-11-20 01:36:45,376 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=887853.3333333334, ans=0.0
2023-11-20 01:37:01,709 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 133200
2023-11-20 01:37:14,461 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 950, loss[loss=0.08017, simple_loss=0.09876, pruned_loss=0.02018, audio_tagging_loss=0.01061, over 14649.00 frames. ], tot_loss[loss=0.08349, simple_loss=0.1033, pruned_loss=0.02132, audio_tagging_loss=0.0105, over 3031093.67 frames. ], batch size: 57, lr: 5.87e-03, grad_scale: 32.0
2023-11-20 01:37:14,742 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=888053.3333333334, ans=0.125
2023-11-20 01:37:21,712 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=888053.3333333334, ans=0.125
2023-11-20 01:37:25,234 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=888053.3333333334, ans=0.1
2023-11-20 01:37:26,506 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=888120.0, ans=0.125
2023-11-20 01:37:46,778 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.22 vs. limit=12.0
2023-11-20 01:37:50,459 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.33 vs. limit=15.0
2023-11-20 01:37:53,957 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=888253.3333333334, ans=0.125
2023-11-20 01:38:05,972 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 133250
2023-11-20 01:38:11,556 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-20 01:38:16,611 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=888320.0, ans=0.0
2023-11-20 01:38:19,465 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 1000, loss[loss=0.08216, simple_loss=0.1044, pruned_loss=0.01755, audio_tagging_loss=0.0124, over 16146.00 frames. ], tot_loss[loss=0.08308, simple_loss=0.1029, pruned_loss=0.02131, audio_tagging_loss=0.01034, over 3038608.09 frames. ], batch size: 59, lr: 5.87e-03, grad_scale: 32.0
2023-11-20 01:38:28,772 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.591e+01 8.070e+01 8.953e+01 9.480e+01 1.441e+02, threshold=1.791e+02, percent-clipped=0.0
2023-11-20 01:38:34,624 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=888453.3333333334, ans=0.1
2023-11-20 01:38:34,689 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=888453.3333333334, ans=0.2
2023-11-20 01:38:37,432 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.89 vs. limit=6.0
2023-11-20 01:38:45,469 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=888520.0, ans=0.125
2023-11-20 01:38:45,605 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=888520.0, ans=0.0
2023-11-20 01:38:46,575 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 01:38:46,853 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=888520.0, ans=0.125
2023-11-20 01:38:58,556 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.92 vs. limit=6.0
2023-11-20 01:39:11,501 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 133300
2023-11-20 01:39:13,876 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.76 vs. limit=15.0
2023-11-20 01:39:24,791 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 1050, loss[loss=0.1061, simple_loss=0.1405, pruned_loss=0.0271, audio_tagging_loss=0.008678, over 16096.00 frames. ], tot_loss[loss=0.08314, simple_loss=0.1031, pruned_loss=0.02134, audio_tagging_loss=0.01024, over 3040332.96 frames. ], batch size: 59, lr: 5.87e-03, grad_scale: 32.0
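The optim.py:476 entries summarize recent gradient norms as min/25%/median/75%/max quartiles together with Clipping_scale=2.0; in each entry the clipping threshold sits at about twice the logged median (e.g. 2.0 x 8.953e+01 ~= 1.791e+02 just above). A sketch of that relationship, assuming the threshold is simply the scaled running median, which is a simplification of whatever optim.py actually maintains:

```python
import torch

def clipping_stats(recent_grad_norms: torch.Tensor, clipping_scale: float = 2.0):
    # Quartiles in the log's order: min / 25% / median / 75% / max.
    q = torch.quantile(recent_grad_norms,
                       torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]  # ~2x the median, as observed above
    percent_clipped = 100.0 * (recent_grad_norms > threshold).float().mean()
    return q, threshold, percent_clipped

# Numbers in the spirit of the entries above:
norms = torch.tensor([66.9, 80.7, 89.5, 94.8, 144.1])
q, thr, pct = clipping_stats(norms)
print(q.tolist(), float(thr), float(pct))  # threshold ~179.0, none clipped
```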
2023-11-20 01:39:28,943 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=888720.0, ans=0.125
2023-11-20 01:39:51,892 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=888853.3333333334, ans=0.125
2023-11-20 01:39:53,184 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=888853.3333333334, ans=0.0
2023-11-20 01:40:01,134 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=888853.3333333334, ans=0.0
2023-11-20 01:40:02,861 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=888920.0, ans=0.125
2023-11-20 01:40:17,234 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 133350
2023-11-20 01:40:17,417 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=888986.6666666666, ans=0.125
2023-11-20 01:40:23,567 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=888986.6666666666, ans=0.2
2023-11-20 01:40:29,531 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 1100, loss[loss=0.1046, simple_loss=0.1252, pruned_loss=0.03071, audio_tagging_loss=0.01133, over 16153.00 frames. ], tot_loss[loss=0.08259, simple_loss=0.1025, pruned_loss=0.02118, audio_tagging_loss=0.01016, over 3042640.13 frames. ], batch size: 60, lr: 5.87e-03, grad_scale: 32.0
2023-11-20 01:40:29,889 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=889053.3333333334, ans=0.125
2023-11-20 01:40:32,072 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 01:40:33,469 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=889053.3333333334, ans=0.1
2023-11-20 01:40:34,083 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.27 vs. limit=12.0
2023-11-20 01:40:38,591 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.984e+01 8.052e+01 8.709e+01 9.479e+01 1.259e+02, threshold=1.742e+02, percent-clipped=0.0
2023-11-20 01:40:41,569 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=889120.0, ans=0.125
2023-11-20 01:41:07,970 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=889253.3333333334, ans=0.0
2023-11-20 01:41:14,027 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=889253.3333333334, ans=0.125
2023-11-20 01:41:17,444 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=889253.3333333334, ans=0.1
2023-11-20 01:41:21,042 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 133400
2023-11-20 01:41:28,515 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=889320.0, ans=0.125
2023-11-20 01:41:29,875 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=889320.0, ans=0.125
2023-11-20 01:41:31,005 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=889320.0, ans=0.125
2023-11-20 01:41:34,529 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 1150, loss[loss=0.06102, simple_loss=0.06813, pruned_loss=0.01408, audio_tagging_loss=0.01288, over 16675.00 frames. ], tot_loss[loss=0.08291, simple_loss=0.1031, pruned_loss=0.02128, audio_tagging_loss=0.01009, over 3045038.25 frames. ], batch size: 66, lr: 5.87e-03, grad_scale: 32.0
2023-11-20 01:41:38,358 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-20 01:41:39,725 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=889386.6666666666, ans=0.125
2023-11-20 01:41:50,132 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=889453.3333333334, ans=0.125
2023-11-20 01:41:55,679 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=889453.3333333334, ans=0.0
2023-11-20 01:41:59,620 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.63 vs. limit=22.5
2023-11-20 01:41:59,626 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.88 vs. limit=15.0
2023-11-20 01:42:03,063 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=889520.0, ans=0.1
2023-11-20 01:42:04,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=889520.0, ans=0.95
2023-11-20 01:42:06,713 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=889520.0, ans=0.0
2023-11-20 01:42:08,034 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=889520.0, ans=0.2
2023-11-20 01:42:09,201 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=889520.0, ans=0.0
2023-11-20 01:42:25,915 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 133450
2023-11-20 01:42:27,677 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.51 vs. limit=22.5
2023-11-20 01:42:39,275 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 1200, loss[loss=0.06052, simple_loss=0.06962, pruned_loss=0.01389, audio_tagging_loss=0.01182, over 13621.00 frames. ], tot_loss[loss=0.08211, simple_loss=0.1018, pruned_loss=0.02101, audio_tagging_loss=0.01019, over 3042482.23 frames. ], batch size: 54, lr: 5.87e-03, grad_scale: 32.0
2023-11-20 01:42:41,375 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=889720.0, ans=0.125
2023-11-20 01:42:48,417 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.434e+01 8.251e+01 9.001e+01 9.736e+01 1.493e+02, threshold=1.800e+02, percent-clipped=0.0
2023-11-20 01:42:53,531 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=889786.6666666666, ans=0.125
2023-11-20 01:42:56,227 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=889786.6666666666, ans=0.0
2023-11-20 01:43:31,718 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 133500
2023-11-20 01:43:31,893 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=889986.6666666666, ans=0.1
2023-11-20 01:43:43,774 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 1250, loss[loss=0.05927, simple_loss=0.07329, pruned_loss=0.01285, audio_tagging_loss=0.009775, over 14928.00 frames. ], tot_loss[loss=0.08179, simple_loss=0.1013, pruned_loss=0.021, audio_tagging_loss=0.01015, over 3039316.60 frames. ], batch size: 57, lr: 5.87e-03, grad_scale: 32.0
2023-11-20 01:43:44,123 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=890053.3333333334, ans=0.125
2023-11-20 01:44:35,362 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 133550
2023-11-20 01:44:39,163 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=890320.0, ans=0.1
2023-11-20 01:44:48,075 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 1300, loss[loss=0.1121, simple_loss=0.1409, pruned_loss=0.03245, audio_tagging_loss=0.009152, over 15033.00 frames. ], tot_loss[loss=0.08213, simple_loss=0.1019, pruned_loss=0.02112, audio_tagging_loss=0.01008, over 3038971.21 frames. ], batch size: 52, lr: 5.87e-03, grad_scale: 64.0
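The WARNING [train_asr.py:1506] entries above and below drop AudioSet cuts whose dummy transcript is longer than the acoustic sequence: 100 feature frames shrink to 23 after subsampling, fewer than the 24 BPE tokens, so no transducer alignment exists for them. A sketch of such a filter; the ((T - 7) // 2) // 2 arithmetic is an assumption (it reproduces the logged 100 -> 23), not necessarily the exact formula in the encoder:

```python
def frames_after_subsampling(num_frames: int) -> int:
    # Assumed convolutional-subsampling arithmetic; maps 100 -> 23
    # as in the WARNING lines. Hypothetical helper, not icefall code.
    return ((num_frames - 7) // 2) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # A transducer loss needs at least as many frames as tokens,
    # so the placeholder-text AudioSet cuts get excluded.
    return frames_after_subsampling(num_frames) >= num_tokens

print(frames_after_subsampling(100))  # 23
print(keep_cut(100, 24))              # False -> "Exclude cut with ID ..."
```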
2023-11-20 01:44:57,234 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.948e+01 8.375e+01 9.143e+01 9.896e+01 1.258e+02, threshold=1.829e+02, percent-clipped=0.0
2023-11-20 01:44:59,153 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.93 vs. limit=15.0
2023-11-20 01:45:00,457 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.63 vs. limit=12.0
2023-11-20 01:45:07,674 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.38 vs. limit=22.5
2023-11-20 01:45:10,970 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=890453.3333333334, ans=0.2
2023-11-20 01:45:21,023 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.04 vs. limit=12.0
2023-11-20 01:45:39,243 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 133600
2023-11-20 01:45:46,613 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=890653.3333333334, ans=0.125
2023-11-20 01:45:52,908 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 1350, loss[loss=0.06456, simple_loss=0.08173, pruned_loss=0.01407, audio_tagging_loss=0.009625, over 16100.00 frames. ], tot_loss[loss=0.08258, simple_loss=0.1026, pruned_loss=0.02125, audio_tagging_loss=0.01005, over 3043953.71 frames. ], batch size: 61, lr: 5.87e-03, grad_scale: 32.0
2023-11-20 01:45:55,106 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.83 vs. limit=22.5
2023-11-20 01:45:55,996 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.57 vs. limit=22.5
2023-11-20 01:46:09,523 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.whiten.whitening_limit, batch_count=890786.6666666666, ans=12.0
2023-11-20 01:46:24,812 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=890853.3333333334, ans=0.05
2023-11-20 01:46:32,147 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=890920.0, ans=0.125
2023-11-20 01:46:34,857 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=890920.0, ans=0.025
2023-11-20 01:46:39,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=890920.0, ans=0.125
2023-11-20 01:46:41,259 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 01:46:45,176 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 133650
2023-11-20 01:46:45,397 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=890986.6666666666, ans=0.125
2023-11-20 01:46:58,773 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 1400, loss[loss=0.1019, simple_loss=0.1203, pruned_loss=0.03323, audio_tagging_loss=0.008511, over 15039.00 frames. ], tot_loss[loss=0.08243, simple_loss=0.1024, pruned_loss=0.02117, audio_tagging_loss=0.01004, over 3046154.45 frames. ], batch size: 56, lr: 5.87e-03, grad_scale: 32.0
2023-11-20 01:47:03,822 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=891053.3333333334, ans=0.125
2023-11-20 01:47:08,567 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.444e+01 7.927e+01 8.547e+01 9.494e+01 1.207e+02, threshold=1.709e+02, percent-clipped=0.0
2023-11-20 01:47:26,023 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=891186.6666666666, ans=0.125
2023-11-20 01:47:34,002 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.52 vs. limit=15.0
2023-11-20 01:47:44,326 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.45 vs. limit=15.0
2023-11-20 01:47:49,844 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 133700
2023-11-20 01:48:02,916 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 1450, loss[loss=0.1121, simple_loss=0.1419, pruned_loss=0.03293, audio_tagging_loss=0.008262, over 14527.00 frames. ], tot_loss[loss=0.08283, simple_loss=0.1029, pruned_loss=0.02132, audio_tagging_loss=0.01007, over 3042081.63 frames. ], batch size: 52, lr: 5.86e-03, grad_scale: 16.0
2023-11-20 01:48:16,756 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.55 vs. limit=15.0
2023-11-20 01:48:45,568 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.16 vs. limit=10.0
2023-11-20 01:48:54,335 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 133750
2023-11-20 01:48:57,033 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=891653.3333333334, ans=0.0
2023-11-20 01:49:02,002 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=891653.3333333334, ans=0.125
2023-11-20 01:49:05,351 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.15 vs. limit=15.0
2023-11-20 01:49:07,072 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 1500, loss[loss=0.04163, simple_loss=0.04199, pruned_loss=0.00741, audio_tagging_loss=0.01322, over 14696.00 frames. ], tot_loss[loss=0.08245, simple_loss=0.102, pruned_loss=0.02117, audio_tagging_loss=0.01029, over 3038325.05 frames. ], batch size: 58, lr: 5.86e-03, grad_scale: 16.0
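Each scaling.py:213 line prints a ScheduledFloat: a regularization hyperparameter (dropout probability, skip rate, balancer prob, scale_min, ...) whose value `ans` is looked up from the global batch_count. A minimal piecewise-linear sketch of that idea; the breakpoints below are invented for illustration, and only the name/batch_count/ans logging format is taken from the log:

```python
class ScheduledFloat:
    """Piecewise-linear schedule over batch_count (illustrative sketch)."""
    def __init__(self, name, *points):
        self.name = name
        self.points = sorted(points)  # (batch_count, value) pairs

    def value(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:
                return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)
        return pts[-1][1]  # past the last breakpoint, hold the final value

    def log(self, batch_count: float) -> str:
        return (f"ScheduledFloat: name={self.name}, "
                f"batch_count={batch_count}, ans={self.value(batch_count)}")

# Hypothetical schedule: a dropout decaying from 0.3 to a floor of 0.1,
# which at batch_count ~886k has long since reached its floor.
sf = ScheduledFloat("encoder...dropout_p", (0.0, 0.3), (20000.0, 0.1))
print(sf.log(886720.0))  # ans=0.1, like the long-run values in the log
```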
2023-11-20 01:49:11,120 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=891720.0, ans=0.0
2023-11-20 01:49:18,847 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 5.912e+01 7.823e+01 8.560e+01 9.381e+01 1.533e+02, threshold=1.712e+02, percent-clipped=0.0
2023-11-20 01:49:32,223 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.77 vs. limit=15.0
2023-11-20 01:49:42,234 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=891853.3333333334, ans=0.0
2023-11-20 01:49:43,497 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=891853.3333333334, ans=0.1
2023-11-20 01:49:59,152 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 133800
2023-11-20 01:49:59,287 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=891986.6666666666, ans=0.0
2023-11-20 01:50:12,765 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 1550, loss[loss=0.1084, simple_loss=0.1377, pruned_loss=0.03204, audio_tagging_loss=0.007541, over 15613.00 frames. ], tot_loss[loss=0.08333, simple_loss=0.1032, pruned_loss=0.02141, audio_tagging_loss=0.01031, over 3043442.94 frames. ], batch size: 58, lr: 5.86e-03, grad_scale: 16.0
2023-11-20 01:50:32,590 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.95 vs. limit=6.0
2023-11-20 01:50:47,803 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.76 vs. limit=6.0
2023-11-20 01:50:50,162 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.15 vs. limit=10.0
2023-11-20 01:50:52,277 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=892253.3333333334, ans=0.125
2023-11-20 01:50:57,807 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=892253.3333333334, ans=0.0
2023-11-20 01:50:59,009 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=892253.3333333334, ans=0.125
2023-11-20 01:51:00,154 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=892253.3333333334, ans=0.0
2023-11-20 01:51:04,171 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 133850
2023-11-20 01:51:16,419 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 1600, loss[loss=0.068, simple_loss=0.08632, pruned_loss=0.0147, audio_tagging_loss=0.01013, over 14687.00 frames. ], tot_loss[loss=0.08358, simple_loss=0.1037, pruned_loss=0.02143, audio_tagging_loss=0.01029, over 3047826.04 frames. ], batch size: 56, lr: 5.86e-03, grad_scale: 32.0
2023-11-20 01:51:19,720 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=892386.6666666666, ans=0.1
2023-11-20 01:51:28,058 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.791e+01 8.075e+01 8.775e+01 9.622e+01 1.213e+02, threshold=1.755e+02, percent-clipped=0.0
2023-11-20 01:51:35,021 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=892453.3333333334, ans=0.125
2023-11-20 01:51:40,642 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.86 vs. limit=22.5
2023-11-20 01:52:09,104 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 133900
2023-11-20 01:52:13,090 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=892653.3333333334, ans=0.125
2023-11-20 01:52:21,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=892720.0, ans=0.0
2023-11-20 01:52:22,012 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 1650, loss[loss=0.09368, simple_loss=0.1175, pruned_loss=0.02538, audio_tagging_loss=0.009553, over 15666.00 frames. ], tot_loss[loss=0.08346, simple_loss=0.1034, pruned_loss=0.02138, audio_tagging_loss=0.01039, over 3044416.61 frames. ], batch size: 58, lr: 5.86e-03, grad_scale: 32.0
2023-11-20 01:52:24,031 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.35 vs. limit=22.5
2023-11-20 01:52:37,613 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=892786.6666666666, ans=0.0
2023-11-20 01:52:58,289 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=892853.3333333334, ans=0.025
2023-11-20 01:52:58,401 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=892853.3333333334, ans=0.2
2023-11-20 01:53:10,245 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=892920.0, ans=0.125
2023-11-20 01:53:13,624 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 133950
2023-11-20 01:53:26,561 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 1700, loss[loss=0.09803, simple_loss=0.1213, pruned_loss=0.02895, audio_tagging_loss=0.008448, over 15482.00 frames. ], tot_loss[loss=0.08343, simple_loss=0.1032, pruned_loss=0.02142, audio_tagging_loss=0.0104, over 3044345.42 frames. ], batch size: 55, lr: 5.86e-03, grad_scale: 32.0
2023-11-20 01:53:29,245 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=893053.3333333334, ans=0.2
2023-11-20 01:53:38,259 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.212e+01 8.118e+01 8.650e+01 9.250e+01 1.178e+02, threshold=1.730e+02, percent-clipped=0.0
2023-11-20 01:53:42,386 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=893120.0, ans=0.125
2023-11-20 01:53:43,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=893120.0, ans=0.125
2023-11-20 01:53:43,493 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=893120.0, ans=0.125
2023-11-20 01:54:18,417 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 134000
2023-11-20 01:54:27,266 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.94 vs. limit=15.0
2023-11-20 01:54:31,563 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 1750, loss[loss=0.06059, simple_loss=0.06949, pruned_loss=0.01492, audio_tagging_loss=0.01093, over 15253.00 frames. ], tot_loss[loss=0.08289, simple_loss=0.1026, pruned_loss=0.02116, audio_tagging_loss=0.01043, over 3039092.93 frames. ], batch size: 59, lr: 5.86e-03, grad_scale: 32.0
2023-11-20 01:54:41,185 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=893386.6666666666, ans=0.125
2023-11-20 01:54:52,673 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=893453.3333333334, ans=0.2
2023-11-20 01:54:53,835 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=893453.3333333334, ans=0.0
2023-11-20 01:55:00,754 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=893520.0, ans=0.5
2023-11-20 01:55:11,052 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.74 vs. limit=12.0
2023-11-20 01:55:13,309 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.43 vs. limit=15.0
2023-11-20 01:55:14,139 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=893586.6666666666, ans=0.125
2023-11-20 01:55:18,717 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.11 vs. limit=15.0
2023-11-20 01:55:23,913 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 134050
2023-11-20 01:55:36,243 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 1800, loss[loss=0.08921, simple_loss=0.1029, pruned_loss=0.02692, audio_tagging_loss=0.01086, over 15066.00 frames. ], tot_loss[loss=0.08311, simple_loss=0.1029, pruned_loss=0.02133, audio_tagging_loss=0.01035, over 3034538.73 frames. ], batch size: 55, lr: 5.86e-03, grad_scale: 32.0
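The scaling.py:1022 Whitening entries fire when a module's activations drift from an isotropic ("white") covariance: the metric equals 1.0 when all covariance eigenvalues match and grows as they spread, and a correction is applied only once it exceeds the limit. A rough stand-in for such a metric, ignoring num_groups and not the exact scaling.py computation:

```python
import torch

def whitening_metric(x: torch.Tensor) -> float:
    """x: (num_frames, num_channels). Returns ~1.0 for 'white'
    activations; grows as the covariance eigenvalues spread out.
    A rough stand-in for the metric logged by scaling.py."""
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov).clamp(min=0.0)
    return float((eigs ** 2).mean() / (eigs.mean() ** 2 + 1e-20))

torch.manual_seed(0)
white = torch.randn(2000, 256)
skewed = white * torch.linspace(0.1, 3.0, 256)  # anisotropic channels
print(whitening_metric(white))   # close to 1
print(whitening_metric(skewed))  # noticeably larger
```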
2023-11-20 01:55:36,419 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=893720.0, ans=0.0
2023-11-20 01:55:43,120 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=893720.0, ans=0.0
2023-11-20 01:55:47,021 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=893720.0, ans=0.125
2023-11-20 01:55:47,809 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.798e+01 8.080e+01 8.622e+01 9.512e+01 1.674e+02, threshold=1.724e+02, percent-clipped=0.0
2023-11-20 01:55:53,936 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.65 vs. limit=22.5
2023-11-20 01:55:54,778 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=893786.6666666666, ans=0.0
2023-11-20 01:55:56,364 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.09 vs. limit=15.0
2023-11-20 01:55:58,548 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-20 01:56:04,738 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=893853.3333333334, ans=0.2
2023-11-20 01:56:28,512 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 134100
2023-11-20 01:56:36,172 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=893986.6666666666, ans=0.0
2023-11-20 01:56:41,505 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 1850, loss[loss=0.09098, simple_loss=0.1111, pruned_loss=0.02468, audio_tagging_loss=0.01075, over 15500.00 frames. ], tot_loss[loss=0.08311, simple_loss=0.103, pruned_loss=0.02127, audio_tagging_loss=0.01036, over 3034805.91 frames. ], batch size: 59, lr: 5.86e-03, grad_scale: 32.0
2023-11-20 01:56:54,227 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=894120.0, ans=0.125
2023-11-20 01:57:09,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=894186.6666666666, ans=0.0
2023-11-20 01:57:11,624 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-20 01:57:24,350 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=894253.3333333334, ans=0.09899494936611666
2023-11-20 01:57:33,247 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 134150
2023-11-20 01:57:42,050 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=894320.0, ans=0.125
2023-11-20 01:57:45,560 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 1900, loss[loss=0.08734, simple_loss=0.1155, pruned_loss=0.02109, audio_tagging_loss=0.008486, over 15074.00 frames. ], tot_loss[loss=0.08252, simple_loss=0.1023, pruned_loss=0.02113, audio_tagging_loss=0.01021, over 3038841.47 frames. ], batch size: 55, lr: 5.85e-03, grad_scale: 32.0
2023-11-20 01:57:57,181 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.68 vs. limit=15.0
2023-11-20 01:57:59,157 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.930e+01 8.176e+01 8.935e+01 9.422e+01 1.185e+02, threshold=1.787e+02, percent-clipped=0.0
2023-11-20 01:58:01,027 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.25 vs. limit=15.0
2023-11-20 01:58:02,011 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=894453.3333333334, ans=0.0
2023-11-20 01:58:19,479 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=894520.0, ans=0.125
2023-11-20 01:58:34,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=894586.6666666666, ans=0.2
2023-11-20 01:58:38,345 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 134200
2023-11-20 01:58:41,597 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=894653.3333333334, ans=0.0
2023-11-20 01:58:51,033 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 1950, loss[loss=0.08175, simple_loss=0.1065, pruned_loss=0.01881, audio_tagging_loss=0.009699, over 14987.00 frames. ], tot_loss[loss=0.08192, simple_loss=0.1015, pruned_loss=0.02087, audio_tagging_loss=0.01029, over 3042369.83 frames. ], batch size: 56, lr: 5.85e-03, grad_scale: 16.0
2023-11-20 01:59:16,088 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=894853.3333333334, ans=0.125
2023-11-20 01:59:20,874 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=894853.3333333334, ans=0.125
2023-11-20 01:59:22,103 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=894853.3333333334, ans=0.0
2023-11-20 01:59:24,790 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.63 vs. limit=15.0
2023-11-20 01:59:25,606 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=894853.3333333334, ans=0.1
2023-11-20 01:59:25,609 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=894853.3333333334, ans=0.0
2023-11-20 01:59:37,349 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=894920.0, ans=0.2
2023-11-20 01:59:42,060 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 134250
2023-11-20 01:59:54,836 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 2000, loss[loss=0.1046, simple_loss=0.1234, pruned_loss=0.02972, audio_tagging_loss=0.01317, over 15299.00 frames. ], tot_loss[loss=0.0819, simple_loss=0.1013, pruned_loss=0.02089, audio_tagging_loss=0.01036, over 3044092.82 frames. ], batch size: 56, lr: 5.85e-03, grad_scale: 32.0
2023-11-20 02:00:08,115 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.403e+01 7.896e+01 8.593e+01 9.449e+01 1.399e+02, threshold=1.719e+02, percent-clipped=0.0
2023-11-20 02:00:23,173 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=895186.6666666666, ans=0.125
2023-11-20 02:00:25,597 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=895186.6666666666, ans=0.125
2023-11-20 02:00:47,362 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 134300
2023-11-20 02:00:59,969 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 2050, loss[loss=0.07142, simple_loss=0.08649, pruned_loss=0.01894, audio_tagging_loss=0.009232, over 14559.00 frames. ], tot_loss[loss=0.0815, simple_loss=0.101, pruned_loss=0.02075, audio_tagging_loss=0.01023, over 3034123.94 frames. ], batch size: 56, lr: 5.85e-03, grad_scale: 32.0
2023-11-20 02:01:18,695 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=895453.3333333334, ans=0.1
2023-11-20 02:01:21,072 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=895453.3333333334, ans=0.1
2023-11-20 02:01:37,007 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=895520.0, ans=0.07
2023-11-20 02:01:38,784 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.03 vs. limit=15.0
2023-11-20 02:01:51,623 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 134350
2023-11-20 02:02:04,178 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=895720.0, ans=0.0
2023-11-20 02:02:05,147 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 2100, loss[loss=0.08524, simple_loss=0.1028, pruned_loss=0.0211, audio_tagging_loss=0.01272, over 15127.00 frames. ], tot_loss[loss=0.08208, simple_loss=0.1018, pruned_loss=0.02099, audio_tagging_loss=0.0102, over 3038601.20 frames. ], batch size: 56, lr: 5.85e-03, grad_scale: 16.0
2023-11-20 02:02:10,797 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.65 vs. limit=15.0
2023-11-20 02:02:13,823 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.14 vs. limit=22.5
2023-11-20 02:02:15,695 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=895720.0, ans=0.04949747468305833
2023-11-20 02:02:15,729 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=895720.0, ans=0.125
2023-11-20 02:02:19,150 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.798e+01 8.312e+01 8.947e+01 9.682e+01 1.152e+02, threshold=1.789e+02, percent-clipped=0.0
2023-11-20 02:02:23,660 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.97 vs. limit=15.0
2023-11-20 02:02:30,001 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=895853.3333333334, ans=0.125
2023-11-20 02:02:41,883 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=895920.0, ans=0.0
2023-11-20 02:02:43,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=895920.0, ans=0.1
2023-11-20 02:02:56,485 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 134400
2023-11-20 02:03:09,592 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 2150, loss[loss=0.08306, simple_loss=0.1036, pruned_loss=0.02136, audio_tagging_loss=0.009877, over 14619.00 frames. ], tot_loss[loss=0.08101, simple_loss=0.1004, pruned_loss=0.02059, audio_tagging_loss=0.0102, over 3036561.90 frames. ], batch size: 57, lr: 5.85e-03, grad_scale: 16.0
2023-11-20 02:03:20,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=896053.3333333334, ans=0.07
2023-11-20 02:03:34,334 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=896186.6666666666, ans=0.0
2023-11-20 02:03:36,149 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=896186.6666666666, ans=0.125
2023-11-20 02:03:42,943 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.45 vs. limit=10.0
2023-11-20 02:03:48,924 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 02:03:50,450 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=896253.3333333334, ans=0.125
2023-11-20 02:03:52,008 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=896253.3333333334, ans=0.125
2023-11-20 02:04:01,686 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 134450
2023-11-20 02:04:01,847 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=896320.0, ans=0.125
2023-11-20 02:04:02,140 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.12 vs. limit=15.0
2023-11-20 02:04:14,510 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 2200, loss[loss=0.09516, simple_loss=0.1131, pruned_loss=0.02974, audio_tagging_loss=0.008884, over 16212.00 frames. ], tot_loss[loss=0.08163, simple_loss=0.1013, pruned_loss=0.02088, audio_tagging_loss=0.0101, over 3033137.05 frames. ], batch size: 62, lr: 5.85e-03, grad_scale: 16.0
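tot_loss[...] is averaged over a frame count that hovers near 3.0M in the lines above, i.e. old batches are gradually forgotten rather than accumulated forever. One way to keep such a frame-weighted, slowly-forgetting average, as a sketch; the 0.995 decay is an invented stand-in for however train_asr.py actually windows its statistics:

```python
class RunningLoss:
    """Frame-weighted running average with exponential forgetting."""
    def __init__(self, decay: float = 0.995):  # assumed smoothing factor
        self.decay = decay
        self.loss_sum = 0.0   # decayed sum of loss * frames
        self.frames = 0.0     # decayed effective frame count

    def update(self, batch_loss: float, batch_frames: float) -> float:
        self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
        self.frames = self.decay * self.frames + batch_frames
        return self.loss_sum / self.frames  # the tot_loss value reported

tot = RunningLoss()
for loss, frames in [(0.08306, 14619.0), (0.09516, 16212.0)]:
    value = tot.update(loss, frames)
print(f"tot_loss[loss={value:.5f}, over {tot.frames:.2f} frames. ]")
```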
2023-11-20 02:04:28,320 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=896453.3333333334, ans=10.0
2023-11-20 02:04:28,768 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.245e+01 8.199e+01 8.937e+01 9.521e+01 1.153e+02, threshold=1.787e+02, percent-clipped=0.0
2023-11-20 02:04:42,510 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=896520.0, ans=0.125
2023-11-20 02:04:54,298 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=896586.6666666666, ans=0.035
2023-11-20 02:04:55,578 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=896586.6666666666, ans=0.125
2023-11-20 02:05:06,378 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 134500
2023-11-20 02:05:19,211 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 2250, loss[loss=0.08371, simple_loss=0.1033, pruned_loss=0.02187, audio_tagging_loss=0.01021, over 16356.00 frames. ], tot_loss[loss=0.08236, simple_loss=0.1022, pruned_loss=0.02107, audio_tagging_loss=0.01017, over 3029072.40 frames. ], batch size: 60, lr: 5.85e-03, grad_scale: 16.0
2023-11-20 02:05:20,674 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=896720.0, ans=0.1
2023-11-20 02:05:24,269 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=896720.0, ans=0.2
2023-11-20 02:05:31,771 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=896786.6666666666, ans=0.125
2023-11-20 02:05:35,767 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.01 vs. limit=10.0
2023-11-20 02:05:44,770 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.52 vs. limit=6.0
2023-11-20 02:06:08,644 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=896920.0, ans=0.1
2023-11-20 02:06:10,887 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 134550
2023-11-20 02:06:22,607 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.53 vs. limit=22.5
2023-11-20 02:06:24,376 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 2300, loss[loss=0.08379, simple_loss=0.1065, pruned_loss=0.01918, audio_tagging_loss=0.01138, over 15006.00 frames. ], tot_loss[loss=0.08329, simple_loss=0.1031, pruned_loss=0.02148, audio_tagging_loss=0.01024, over 3033155.34 frames. ], batch size: 57, lr: 5.85e-03, grad_scale: 16.0
2023-11-20 02:06:24,751 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=897053.3333333334, ans=0.0
2023-11-20 02:06:38,531 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.838e+01 8.171e+01 8.997e+01 9.810e+01 1.855e+02, threshold=1.799e+02, percent-clipped=1.0
2023-11-20 02:06:46,178 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=897120.0, ans=0.025
2023-11-20 02:07:02,576 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=897253.3333333334, ans=0.2
2023-11-20 02:07:15,452 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 134600
2023-11-20 02:07:20,749 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 02:07:25,146 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=897320.0, ans=0.125
2023-11-20 02:07:28,690 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 2350, loss[loss=0.07705, simple_loss=0.09619, pruned_loss=0.02056, audio_tagging_loss=0.0084, over 14147.00 frames. ], tot_loss[loss=0.08289, simple_loss=0.1027, pruned_loss=0.0212, audio_tagging_loss=0.01034, over 3034789.66 frames. ], batch size: 54, lr: 5.84e-03, grad_scale: 16.0
2023-11-20 02:08:06,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=897586.6666666666, ans=0.2
2023-11-20 02:08:20,754 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 134650
2023-11-20 02:08:33,532 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 2400, loss[loss=0.08934, simple_loss=0.1069, pruned_loss=0.02545, audio_tagging_loss=0.01046, over 14482.00 frames. ], tot_loss[loss=0.08336, simple_loss=0.1032, pruned_loss=0.02138, audio_tagging_loss=0.01038, over 3038999.88 frames. ], batch size: 56, lr: 5.84e-03, grad_scale: 32.0
2023-11-20 02:08:43,700 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=897720.0, ans=0.1
2023-11-20 02:08:47,717 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.416e+01 8.072e+01 8.719e+01 9.774e+01 1.313e+02, threshold=1.744e+02, percent-clipped=0.0
2023-11-20 02:08:57,664 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.53 vs. limit=22.5
2023-11-20 02:09:01,310 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=897853.3333333334, ans=0.0
2023-11-20 02:09:22,837 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.99 vs. limit=22.5
2023-11-20 02:09:24,703 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 134700
2023-11-20 02:09:30,379 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.41 vs. limit=15.0
2023-11-20 02:09:35,141 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=897986.6666666666, ans=0.1
2023-11-20 02:09:36,462 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=898053.3333333334, ans=0.2
2023-11-20 02:09:37,370 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 2450, loss[loss=0.1087, simple_loss=0.1436, pruned_loss=0.02898, audio_tagging_loss=0.007951, over 15943.00 frames. ], tot_loss[loss=0.08356, simple_loss=0.1038, pruned_loss=0.02132, audio_tagging_loss=0.01031, over 3047539.50 frames. ], batch size: 57, lr: 5.84e-03, grad_scale: 16.0
2023-11-20 02:09:43,228 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=898053.3333333334, ans=0.125
2023-11-20 02:09:46,974 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=898053.3333333334, ans=0.0
2023-11-20 02:10:03,919 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=898186.6666666666, ans=0.0
2023-11-20 02:10:10,493 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=898186.6666666666, ans=0.125
2023-11-20 02:10:10,842 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.89 vs. limit=15.0
2023-11-20 02:10:15,804 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.28 vs. limit=10.0
2023-11-20 02:10:25,984 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=898253.3333333334, ans=0.125
2023-11-20 02:10:29,981 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 134750
2023-11-20 02:10:42,743 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 2500, loss[loss=0.08001, simple_loss=0.09357, pruned_loss=0.02151, audio_tagging_loss=0.01172, over 15925.00 frames. ], tot_loss[loss=0.08307, simple_loss=0.1033, pruned_loss=0.02116, audio_tagging_loss=0.01024, over 3047094.21 frames. ], batch size: 61, lr: 5.84e-03, grad_scale: 16.0
2023-11-20 02:10:57,315 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.817e+01 7.993e+01 8.796e+01 9.570e+01 1.207e+02, threshold=1.759e+02, percent-clipped=0.0
2023-11-20 02:11:30,613 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=898586.6666666666, ans=0.125
2023-11-20 02:11:34,144 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 134800
2023-11-20 02:11:38,383 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=898653.3333333334, ans=0.1
2023-11-20 02:11:46,182 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.99 vs. limit=15.0
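The grad_scale value at the end of each train_asr.py loss line is the dynamic loss-scaling factor for fp16 training: it drops (64 -> 32 -> 16 across the entries above) when unstable gradients appear and grows back after a run of clean steps. A generic sketch using PyTorch's own scaler; the growth/backoff settings shown are torch defaults and assumptions, not values read from this log:

```python
import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=32.0,        # in the range of the logged grad_scale values
    growth_factor=2.0,      # e.g. 32 -> 64 after enough clean steps
    backoff_factor=0.5,     # e.g. 64 -> 32 when inf/nan gradients appear
    growth_interval=2000,   # torch default; an assumption here
)

def train_step(model, optimizer, features, loss_fn):
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():
        loss = loss_fn(model(features))
    scaler.scale(loss).backward()   # backward pass on the scaled loss
    scaler.step(optimizer)          # unscales grads; skips step on overflow
    scaler.update()                 # adjusts the scale, as seen in the log
    return float(scaler.get_scale())
```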
2023-11-20 02:11:46,889 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 2550, loss[loss=0.1055, simple_loss=0.1321, pruned_loss=0.03131, audio_tagging_loss=0.008119, over 15480.00 frames. ], tot_loss[loss=0.08284, simple_loss=0.1029, pruned_loss=0.02122, audio_tagging_loss=0.01019, over 3039038.57 frames. ], batch size: 54, lr: 5.84e-03, grad_scale: 16.0
2023-11-20 02:12:03,278 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=898786.6666666666, ans=0.125
2023-11-20 02:12:21,321 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.76 vs. limit=15.0
2023-11-20 02:12:23,262 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=898853.3333333334, ans=0.0
2023-11-20 02:12:39,669 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 134850
2023-11-20 02:12:52,581 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 2600, loss[loss=0.1046, simple_loss=0.1387, pruned_loss=0.02583, audio_tagging_loss=0.009394, over 14873.00 frames. ], tot_loss[loss=0.08238, simple_loss=0.1025, pruned_loss=0.02108, audio_tagging_loss=0.01005, over 3041670.13 frames. ], batch size: 53, lr: 5.84e-03, grad_scale: 16.0
2023-11-20 02:13:08,524 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.034e+01 8.150e+01 8.781e+01 9.502e+01 1.826e+02, threshold=1.756e+02, percent-clipped=1.0
2023-11-20 02:13:16,278 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=899120.0, ans=0.125
2023-11-20 02:13:16,461 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=899120.0, ans=0.125
2023-11-20 02:13:22,485 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-20 02:13:28,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=899186.6666666666, ans=0.125
2023-11-20 02:13:28,668 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.82 vs. limit=15.0
2023-11-20 02:13:29,495 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=899186.6666666666, ans=0.2
2023-11-20 02:13:41,109 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=899253.3333333334, ans=0.0
2023-11-20 02:13:44,908 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 134900
2023-11-20 02:13:58,371 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 2650, loss[loss=0.0904, simple_loss=0.1115, pruned_loss=0.02455, audio_tagging_loss=0.01011, over 15727.00 frames. ], tot_loss[loss=0.08227, simple_loss=0.1025, pruned_loss=0.02101, audio_tagging_loss=0.009992, over 3045295.22 frames. ], batch size: 56, lr: 5.84e-03, grad_scale: 16.0
2023-11-20 02:13:58,716 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=899386.6666666666, ans=0.1
2023-11-20 02:14:05,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=899386.6666666666, ans=0.0
2023-11-20 02:14:39,435 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=899586.6666666666, ans=0.125
2023-11-20 02:14:49,856 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 134950
2023-11-20 02:14:58,655 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=899653.3333333334, ans=0.0
2023-11-20 02:15:02,147 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 2700, loss[loss=0.1033, simple_loss=0.1295, pruned_loss=0.02947, audio_tagging_loss=0.009116, over 15921.00 frames. ], tot_loss[loss=0.08248, simple_loss=0.1029, pruned_loss=0.02108, audio_tagging_loss=0.009962, over 3050841.93 frames. ], batch size: 57, lr: 5.84e-03, grad_scale: 16.0
2023-11-20 02:15:03,769 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=899720.0, ans=0.125
2023-11-20 02:15:08,049 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=899720.0, ans=0.1
2023-11-20 02:15:18,166 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.493e+01 8.480e+01 9.136e+01 9.839e+01 1.399e+02, threshold=1.827e+02, percent-clipped=0.0
2023-11-20 02:15:20,230 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.71 vs. limit=15.0
2023-11-20 02:15:28,262 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=899853.3333333334, ans=0.125
2023-11-20 02:15:37,943 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=899853.3333333334, ans=0.2
2023-11-20 02:15:37,984 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=899853.3333333334, ans=0.1
2023-11-20 02:15:42,812 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=899920.0, ans=0.0
2023-11-20 02:15:54,280 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 135000
2023-11-20 02:15:54,965 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.38 vs. limit=15.0
2023-11-20 02:16:07,440 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 2750, loss[loss=0.08289, simple_loss=0.1085, pruned_loss=0.02097, audio_tagging_loss=0.007651, over 15471.00 frames. ], tot_loss[loss=0.08195, simple_loss=0.1021, pruned_loss=0.02095, audio_tagging_loss=0.009931, over 3050367.07 frames. ], batch size: 58, lr: 5.84e-03, grad_scale: 16.0
2023-11-20 02:16:11,434 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=900053.3333333334, ans=0.125
2023-11-20 02:16:11,581 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=900053.3333333334, ans=0.0
2023-11-20 02:16:12,943 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.93 vs. limit=15.0
2023-11-20 02:16:21,262 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=900120.0, ans=0.07
2023-11-20 02:16:34,676 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=900186.6666666666, ans=0.125
2023-11-20 02:16:42,323 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=900186.6666666666, ans=0.125
2023-11-20 02:16:47,132 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.79 vs. limit=15.0
2023-11-20 02:16:59,876 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 135050
2023-11-20 02:17:03,545 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 02:17:12,203 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 2800, loss[loss=0.08761, simple_loss=0.1194, pruned_loss=0.02228, audio_tagging_loss=0.005615, over 14944.00 frames. ], tot_loss[loss=0.08197, simple_loss=0.1023, pruned_loss=0.02093, audio_tagging_loss=0.009913, over 3040498.97 frames. ], batch size: 56, lr: 5.83e-03, grad_scale: 32.0
2023-11-20 02:17:20,938 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=900386.6666666666, ans=0.125
2023-11-20 02:17:28,023 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.764e+01 8.003e+01 8.693e+01 9.427e+01 1.214e+02, threshold=1.739e+02, percent-clipped=0.0
2023-11-20 02:17:28,355 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=900453.3333333334, ans=0.125
2023-11-20 02:17:51,662 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-20 02:18:04,359 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 135100
2023-11-20 02:18:10,297 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=900653.3333333334, ans=0.125
2023-11-20 02:18:17,535 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 2850, loss[loss=0.1002, simple_loss=0.1193, pruned_loss=0.0303, audio_tagging_loss=0.01023, over 15006.00 frames. ], tot_loss[loss=0.08135, simple_loss=0.1014, pruned_loss=0.02073, audio_tagging_loss=0.009936, over 3038881.70 frames.
], batch size: 56, lr: 5.83e-03, grad_scale: 32.0 2023-11-20 02:18:56,986 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.70 vs. limit=6.0 2023-11-20 02:18:57,827 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=900920.0, ans=0.125 2023-11-20 02:19:00,360 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=900920.0, ans=0.0 2023-11-20 02:19:07,034 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=900920.0, ans=0.125 2023-11-20 02:19:09,307 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 135150 2023-11-20 02:19:13,213 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.31 vs. limit=15.0 2023-11-20 02:19:21,421 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.00 vs. limit=6.0 2023-11-20 02:19:22,089 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 2900, loss[loss=0.06927, simple_loss=0.08726, pruned_loss=0.01621, audio_tagging_loss=0.009431, over 15212.00 frames. ], tot_loss[loss=0.08172, simple_loss=0.1018, pruned_loss=0.02082, audio_tagging_loss=0.009981, over 3038082.71 frames. ], batch size: 57, lr: 5.83e-03, grad_scale: 32.0 2023-11-20 02:19:37,595 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.047e+01 8.169e+01 9.046e+01 9.875e+01 2.052e+02, threshold=1.809e+02, percent-clipped=1.0 2023-11-20 02:19:56,102 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=901186.6666666666, ans=0.125 2023-11-20 02:20:13,544 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 135200 2023-11-20 02:20:18,277 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=901320.0, ans=0.125 2023-11-20 02:20:26,786 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 2950, loss[loss=0.09643, simple_loss=0.1229, pruned_loss=0.02449, audio_tagging_loss=0.01051, over 16140.00 frames. ], tot_loss[loss=0.08224, simple_loss=0.1024, pruned_loss=0.021, audio_tagging_loss=0.01002, over 3043814.71 frames. ], batch size: 58, lr: 5.83e-03, grad_scale: 32.0 2023-11-20 02:20:41,281 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=901453.3333333334, ans=0.0 2023-11-20 02:21:00,908 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=901520.0, ans=0.1 2023-11-20 02:21:05,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=901586.6666666666, ans=0.0 2023-11-20 02:21:14,190 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=901586.6666666666, ans=0.1 2023-11-20 02:21:18,823 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 135250 2023-11-20 02:21:31,825 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 3000, loss[loss=0.08619, simple_loss=0.1083, pruned_loss=0.01922, audio_tagging_loss=0.01284, over 15318.00 frames. 
], tot_loss[loss=0.08217, simple_loss=0.1022, pruned_loss=0.021, audio_tagging_loss=0.01006, over 3040495.70 frames. ], batch size: 56, lr: 5.83e-03, grad_scale: 16.0 2023-11-20 02:21:31,827 INFO [train_asr.py:1285] (3/4) Computing validation loss 2023-11-20 02:22:13,519 INFO [train_asr.py:1294] (3/4) Epoch 12, validation: loss=0.0631, simple_loss=0.05442, pruned_loss=0.006068, audio_tagging_loss=0.02982, over 4681554.00 frames. 2023-11-20 02:22:13,520 INFO [train_asr.py:1295] (3/4) Maximum memory allocated so far is 25886MB 2023-11-20 02:22:29,866 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.617e+01 8.337e+01 8.935e+01 9.823e+01 2.024e+02, threshold=1.787e+02, percent-clipped=1.0 2023-11-20 02:22:42,447 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=901853.3333333334, ans=0.125 2023-11-20 02:22:49,595 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.40 vs. limit=15.0 2023-11-20 02:23:05,039 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 135300 2023-11-20 02:23:10,187 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=901986.6666666666, ans=0.0 2023-11-20 02:23:17,200 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 3050, loss[loss=0.07267, simple_loss=0.09425, pruned_loss=0.01544, audio_tagging_loss=0.0101, over 14847.00 frames. ], tot_loss[loss=0.08307, simple_loss=0.1035, pruned_loss=0.02121, audio_tagging_loss=0.01013, over 3045082.09 frames. ], batch size: 55, lr: 5.83e-03, grad_scale: 16.0 2023-11-20 02:23:20,574 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=902053.3333333334, ans=0.0 2023-11-20 02:23:56,510 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 02:23:56,864 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=902253.3333333334, ans=0.07 2023-11-20 02:23:59,433 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=902253.3333333334, ans=0.1 2023-11-20 02:24:01,791 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=902253.3333333334, ans=0.125 2023-11-20 02:24:09,907 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 135350 2023-11-20 02:24:22,101 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 3100, loss[loss=0.1018, simple_loss=0.131, pruned_loss=0.02931, audio_tagging_loss=0.007001, over 15491.00 frames. ], tot_loss[loss=0.08327, simple_loss=0.1037, pruned_loss=0.02118, audio_tagging_loss=0.01023, over 3039112.41 frames. 
], batch size: 56, lr: 5.83e-03, grad_scale: 16.0 2023-11-20 02:24:39,886 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.062e+01 8.132e+01 9.002e+01 1.001e+02 1.327e+02, threshold=1.800e+02, percent-clipped=0.0 2023-11-20 02:24:51,435 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=902520.0, ans=0.2 2023-11-20 02:25:03,157 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=902586.6666666666, ans=0.125 2023-11-20 02:25:14,132 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 135400 2023-11-20 02:25:17,673 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0 2023-11-20 02:25:27,952 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 3150, loss[loss=0.06724, simple_loss=0.07916, pruned_loss=0.01491, audio_tagging_loss=0.01275, over 14981.00 frames. ], tot_loss[loss=0.08337, simple_loss=0.1036, pruned_loss=0.02121, audio_tagging_loss=0.01034, over 3037546.88 frames. ], batch size: 58, lr: 5.83e-03, grad_scale: 16.0 2023-11-20 02:25:43,668 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=902786.6666666666, ans=0.5 2023-11-20 02:26:01,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=902853.3333333334, ans=0.125 2023-11-20 02:26:04,394 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=902853.3333333334, ans=0.0 2023-11-20 02:26:07,338 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.80 vs. limit=15.0 2023-11-20 02:26:17,984 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=902920.0, ans=0.125 2023-11-20 02:26:20,952 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 135450 2023-11-20 02:26:23,657 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=902986.6666666666, ans=0.125 2023-11-20 02:26:33,306 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 3200, loss[loss=0.08514, simple_loss=0.1089, pruned_loss=0.01826, audio_tagging_loss=0.01244, over 15920.00 frames. ], tot_loss[loss=0.08355, simple_loss=0.1038, pruned_loss=0.02121, audio_tagging_loss=0.01045, over 3039762.82 frames. ], batch size: 59, lr: 5.83e-03, grad_scale: 32.0 2023-11-20 02:26:42,777 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=903053.3333333334, ans=0.125 2023-11-20 02:26:44,076 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=903053.3333333334, ans=0.1 2023-11-20 02:26:44,205 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=903053.3333333334, ans=0.125 2023-11-20 02:26:48,040 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.96 vs. 
limit=22.5 2023-11-20 02:26:49,302 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.42 vs. limit=15.0 2023-11-20 02:26:49,827 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.506e+01 8.472e+01 8.960e+01 9.869e+01 1.272e+02, threshold=1.792e+02, percent-clipped=0.0 2023-11-20 02:26:50,160 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=903120.0, ans=0.09899494936611666 2023-11-20 02:26:51,761 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.23 vs. limit=22.5 2023-11-20 02:26:56,248 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=903120.0, ans=0.0 2023-11-20 02:27:07,512 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=903186.6666666666, ans=0.125 2023-11-20 02:27:25,472 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 135500 2023-11-20 02:27:29,355 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.85 vs. limit=15.0 2023-11-20 02:27:32,378 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=903320.0, ans=0.09899494936611666 2023-11-20 02:27:38,288 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 3250, loss[loss=0.09788, simple_loss=0.1262, pruned_loss=0.02808, audio_tagging_loss=0.006698, over 14661.00 frames. ], tot_loss[loss=0.08306, simple_loss=0.1033, pruned_loss=0.021, audio_tagging_loss=0.0104, over 3040282.52 frames. ], batch size: 54, lr: 5.82e-03, grad_scale: 32.0 2023-11-20 02:27:41,149 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=903386.6666666666, ans=0.125 2023-11-20 02:27:43,613 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=903386.6666666666, ans=0.125 2023-11-20 02:28:04,318 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=903520.0, ans=0.125 2023-11-20 02:28:09,245 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=903520.0, ans=0.0 2023-11-20 02:28:10,442 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=903520.0, ans=0.0 2023-11-20 02:28:20,125 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=903586.6666666666, ans=22.5 2023-11-20 02:28:30,479 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 135550 2023-11-20 02:28:43,518 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 3300, loss[loss=0.08532, simple_loss=0.1013, pruned_loss=0.02487, audio_tagging_loss=0.009799, over 15368.00 frames. ], tot_loss[loss=0.08276, simple_loss=0.1027, pruned_loss=0.02094, audio_tagging_loss=0.01045, over 3039758.74 frames. 
], batch size: 56, lr: 5.82e-03, grad_scale: 32.0 2023-11-20 02:29:00,613 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.558e+01 8.302e+01 8.686e+01 9.682e+01 1.210e+02, threshold=1.737e+02, percent-clipped=0.0 2023-11-20 02:29:06,300 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=903786.6666666666, ans=0.125 2023-11-20 02:29:07,024 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.06 vs. limit=8.0 2023-11-20 02:29:35,792 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 135600 2023-11-20 02:29:45,928 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.24 vs. limit=22.5 2023-11-20 02:29:48,785 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 3350, loss[loss=0.06293, simple_loss=0.0766, pruned_loss=0.01119, audio_tagging_loss=0.01343, over 15835.00 frames. ], tot_loss[loss=0.08327, simple_loss=0.103, pruned_loss=0.02133, audio_tagging_loss=0.01042, over 3052605.80 frames. ], batch size: 61, lr: 5.82e-03, grad_scale: 32.0 2023-11-20 02:30:07,123 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.37 vs. limit=15.0 2023-11-20 02:30:18,724 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.87 vs. limit=12.0 2023-11-20 02:30:20,813 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=904186.6666666666, ans=0.0 2023-11-20 02:30:21,222 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.46 vs. limit=12.0 2023-11-20 02:30:24,733 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.17 vs. limit=15.0 2023-11-20 02:30:26,057 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=904253.3333333334, ans=0.0 2023-11-20 02:30:36,749 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.71 vs. limit=15.0 2023-11-20 02:30:39,900 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 135650 2023-11-20 02:30:52,038 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.27 vs. limit=10.0 2023-11-20 02:30:52,692 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 3400, loss[loss=0.09082, simple_loss=0.1223, pruned_loss=0.01865, audio_tagging_loss=0.01103, over 15504.00 frames. ], tot_loss[loss=0.08373, simple_loss=0.104, pruned_loss=0.02136, audio_tagging_loss=0.01036, over 3058487.27 frames. 
], batch size: 55, lr: 5.82e-03, grad_scale: 32.0
2023-11-20 02:31:09,370 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.968e+01 8.145e+01 8.862e+01 9.550e+01 1.351e+02, threshold=1.772e+02, percent-clipped=0.0
2023-11-20 02:31:18,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=904520.0, ans=0.0
2023-11-20 02:31:42,450 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=904586.6666666666, ans=0.125
2023-11-20 02:31:44,809 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 135700
2023-11-20 02:31:50,141 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=904653.3333333334, ans=10.0
2023-11-20 02:31:52,461 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=904653.3333333334, ans=0.125
2023-11-20 02:31:57,665 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 3450, loss[loss=0.06606, simple_loss=0.08606, pruned_loss=0.01576, audio_tagging_loss=0.007267, over 15192.00 frames. ], tot_loss[loss=0.08293, simple_loss=0.103, pruned_loss=0.0211, audio_tagging_loss=0.01034, over 3050059.92 frames. ], batch size: 58, lr: 5.82e-03, grad_scale: 32.0
2023-11-20 02:32:13,940 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=904786.6666666666, ans=0.0
2023-11-20 02:32:16,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=904786.6666666666, ans=0.125
2023-11-20 02:32:45,091 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=904920.0, ans=15.0
2023-11-20 02:32:49,689 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 135750
2023-11-20 02:33:02,325 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=905053.3333333334, ans=22.5
2023-11-20 02:33:03,101 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 3500, loss[loss=0.08115, simple_loss=0.1018, pruned_loss=0.02028, audio_tagging_loss=0.009977, over 15469.00 frames. ], tot_loss[loss=0.08285, simple_loss=0.1031, pruned_loss=0.02109, audio_tagging_loss=0.01024, over 3039726.15 frames. ], batch size: 58, lr: 5.82e-03, grad_scale: 32.0
2023-11-20 02:33:19,502 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.046e+01 8.042e+01 8.771e+01 9.579e+01 1.310e+02, threshold=1.754e+02, percent-clipped=0.0
2023-11-20 02:33:26,084 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=905120.0, ans=0.125
2023-11-20 02:33:31,481 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.41 vs. limit=22.5
2023-11-20 02:33:36,456 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
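
The WARNING above comes from the utterance filter in train_asr.py: this AudioSet clip carries only the dummy placeholder transcript, and after the encoder frontend's 4x subsampling its 100 feature frames shrink to 23, fewer than its 24 BPE tokens, so the pruned RNN-T loss has no valid alignment. A minimal sketch of such a filter, modeled on icefall's remove_short_and_long_utt; the helper name and the sp SentencePiece processor are assumptions, and the exact criterion in this branch may differ:

    import logging

    def keep_cut(c, sp) -> bool:
        # Frames left after the frontend, which subsamples by 4:
        # ((N - 7) // 2 + 1) // 2, so 100 input frames -> 23.
        T = ((c.num_frames - 7) // 2 + 1) // 2
        tokens = sp.encode(c.supervisions[0].text, out_type=str)
        if T < len(tokens):  # e.g. 23 frames < 24 tokens
            logging.warning(
                f"Exclude cut with ID {c.id} from training. "
                f"Number of frames (before subsampling): {c.num_frames}. "
                f"Number of frames (after subsampling): {T}. "
                f"Text: {c.supervisions[0].text}. Tokens: {tokens}. "
                f"Number of tokens: {len(tokens)}"
            )
            return False
        return True

    # Applied lazily to the training cut set, e.g.:
    # train_cuts = train_cuts.filter(lambda c: keep_cut(c, sp))
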
2023-11-20 02:33:36,585 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=905186.6666666666, ans=0.2
2023-11-20 02:33:50,253 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=905253.3333333334, ans=0.125
2023-11-20 02:33:55,116 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 135800
2023-11-20 02:34:02,817 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.40 vs. limit=22.5
2023-11-20 02:34:08,287 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 3550, loss[loss=0.1163, simple_loss=0.143, pruned_loss=0.03685, audio_tagging_loss=0.007922, over 14949.00 frames. ], tot_loss[loss=0.08277, simple_loss=0.1028, pruned_loss=0.02119, audio_tagging_loss=0.0102, over 3044872.82 frames. ], batch size: 54, lr: 5.82e-03, grad_scale: 32.0
2023-11-20 02:34:41,129 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=905520.0, ans=0.125
2023-11-20 02:34:59,137 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.09 vs. limit=12.0
2023-11-20 02:34:59,932 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 135850
2023-11-20 02:35:12,746 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 3600, loss[loss=0.07343, simple_loss=0.09133, pruned_loss=0.01421, audio_tagging_loss=0.01356, over 16473.00 frames. ], tot_loss[loss=0.08286, simple_loss=0.1032, pruned_loss=0.02115, audio_tagging_loss=0.01013, over 3047603.69 frames. ], batch size: 62, lr: 5.82e-03, grad_scale: 32.0
2023-11-20 02:35:29,264 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.874e+01 8.123e+01 9.173e+01 1.010e+02 1.525e+02, threshold=1.835e+02, percent-clipped=0.0
2023-11-20 02:35:35,494 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.90 vs. limit=10.0
2023-11-20 02:35:46,796 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-20 02:36:00,919 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.06 vs. limit=10.0
2023-11-20 02:36:04,310 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 135900
2023-11-20 02:36:07,264 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.95 vs. limit=12.0
2023-11-20 02:36:17,250 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 3650, loss[loss=0.075, simple_loss=0.08916, pruned_loss=0.01682, audio_tagging_loss=0.01359, over 14933.00 frames. ], tot_loss[loss=0.08202, simple_loss=0.1021, pruned_loss=0.02088, audio_tagging_loss=0.01009, over 3045439.81 frames. ], batch size: 57, lr: 5.82e-03, grad_scale: 32.0
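
The many ScheduledFloat lines record hyperparameters (dropout probabilities, skip rates, balancer and whitening limits) that are not fixed constants but piecewise-linear functions of the global batch count; scaling.py occasionally logs the current interpolated value as ans. A minimal sketch of the interpolation, assuming the (batch_count, value) breakpoint representation used in icefall's scaling.py; the function name and example breakpoints are illustrative:

    def scheduled_float(schedule, batch_count):
        # schedule: (batch_count, value) breakpoints, sorted by batch_count;
        # the value is clamped at the endpoints and linear in between.
        if batch_count <= schedule[0][0]:
            return schedule[0][1]
        if batch_count >= schedule[-1][0]:
            return schedule[-1][1]
        for (x0, y0), (x1, y1) in zip(schedule, schedule[1:]):
            if x0 <= batch_count <= x1:
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # E.g., if bypass.scale_min were scheduled from 0.9 down to 0.2 over the
    # first 20k batches (illustrative breakpoints), then at this point in
    # training the schedule is past its last breakpoint, hence ans=0.2:
    print(scheduled_float([(0.0, 0.9), (20000.0, 0.2)], 905186.67))  # -> 0.2
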
2023-11-20 02:36:24,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=906053.3333333334, ans=0.125
2023-11-20 02:36:36,385 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.81 vs. limit=15.0
2023-11-20 02:36:54,008 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=906186.6666666666, ans=0.125
2023-11-20 02:36:54,563 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.57 vs. limit=15.0
2023-11-20 02:37:01,668 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.63 vs. limit=22.5
2023-11-20 02:37:09,386 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 135950
2023-11-20 02:37:13,942 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=906320.0, ans=0.0
2023-11-20 02:37:19,678 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.48 vs. limit=6.0
2023-11-20 02:37:22,746 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 3700, loss[loss=0.113, simple_loss=0.1573, pruned_loss=0.02815, audio_tagging_loss=0.006208, over 15381.00 frames. ], tot_loss[loss=0.08223, simple_loss=0.1021, pruned_loss=0.02105, audio_tagging_loss=0.01011, over 3052910.46 frames. ], batch size: 55, lr: 5.82e-03, grad_scale: 32.0
2023-11-20 02:37:36,754 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.87 vs. limit=15.0
2023-11-20 02:37:38,492 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.777e+01 8.262e+01 8.837e+01 9.420e+01 1.280e+02, threshold=1.767e+02, percent-clipped=0.0
2023-11-20 02:37:38,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=906453.3333333334, ans=0.95
2023-11-20 02:37:38,819 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=906453.3333333334, ans=0.125
2023-11-20 02:37:46,648 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=906520.0, ans=0.0
2023-11-20 02:37:47,853 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=906520.0, ans=0.125
2023-11-20 02:37:55,539 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=906520.0, ans=0.0
2023-11-20 02:38:09,172 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=906586.6666666666, ans=0.0
2023-11-20 02:38:13,895 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 136000
2023-11-20 02:38:29,424 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 3750, loss[loss=0.1017, simple_loss=0.1283, pruned_loss=0.02785, audio_tagging_loss=0.00975, over 15812.00 frames. ], tot_loss[loss=0.08344, simple_loss=0.1038, pruned_loss=0.02149, audio_tagging_loss=0.01003, over 3053933.12 frames. ], batch size: 59, lr: 5.81e-03, grad_scale: 16.0
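
The Whitening lines come from the Whiten modules in scaling.py: each measures how far a layer's activations are from having a white (identity-proportional) covariance and compares that metric against a scheduled limit; the corrective gradient penalty only activates once the metric exceeds the limit, so an entry like metric=3.81 vs. limit=15.0 is just a periodic health check. A rough sketch of the metric, modeled on _whitening_metric in icefall's scaling.py (the grouping and normalization details here are simplified and may not match this branch exactly):

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int) -> torch.Tensor:
        # x: (num_frames, num_channels); channels are split into groups.
        n, c = x.shape
        cg = c // num_groups
        xg = x.reshape(n, num_groups, cg).transpose(0, 1)     # (groups, n, cg)
        covar = torch.matmul(xg.transpose(1, 2), xg).mean(0)  # (cg, cg)
        # For symmetric covar, cg * trace(covar^2) / trace(covar)^2 >= 1,
        # with equality iff covar is a multiple of the identity (perfectly
        # "white" features); the ratio is invariant to the overall scale.
        return (covar ** 2).sum() * cg / (covar.diag().sum() ** 2)
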
2023-11-20 02:38:32,685 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=906720.0, ans=0.07
2023-11-20 02:38:33,817 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=906720.0, ans=0.2
2023-11-20 02:38:53,303 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=906786.6666666666, ans=10.0
2023-11-20 02:39:03,073 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=906853.3333333334, ans=0.125
2023-11-20 02:39:05,655 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=906853.3333333334, ans=0.0
2023-11-20 02:39:16,265 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 02:39:16,674 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=906920.0, ans=0.0
2023-11-20 02:39:21,896 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 136050
2023-11-20 02:39:34,595 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 3800, loss[loss=0.08289, simple_loss=0.1072, pruned_loss=0.01837, audio_tagging_loss=0.01092, over 15176.00 frames. ], tot_loss[loss=0.08278, simple_loss=0.1025, pruned_loss=0.02135, audio_tagging_loss=0.01016, over 3046064.56 frames. ], batch size: 56, lr: 5.81e-03, grad_scale: 16.0
2023-11-20 02:39:35,293 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.04 vs. limit=15.0
2023-11-20 02:39:43,412 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=907053.3333333334, ans=0.125
2023-11-20 02:39:50,936 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=907120.0, ans=0.1
2023-11-20 02:39:52,986 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.871e+01 8.426e+01 8.980e+01 9.690e+01 1.284e+02, threshold=1.796e+02, percent-clipped=0.0
2023-11-20 02:40:02,946 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=907186.6666666666, ans=0.125
2023-11-20 02:40:15,069 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=907253.3333333334, ans=0.1
2023-11-20 02:40:26,345 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 136100
2023-11-20 02:40:39,674 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 3850, loss[loss=0.06852, simple_loss=0.08569, pruned_loss=0.016, audio_tagging_loss=0.009666, over 14944.00 frames. ], tot_loss[loss=0.08239, simple_loss=0.1021, pruned_loss=0.02099, audio_tagging_loss=0.01036, over 3053479.67 frames. ], batch size: 58, lr: 5.81e-03, grad_scale: 16.0
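
The periodic optim.py lines summarize ScaledAdam's adaptive gradient clipping: the optimizer keeps the gradient norms of recent batches, logs their quartiles (min, 25%, median, 75%, max), and clips against a threshold of clipping_scale times the median, so the 02:39:52 entry's threshold of 1.796e+02 is exactly 2.0 x 8.980e+01; percent-clipped is the share of recent batches whose norm exceeded the threshold. A minimal sketch of that bookkeeping, modeled on icefall's optim.py (the function name and window handling are assumptions):

    import torch

    def clipping_report(recent_norms: torch.Tensor, clipping_scale: float = 2.0):
        # recent_norms: 1-D tensor of gradient norms from the last N batches.
        s = recent_norms.sort().values
        n = len(s)
        quartiles = [s[min(n - 1, (n * i) // 4)].item() for i in range(5)]
        threshold = clipping_scale * quartiles[2]  # 2.0 x median grad norm
        percent_clipped = 100.0 * (recent_norms > threshold).float().mean().item()
        return quartiles, threshold, percent_clipped

    # A batch whose gradient norm g exceeds the threshold is scaled by
    # threshold / g before the update, which is what tames the occasional
    # spikes behind the percent-clipped=1.0 entries elsewhere in this log.
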
2023-11-20 02:40:46,156 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-20 02:41:07,471 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=907520.0, ans=0.1
2023-11-20 02:41:07,646 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=907520.0, ans=0.1
2023-11-20 02:41:19,466 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=907586.6666666666, ans=0.125
2023-11-20 02:41:31,402 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 136150
2023-11-20 02:41:39,175 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=907653.3333333334, ans=0.0
2023-11-20 02:41:41,416 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=907653.3333333334, ans=0.125
2023-11-20 02:41:43,656 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 3900, loss[loss=0.05868, simple_loss=0.06787, pruned_loss=0.01418, audio_tagging_loss=0.01056, over 16412.00 frames. ], tot_loss[loss=0.08242, simple_loss=0.102, pruned_loss=0.02101, audio_tagging_loss=0.01041, over 3054901.51 frames. ], batch size: 66, lr: 5.81e-03, grad_scale: 16.0
2023-11-20 02:41:54,523 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=907720.0, ans=0.125
2023-11-20 02:42:02,320 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.823e+01 8.198e+01 8.910e+01 9.669e+01 1.262e+02, threshold=1.782e+02, percent-clipped=0.0
2023-11-20 02:42:07,914 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=907786.6666666666, ans=0.125
2023-11-20 02:42:14,013 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=907853.3333333334, ans=0.125
2023-11-20 02:42:18,289 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=907853.3333333334, ans=0.125
2023-11-20 02:42:22,509 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=907920.0, ans=0.035
2023-11-20 02:42:28,815 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=907920.0, ans=0.2
2023-11-20 02:42:35,724 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 136200
2023-11-20 02:42:49,272 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 3950, loss[loss=0.08692, simple_loss=0.1205, pruned_loss=0.02051, audio_tagging_loss=0.006142, over 15029.00 frames. ], tot_loss[loss=0.08193, simple_loss=0.1018, pruned_loss=0.02065, audio_tagging_loss=0.01037, over 3051598.87 frames.
], batch size: 55, lr: 5.81e-03, grad_scale: 16.0 2023-11-20 02:42:51,927 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=908053.3333333334, ans=0.125 2023-11-20 02:43:07,251 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=908120.0, ans=0.1 2023-11-20 02:43:10,431 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=908120.0, ans=0.125 2023-11-20 02:43:19,137 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=908186.6666666666, ans=0.125 2023-11-20 02:43:24,930 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=908186.6666666666, ans=0.125 2023-11-20 02:43:29,322 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=908253.3333333334, ans=0.1 2023-11-20 02:43:34,348 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=908253.3333333334, ans=0.1 2023-11-20 02:43:40,669 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 136250 2023-11-20 02:43:41,097 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.21 vs. limit=15.0 2023-11-20 02:43:49,384 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=908320.0, ans=0.125 2023-11-20 02:43:52,744 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 4000, loss[loss=0.07671, simple_loss=0.1023, pruned_loss=0.01561, audio_tagging_loss=0.009956, over 14835.00 frames. ], tot_loss[loss=0.08147, simple_loss=0.1011, pruned_loss=0.02045, audio_tagging_loss=0.01045, over 3044580.99 frames. ], batch size: 56, lr: 5.81e-03, grad_scale: 32.0 2023-11-20 02:44:11,155 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.441e+01 8.168e+01 8.877e+01 9.706e+01 2.567e+02, threshold=1.775e+02, percent-clipped=1.0 2023-11-20 02:44:22,026 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=908520.0, ans=0.0 2023-11-20 02:44:25,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=908520.0, ans=0.2 2023-11-20 02:44:44,759 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 136300 2023-11-20 02:44:57,475 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 4050, loss[loss=0.06715, simple_loss=0.07221, pruned_loss=0.01819, audio_tagging_loss=0.01286, over 14425.00 frames. ], tot_loss[loss=0.0817, simple_loss=0.1014, pruned_loss=0.02057, audio_tagging_loss=0.01041, over 3037640.27 frames. ], batch size: 56, lr: 5.81e-03, grad_scale: 32.0 2023-11-20 02:44:57,669 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=908720.0, ans=0.035 2023-11-20 02:45:01,199 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 02:45:05,774 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=908720.0, ans=0.07 2023-11-20 02:45:10,604 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=908786.6666666666, ans=0.125 2023-11-20 02:45:11,983 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=908786.6666666666, ans=0.125 2023-11-20 02:45:27,493 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=908853.3333333334, ans=0.125 2023-11-20 02:45:46,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=908920.0, ans=0.125 2023-11-20 02:45:49,255 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 136350 2023-11-20 02:46:01,979 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 4100, loss[loss=0.07368, simple_loss=0.09275, pruned_loss=0.01423, audio_tagging_loss=0.01307, over 14854.00 frames. ], tot_loss[loss=0.08184, simple_loss=0.1016, pruned_loss=0.02064, audio_tagging_loss=0.01039, over 3035208.21 frames. ], batch size: 57, lr: 5.81e-03, grad_scale: 32.0 2023-11-20 02:46:17,827 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=909120.0, ans=0.07 2023-11-20 02:46:19,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=909120.0, ans=0.1 2023-11-20 02:46:19,907 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.593e+01 8.318e+01 8.893e+01 9.801e+01 1.256e+02, threshold=1.779e+02, percent-clipped=0.0 2023-11-20 02:46:46,037 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.00 vs. limit=12.0 2023-11-20 02:46:50,452 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=909253.3333333334, ans=0.0 2023-11-20 02:46:54,596 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 136400 2023-11-20 02:47:07,346 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 4150, loss[loss=0.05346, simple_loss=0.06532, pruned_loss=0.01187, audio_tagging_loss=0.008921, over 14537.00 frames. ], tot_loss[loss=0.08201, simple_loss=0.102, pruned_loss=0.02074, audio_tagging_loss=0.01025, over 3048080.92 frames. 
], batch size: 54, lr: 5.81e-03, grad_scale: 32.0 2023-11-20 02:47:14,252 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=909386.6666666666, ans=0.0 2023-11-20 02:47:15,446 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=909386.6666666666, ans=0.0 2023-11-20 02:47:34,228 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=909520.0, ans=0.1 2023-11-20 02:47:44,575 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=909520.0, ans=0.0 2023-11-20 02:47:55,227 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 02:47:59,140 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 136450 2023-11-20 02:48:05,185 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.98 vs. limit=15.0 2023-11-20 02:48:05,980 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 02:48:12,040 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 4200, loss[loss=0.08349, simple_loss=0.09841, pruned_loss=0.0212, audio_tagging_loss=0.01308, over 15159.00 frames. ], tot_loss[loss=0.08264, simple_loss=0.103, pruned_loss=0.02096, audio_tagging_loss=0.01016, over 3050723.00 frames. ], batch size: 57, lr: 5.80e-03, grad_scale: 32.0 2023-11-20 02:48:14,139 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=909720.0, ans=0.125 2023-11-20 02:48:30,385 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.105e+01 8.354e+01 9.339e+01 1.014e+02 1.353e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-20 02:48:52,591 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=909920.0, ans=0.125 2023-11-20 02:49:04,000 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 136500 2023-11-20 02:49:16,882 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 4250, loss[loss=0.09751, simple_loss=0.1244, pruned_loss=0.02716, audio_tagging_loss=0.008142, over 15467.00 frames. ], tot_loss[loss=0.0829, simple_loss=0.1033, pruned_loss=0.0211, audio_tagging_loss=0.01016, over 3047935.92 frames. 
], batch size: 56, lr: 5.80e-03, grad_scale: 32.0 2023-11-20 02:49:18,362 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=910053.3333333334, ans=0.2 2023-11-20 02:49:32,479 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=910120.0, ans=0.2 2023-11-20 02:49:34,865 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=910120.0, ans=0.1 2023-11-20 02:49:36,085 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=910120.0, ans=0.0 2023-11-20 02:49:59,489 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 02:50:03,647 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=910253.3333333334, ans=0.2 2023-11-20 02:50:08,486 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 136550 2023-11-20 02:50:08,714 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=910320.0, ans=0.02 2023-11-20 02:50:21,292 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 4300, loss[loss=0.09086, simple_loss=0.1119, pruned_loss=0.02425, audio_tagging_loss=0.01065, over 14120.00 frames. ], tot_loss[loss=0.08284, simple_loss=0.1032, pruned_loss=0.02109, audio_tagging_loss=0.01013, over 3043182.32 frames. ], batch size: 55, lr: 5.80e-03, grad_scale: 16.0 2023-11-20 02:50:28,258 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=910386.6666666666, ans=0.125 2023-11-20 02:50:31,927 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=910386.6666666666, ans=0.1 2023-11-20 02:50:35,595 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=910453.3333333334, ans=0.125 2023-11-20 02:50:37,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=910453.3333333334, ans=0.0 2023-11-20 02:50:40,213 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.215e+01 8.438e+01 9.081e+01 9.991e+01 1.404e+02, threshold=1.816e+02, percent-clipped=0.0 2023-11-20 02:51:01,179 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.13 vs. limit=15.0 2023-11-20 02:51:09,085 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=910586.6666666666, ans=0.1 2023-11-20 02:51:12,550 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 136600 2023-11-20 02:51:18,742 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=910653.3333333334, ans=0.125 2023-11-20 02:51:21,005 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=910653.3333333334, ans=0.0 2023-11-20 02:51:25,598 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 4350, loss[loss=0.06447, simple_loss=0.08101, pruned_loss=0.01429, audio_tagging_loss=0.009679, over 15296.00 frames. 
], tot_loss[loss=0.08262, simple_loss=0.103, pruned_loss=0.02102, audio_tagging_loss=0.01009, over 3044701.96 frames. ], batch size: 57, lr: 5.80e-03, grad_scale: 16.0 2023-11-20 02:52:10,885 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=910920.0, ans=0.125 2023-11-20 02:52:13,649 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=910920.0, ans=0.1 2023-11-20 02:52:15,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=910986.6666666666, ans=0.0 2023-11-20 02:52:17,112 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 136650 2023-11-20 02:52:23,420 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=910986.6666666666, ans=0.1 2023-11-20 02:52:30,331 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 4400, loss[loss=0.05883, simple_loss=0.06584, pruned_loss=0.01289, audio_tagging_loss=0.01302, over 14438.00 frames. ], tot_loss[loss=0.08191, simple_loss=0.1021, pruned_loss=0.02083, audio_tagging_loss=0.01002, over 3039107.83 frames. ], batch size: 55, lr: 5.80e-03, grad_scale: 32.0 2023-11-20 02:52:49,548 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.680e+01 7.985e+01 8.679e+01 9.460e+01 1.350e+02, threshold=1.736e+02, percent-clipped=0.0 2023-11-20 02:53:21,550 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 136700 2023-11-20 02:53:24,206 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.687e-03 2023-11-20 02:53:34,332 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 4450, loss[loss=0.09993, simple_loss=0.1447, pruned_loss=0.01946, audio_tagging_loss=0.00812, over 16063.00 frames. ], tot_loss[loss=0.08229, simple_loss=0.1027, pruned_loss=0.02099, audio_tagging_loss=0.009959, over 3045786.64 frames. ], batch size: 57, lr: 5.80e-03, grad_scale: 32.0 2023-11-20 02:53:37,021 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=911386.6666666666, ans=0.0 2023-11-20 02:53:59,618 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=911520.0, ans=0.0 2023-11-20 02:54:19,772 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=911586.6666666666, ans=0.125 2023-11-20 02:54:20,214 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.14 vs. limit=12.0 2023-11-20 02:54:23,416 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=911586.6666666666, ans=0.125 2023-11-20 02:54:25,564 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 136750 2023-11-20 02:54:28,464 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.73 vs. 
limit=15.0 2023-11-20 02:54:33,683 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff2.min_abs, batch_count=911653.3333333334, ans=0.1 2023-11-20 02:54:38,263 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 4500, loss[loss=0.1071, simple_loss=0.1284, pruned_loss=0.03434, audio_tagging_loss=0.008546, over 15169.00 frames. ], tot_loss[loss=0.08231, simple_loss=0.1027, pruned_loss=0.02109, audio_tagging_loss=0.00988, over 3051201.78 frames. ], batch size: 56, lr: 5.80e-03, grad_scale: 32.0 2023-11-20 02:54:50,925 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 02:54:57,918 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.729e+01 8.079e+01 8.708e+01 9.513e+01 1.325e+02, threshold=1.742e+02, percent-clipped=0.0 2023-11-20 02:55:28,871 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=911986.6666666666, ans=0.1 2023-11-20 02:55:29,806 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 136800 2023-11-20 02:55:36,732 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.59 vs. limit=15.0 2023-11-20 02:55:43,014 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 4550, loss[loss=0.05667, simple_loss=0.06451, pruned_loss=0.01385, audio_tagging_loss=0.01056, over 15000.00 frames. ], tot_loss[loss=0.08256, simple_loss=0.103, pruned_loss=0.02117, audio_tagging_loss=0.009903, over 3050939.26 frames. ], batch size: 59, lr: 5.80e-03, grad_scale: 32.0 2023-11-20 02:55:47,407 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.69 vs. limit=15.0 2023-11-20 02:56:11,145 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=912186.6666666666, ans=0.1 2023-11-20 02:56:21,005 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=912253.3333333334, ans=0.125 2023-11-20 02:56:29,883 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.590e-03 2023-11-20 02:56:32,708 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 02:56:35,212 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 136850 2023-11-20 02:56:42,012 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=912320.0, ans=0.5 2023-11-20 02:56:47,579 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=912386.6666666666, ans=0.125 2023-11-20 02:56:48,537 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 4600, loss[loss=0.08403, simple_loss=0.1012, pruned_loss=0.02351, audio_tagging_loss=0.00991, over 14957.00 frames. 
], tot_loss[loss=0.08214, simple_loss=0.1019, pruned_loss=0.02111, audio_tagging_loss=0.01007, over 3054167.35 frames. ], batch size: 58, lr: 5.80e-03, grad_scale: 32.0 2023-11-20 02:56:53,179 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.09 vs. limit=22.5 2023-11-20 02:57:07,911 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.584e+01 7.967e+01 8.569e+01 9.519e+01 1.814e+02, threshold=1.714e+02, percent-clipped=1.0 2023-11-20 02:57:41,174 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 136900 2023-11-20 02:57:41,389 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=912653.3333333334, ans=0.125 2023-11-20 02:57:41,590 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.96 vs. limit=15.0 2023-11-20 02:57:53,880 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 4650, loss[loss=0.06998, simple_loss=0.08517, pruned_loss=0.01631, audio_tagging_loss=0.01108, over 14917.00 frames. ], tot_loss[loss=0.08202, simple_loss=0.102, pruned_loss=0.02089, audio_tagging_loss=0.01013, over 3053608.03 frames. ], batch size: 57, lr: 5.80e-03, grad_scale: 32.0 2023-11-20 02:58:21,705 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=912853.3333333334, ans=0.2 2023-11-20 02:58:32,217 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=912920.0, ans=0.125 2023-11-20 02:58:37,320 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.37 vs. limit=15.0 2023-11-20 02:58:46,001 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 136950 2023-11-20 02:58:58,302 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 4700, loss[loss=0.1036, simple_loss=0.1318, pruned_loss=0.02959, audio_tagging_loss=0.008047, over 14457.00 frames. ], tot_loss[loss=0.08263, simple_loss=0.1027, pruned_loss=0.02107, audio_tagging_loss=0.0102, over 3045597.89 frames. ], batch size: 51, lr: 5.79e-03, grad_scale: 32.0 2023-11-20 02:59:02,145 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.12 vs. 
limit=12.0 2023-11-20 02:59:06,812 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=913053.3333333334, ans=0.125 2023-11-20 02:59:17,791 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.469e+01 8.087e+01 8.546e+01 9.453e+01 1.405e+02, threshold=1.709e+02, percent-clipped=0.0 2023-11-20 02:59:20,536 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=913120.0, ans=0.0 2023-11-20 02:59:32,136 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=913186.6666666666, ans=0.0 2023-11-20 02:59:49,520 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 137000 2023-11-20 03:00:02,259 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=913386.6666666666, ans=0.125 2023-11-20 03:00:03,313 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 4750, loss[loss=0.06769, simple_loss=0.07342, pruned_loss=0.01723, audio_tagging_loss=0.01374, over 15126.00 frames. ], tot_loss[loss=0.08263, simple_loss=0.1026, pruned_loss=0.02105, audio_tagging_loss=0.01026, over 3041459.55 frames. ], batch size: 62, lr: 5.79e-03, grad_scale: 32.0 2023-11-20 03:00:21,738 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.29 vs. limit=10.0 2023-11-20 03:00:28,160 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.18 vs. limit=6.0 2023-11-20 03:00:49,456 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=913586.6666666666, ans=0.125 2023-11-20 03:00:54,755 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 137050 2023-11-20 03:00:57,340 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=913653.3333333334, ans=0.0 2023-11-20 03:01:07,346 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 4800, loss[loss=0.09583, simple_loss=0.1215, pruned_loss=0.02462, audio_tagging_loss=0.01044, over 15547.00 frames. ], tot_loss[loss=0.08251, simple_loss=0.1023, pruned_loss=0.02094, audio_tagging_loss=0.01042, over 3047363.03 frames. ], batch size: 57, lr: 5.79e-03, grad_scale: 32.0 2023-11-20 03:01:11,195 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=913720.0, ans=0.125 2023-11-20 03:01:25,564 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.113e+01 8.051e+01 8.703e+01 9.476e+01 1.263e+02, threshold=1.741e+02, percent-clipped=0.0 2023-11-20 03:01:29,186 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.54 vs. limit=22.5 2023-11-20 03:01:35,561 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.20 vs. 
limit=15.0 2023-11-20 03:01:39,361 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=913853.3333333334, ans=10.0 2023-11-20 03:01:49,739 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=913920.0, ans=0.1 2023-11-20 03:01:52,313 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=913920.0, ans=0.1 2023-11-20 03:01:55,581 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.93 vs. limit=15.0 2023-11-20 03:01:58,905 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 137100 2023-11-20 03:02:01,535 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=913986.6666666666, ans=0.09899494936611666 2023-11-20 03:02:11,291 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 4850, loss[loss=0.0624, simple_loss=0.07297, pruned_loss=0.01478, audio_tagging_loss=0.01114, over 14842.00 frames. ], tot_loss[loss=0.08291, simple_loss=0.1028, pruned_loss=0.0211, audio_tagging_loss=0.01042, over 3045933.62 frames. ], batch size: 55, lr: 5.79e-03, grad_scale: 32.0 2023-11-20 03:02:15,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=914053.3333333334, ans=0.125 2023-11-20 03:02:23,179 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=914120.0, ans=0.125 2023-11-20 03:02:56,671 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=914253.3333333334, ans=0.025 2023-11-20 03:03:02,563 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 137150 2023-11-20 03:03:09,303 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=914320.0, ans=0.1 2023-11-20 03:03:11,764 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=914320.0, ans=0.2 2023-11-20 03:03:12,252 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.12 vs. limit=15.0 2023-11-20 03:03:15,880 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 4900, loss[loss=0.06521, simple_loss=0.07597, pruned_loss=0.01464, audio_tagging_loss=0.01259, over 15101.00 frames. ], tot_loss[loss=0.08326, simple_loss=0.1038, pruned_loss=0.02109, audio_tagging_loss=0.01028, over 3051255.54 frames. ], batch size: 58, lr: 5.79e-03, grad_scale: 32.0 2023-11-20 03:03:18,695 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=914386.6666666666, ans=0.1 2023-11-20 03:03:20,878 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 03:03:26,880 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.81 vs. 
limit=22.5 2023-11-20 03:03:35,167 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.551e+01 8.052e+01 8.825e+01 9.558e+01 1.326e+02, threshold=1.765e+02, percent-clipped=0.0 2023-11-20 03:03:54,946 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.98 vs. limit=12.0 2023-11-20 03:04:07,017 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 137200 2023-11-20 03:04:20,330 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 4950, loss[loss=0.06125, simple_loss=0.07367, pruned_loss=0.01494, audio_tagging_loss=0.009475, over 15219.00 frames. ], tot_loss[loss=0.08276, simple_loss=0.1031, pruned_loss=0.02103, audio_tagging_loss=0.01017, over 3048928.77 frames. ], batch size: 58, lr: 5.79e-03, grad_scale: 32.0 2023-11-20 03:04:34,765 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=914786.6666666666, ans=0.05 2023-11-20 03:04:51,319 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=914853.3333333334, ans=0.125 2023-11-20 03:05:12,451 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 137250 2023-11-20 03:05:19,692 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=914986.6666666666, ans=0.05 2023-11-20 03:05:20,064 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.85 vs. limit=6.0 2023-11-20 03:05:24,397 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 5000, loss[loss=0.0725, simple_loss=0.09347, pruned_loss=0.01718, audio_tagging_loss=0.008586, over 15960.00 frames. ], tot_loss[loss=0.08267, simple_loss=0.1031, pruned_loss=0.02103, audio_tagging_loss=0.01007, over 3053581.92 frames. ], batch size: 59, lr: 5.79e-03, grad_scale: 32.0 2023-11-20 03:05:43,925 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.699e+01 7.904e+01 8.718e+01 9.618e+01 1.428e+02, threshold=1.744e+02, percent-clipped=0.0 2023-11-20 03:05:45,433 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=915120.0, ans=0.035 2023-11-20 03:05:49,376 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=915186.6666666666, ans=0.125 2023-11-20 03:06:07,210 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=915253.3333333334, ans=0.2 2023-11-20 03:06:15,470 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 137300 2023-11-20 03:06:27,226 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=915386.6666666666, ans=0.1 2023-11-20 03:06:27,324 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=915386.6666666666, ans=0.125 2023-11-20 03:06:28,275 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 5050, loss[loss=0.0821, simple_loss=0.1043, pruned_loss=0.02192, audio_tagging_loss=0.00802, over 16110.00 frames. ], tot_loss[loss=0.08161, simple_loss=0.1018, pruned_loss=0.02057, audio_tagging_loss=0.01015, over 3054013.77 frames. 
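In every optim.py:476 record in this section, the reported threshold is exactly Clipping_scale times the median grad norm (e.g. 2.0 x 8.825e+01 = 1.765e+02 just above), which suggests clipping relative to a running median of recent gradient norms rather than a fixed constant, with percent-clipped counting how often a batch exceeded it. A hedged sketch of that idea, not the optimizer's actual code:

```python
import collections
import statistics
import torch

# Minimal sketch of median-relative gradient clipping, assuming
# threshold = clipping_scale * median of recent norms, as the logs suggest.
class MedianGradClipper:
    def __init__(self, clipping_scale: float = 2.0, history: int = 1000):
        self.clipping_scale = clipping_scale
        self.norms = collections.deque(maxlen=history)
        self.num_clipped = 0
        self.num_steps = 0

    def __call__(self, params) -> float:
        grads = [p.grad for p in params if p.grad is not None]
        norm = torch.norm(torch.stack([g.norm() for g in grads])).item()
        self.norms.append(norm)
        threshold = self.clipping_scale * statistics.median(self.norms)
        self.num_steps += 1
        if norm > threshold:
            self.num_clipped += 1
            for g in grads:
                g.mul_(threshold / norm)  # scale the whole gradient down
        return 100.0 * self.num_clipped / self.num_steps  # "percent-clipped"

# usage sketch: percent = clipper(model.parameters()) once per step
```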
], batch size: 60, lr: 5.79e-03, grad_scale: 32.0 2023-11-20 03:06:37,996 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.32 vs. limit=15.0 2023-11-20 03:06:41,762 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.50 vs. limit=10.0 2023-11-20 03:06:57,167 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=915520.0, ans=0.125 2023-11-20 03:07:20,304 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 137350 2023-11-20 03:07:23,005 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=915653.3333333334, ans=0.1 2023-11-20 03:07:32,318 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 5100, loss[loss=0.07878, simple_loss=0.09723, pruned_loss=0.02053, audio_tagging_loss=0.009629, over 14969.00 frames. ], tot_loss[loss=0.08154, simple_loss=0.1017, pruned_loss=0.02063, audio_tagging_loss=0.01004, over 3047454.36 frames. ], batch size: 57, lr: 5.79e-03, grad_scale: 32.0 2023-11-20 03:07:51,980 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.683e+01 7.866e+01 8.491e+01 9.250e+01 1.522e+02, threshold=1.698e+02, percent-clipped=0.0 2023-11-20 03:07:55,939 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=915786.6666666666, ans=0.2 2023-11-20 03:07:57,478 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.54 vs. limit=15.0 2023-11-20 03:07:57,770 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.18 vs. limit=15.0 2023-11-20 03:08:02,805 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=915853.3333333334, ans=0.0 2023-11-20 03:08:04,071 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=915853.3333333334, ans=0.1 2023-11-20 03:08:08,034 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.72 vs. limit=6.0 2023-11-20 03:08:23,105 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=915986.6666666666, ans=0.2 2023-11-20 03:08:24,146 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 137400 2023-11-20 03:08:27,561 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=915986.6666666666, ans=0.125 2023-11-20 03:08:37,506 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 5150, loss[loss=0.09848, simple_loss=0.1203, pruned_loss=0.02922, audio_tagging_loss=0.009134, over 15411.00 frames. ], tot_loss[loss=0.08109, simple_loss=0.1011, pruned_loss=0.02051, audio_tagging_loss=0.01005, over 3050261.80 frames. 
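The scaling.py:213 lines track named ScheduledFloat hyperparameters (dropout probabilities, skip rates, balancer probs) together with a per-module batch_count: these values follow a schedule keyed on how many batches the module has processed rather than being constants. A minimal sketch of such a schedule; the piecewise-linear form and the breakpoints below are illustrative assumptions, not the recipe's:

```python
# Hedged sketch of a batch-count-keyed schedule like the ScheduledFloat
# values logged above. Breakpoints here are invented for illustration.
class ScheduledFloat:
    def __init__(self, *points):  # points: (batch_count, value), sorted
        self.points = list(points)

    def value_at(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:  # linear interpolation between breakpoints
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)
        return pts[-1][1]          # past the last breakpoint: hold final value

dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
print(dropout_p.value_at(910920.0))  # -> 0.1, matching the "ans=0.1" records
```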
], batch size: 59, lr: 5.78e-03, grad_scale: 32.0 2023-11-20 03:08:39,003 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=916053.3333333334, ans=0.2 2023-11-20 03:09:18,931 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.43 vs. limit=15.0 2023-11-20 03:09:24,908 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 03:09:28,837 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.77 vs. limit=15.0 2023-11-20 03:09:29,567 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 137450 2023-11-20 03:09:29,999 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.87 vs. limit=15.0 2023-11-20 03:09:42,214 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 5200, loss[loss=0.08934, simple_loss=0.1129, pruned_loss=0.02521, audio_tagging_loss=0.007672, over 16047.00 frames. ], tot_loss[loss=0.08041, simple_loss=0.1002, pruned_loss=0.02019, audio_tagging_loss=0.01013, over 3052985.54 frames. ], batch size: 57, lr: 5.78e-03, grad_scale: 32.0 2023-11-20 03:09:56,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=916453.3333333334, ans=0.125 2023-11-20 03:10:01,273 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.268e+01 8.182e+01 8.792e+01 9.724e+01 1.387e+02, threshold=1.758e+02, percent-clipped=0.0 2023-11-20 03:10:04,707 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=916453.3333333334, ans=0.125 2023-11-20 03:10:12,587 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=916520.0, ans=0.125 2023-11-20 03:10:15,043 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=916520.0, ans=0.125 2023-11-20 03:10:15,115 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=916520.0, ans=0.125 2023-11-20 03:10:34,169 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 137500 2023-11-20 03:10:46,715 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 5250, loss[loss=0.08933, simple_loss=0.1158, pruned_loss=0.02358, audio_tagging_loss=0.007831, over 15301.00 frames. ], tot_loss[loss=0.08049, simple_loss=0.1004, pruned_loss=0.02028, audio_tagging_loss=0.01004, over 3047876.38 frames. ], batch size: 57, lr: 5.78e-03, grad_scale: 32.0 2023-11-20 03:10:46,994 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=916720.0, ans=0.0 2023-11-20 03:10:49,462 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=916720.0, ans=0.125 2023-11-20 03:11:21,038 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.48 vs. 
limit=6.0 2023-11-20 03:11:26,235 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=916920.0, ans=0.2 2023-11-20 03:11:38,229 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 137550 2023-11-20 03:11:48,550 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=916986.6666666666, ans=0.125 2023-11-20 03:11:51,372 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 5300, loss[loss=0.1046, simple_loss=0.1297, pruned_loss=0.03053, audio_tagging_loss=0.009174, over 15265.00 frames. ], tot_loss[loss=0.08178, simple_loss=0.1022, pruned_loss=0.02083, audio_tagging_loss=0.009881, over 3044126.62 frames. ], batch size: 57, lr: 5.78e-03, grad_scale: 32.0 2023-11-20 03:12:10,533 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.799e+01 8.318e+01 9.072e+01 9.912e+01 1.242e+02, threshold=1.814e+02, percent-clipped=0.0 2023-11-20 03:12:19,256 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.25 vs. limit=15.0 2023-11-20 03:12:30,853 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.48 vs. limit=15.0 2023-11-20 03:12:35,127 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=917253.3333333334, ans=0.125 2023-11-20 03:12:43,167 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 137600 2023-11-20 03:12:56,061 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 5350, loss[loss=0.06751, simple_loss=0.08154, pruned_loss=0.01588, audio_tagging_loss=0.01086, over 15211.00 frames. ], tot_loss[loss=0.08277, simple_loss=0.1036, pruned_loss=0.02117, audio_tagging_loss=0.009786, over 3052140.37 frames. ], batch size: 56, lr: 5.78e-03, grad_scale: 32.0 2023-11-20 03:12:56,399 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=917386.6666666666, ans=0.1 2023-11-20 03:12:57,696 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=917386.6666666666, ans=0.04949747468305833 2023-11-20 03:13:01,679 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=14.16 vs. limit=15.0 2023-11-20 03:13:24,294 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=917520.0, ans=0.125 2023-11-20 03:13:24,329 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=917520.0, ans=0.0 2023-11-20 03:13:28,655 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.36 vs. limit=10.0 2023-11-20 03:13:42,511 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=917586.6666666666, ans=0.07 2023-11-20 03:13:42,950 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.89 vs. 
limit=10.0 2023-11-20 03:13:47,655 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 137650 2023-11-20 03:14:00,423 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 5400, loss[loss=0.06517, simple_loss=0.07583, pruned_loss=0.01305, audio_tagging_loss=0.0142, over 14732.00 frames. ], tot_loss[loss=0.08273, simple_loss=0.1037, pruned_loss=0.02108, audio_tagging_loss=0.009784, over 3052167.19 frames. ], batch size: 55, lr: 5.78e-03, grad_scale: 32.0 2023-11-20 03:14:05,758 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=917720.0, ans=0.125 2023-11-20 03:14:19,173 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.643e+01 8.272e+01 8.874e+01 9.617e+01 1.716e+02, threshold=1.775e+02, percent-clipped=0.0 2023-11-20 03:14:29,857 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=917853.3333333334, ans=0.0 2023-11-20 03:14:30,923 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=917853.3333333334, ans=0.1 2023-11-20 03:14:44,563 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=917920.0, ans=0.0 2023-11-20 03:14:51,533 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 137700 2023-11-20 03:14:58,419 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=917986.6666666666, ans=0.125 2023-11-20 03:15:04,144 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 5450, loss[loss=0.1027, simple_loss=0.1277, pruned_loss=0.0299, audio_tagging_loss=0.008981, over 16136.00 frames. ], tot_loss[loss=0.08308, simple_loss=0.1038, pruned_loss=0.02129, audio_tagging_loss=0.009898, over 3054590.16 frames. ], batch size: 58, lr: 5.78e-03, grad_scale: 32.0 2023-11-20 03:15:04,363 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=918053.3333333334, ans=0.0 2023-11-20 03:15:05,623 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=918053.3333333334, ans=0.125 2023-11-20 03:15:14,745 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=918053.3333333334, ans=0.1 2023-11-20 03:15:15,817 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=918120.0, ans=0.1 2023-11-20 03:15:17,381 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.64 vs. 
limit=15.0 2023-11-20 03:15:20,859 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=918120.0, ans=0.1 2023-11-20 03:15:23,834 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=918120.0, ans=0.125 2023-11-20 03:15:37,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=918186.6666666666, ans=0.0 2023-11-20 03:15:55,376 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 137750 2023-11-20 03:15:56,925 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=918320.0, ans=0.125 2023-11-20 03:16:08,295 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 5500, loss[loss=0.09889, simple_loss=0.1257, pruned_loss=0.02923, audio_tagging_loss=0.006812, over 16039.00 frames. ], tot_loss[loss=0.08346, simple_loss=0.1041, pruned_loss=0.02142, audio_tagging_loss=0.00999, over 3049423.84 frames. ], batch size: 58, lr: 5.78e-03, grad_scale: 32.0 2023-11-20 03:16:22,108 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=918453.3333333334, ans=0.125 2023-11-20 03:16:27,776 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.847e+01 8.238e+01 9.064e+01 9.896e+01 2.099e+02, threshold=1.813e+02, percent-clipped=1.0 2023-11-20 03:16:40,871 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=918520.0, ans=0.125 2023-11-20 03:16:46,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=918586.6666666666, ans=0.2 2023-11-20 03:16:58,688 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 03:17:00,391 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 137800 2023-11-20 03:17:09,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=918653.3333333334, ans=0.1 2023-11-20 03:17:13,433 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 5550, loss[loss=0.09273, simple_loss=0.1108, pruned_loss=0.02636, audio_tagging_loss=0.01096, over 16077.00 frames. ], tot_loss[loss=0.08331, simple_loss=0.1039, pruned_loss=0.02124, audio_tagging_loss=0.01011, over 3052916.04 frames. ], batch size: 58, lr: 5.78e-03, grad_scale: 32.0 2023-11-20 03:17:55,966 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.64 vs. limit=12.0 2023-11-20 03:18:04,352 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 137850 2023-11-20 03:18:08,094 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=918986.6666666666, ans=0.1 2023-11-20 03:18:16,841 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 5600, loss[loss=0.08604, simple_loss=0.1016, pruned_loss=0.02438, audio_tagging_loss=0.01087, over 15933.00 frames. ], tot_loss[loss=0.08322, simple_loss=0.1036, pruned_loss=0.02123, audio_tagging_loss=0.01019, over 3050434.35 frames. 
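The scaling.py:1022 Whitening records compare a per-module statistic against a limit, and occasionally the statistic exceeds it (metric=15.64 vs. limit=15.0 just above), presumably the condition under which a whitening penalty engages. One definition consistent with the logged values is the mean squared eigenvalue of the channel covariance divided by its squared mean, n * tr(C^2) / tr(C)^2, which equals 1.0 for perfectly whitened activations and grows with anisotropy; treat this exact formula as an assumption:

```python
import torch

# Hedged sketch: one anisotropy metric consistent with the logged
# "metric=X vs. limit=Y" lines; 1.0 means perfectly whitened.
def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels) activations
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]          # channel covariance
    n = cov.shape[0]
    # mean squared eigenvalue / squared mean eigenvalue, via traces
    return (n * torch.trace(cov @ cov) / torch.trace(cov) ** 2).item()

white = torch.randn(10000, 384)
print(whitening_metric(white))                    # ~1.0: already whitened
print(whitening_metric(white * torch.rand(384)))  # larger: anisotropic channels
```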
], batch size: 59, lr: 5.78e-03, grad_scale: 32.0 2023-11-20 03:18:34,540 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=919120.0, ans=0.125 2023-11-20 03:18:34,555 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=919120.0, ans=0.1 2023-11-20 03:18:35,343 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.491e+01 8.234e+01 9.061e+01 1.016e+02 1.381e+02, threshold=1.812e+02, percent-clipped=0.0 2023-11-20 03:18:40,080 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=919120.0, ans=0.125 2023-11-20 03:19:02,530 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 03:19:03,056 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.01 vs. limit=15.0 2023-11-20 03:19:05,262 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=919253.3333333334, ans=0.1 2023-11-20 03:19:07,341 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 137900 2023-11-20 03:19:19,211 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 5650, loss[loss=0.07335, simple_loss=0.09899, pruned_loss=0.0157, audio_tagging_loss=0.008151, over 14531.00 frames. ], tot_loss[loss=0.08315, simple_loss=0.1034, pruned_loss=0.02113, audio_tagging_loss=0.01032, over 3053578.14 frames. ], batch size: 53, lr: 5.77e-03, grad_scale: 32.0 2023-11-20 03:19:19,383 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=919386.6666666666, ans=0.2 2023-11-20 03:19:19,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=919386.6666666666, ans=0.1 2023-11-20 03:19:28,368 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=919386.6666666666, ans=0.125 2023-11-20 03:19:31,319 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.49 vs. limit=22.5 2023-11-20 03:19:56,763 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=919586.6666666666, ans=0.125 2023-11-20 03:19:56,997 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.77 vs. 
limit=15.0 2023-11-20 03:20:07,772 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=919586.6666666666, ans=0.05 2023-11-20 03:20:09,957 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 137950 2023-11-20 03:20:19,170 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=919653.3333333334, ans=0.125 2023-11-20 03:20:23,358 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 5700, loss[loss=0.09914, simple_loss=0.1214, pruned_loss=0.0266, audio_tagging_loss=0.01183, over 15311.00 frames. ], tot_loss[loss=0.0823, simple_loss=0.1023, pruned_loss=0.02078, audio_tagging_loss=0.01036, over 3051243.90 frames. ], batch size: 60, lr: 5.77e-03, grad_scale: 32.0 2023-11-20 03:20:42,402 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.046e+01 8.375e+01 8.900e+01 9.766e+01 1.297e+02, threshold=1.780e+02, percent-clipped=0.0 2023-11-20 03:20:50,553 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.34 vs. limit=15.0 2023-11-20 03:20:54,178 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.13 vs. limit=15.0 2023-11-20 03:21:06,062 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=919920.0, ans=0.125 2023-11-20 03:21:09,405 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=919920.0, ans=0.125 2023-11-20 03:21:15,175 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 138000 2023-11-20 03:21:18,018 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=919986.6666666666, ans=0.015 2023-11-20 03:21:19,410 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=919986.6666666666, ans=0.125 2023-11-20 03:21:26,129 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=919986.6666666666, ans=0.1 2023-11-20 03:21:28,273 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 5750, loss[loss=0.08662, simple_loss=0.1032, pruned_loss=0.02368, audio_tagging_loss=0.01136, over 16235.00 frames. ], tot_loss[loss=0.08196, simple_loss=0.1019, pruned_loss=0.02071, audio_tagging_loss=0.01029, over 3051782.36 frames. ], batch size: 62, lr: 5.77e-03, grad_scale: 32.0 2023-11-20 03:22:19,721 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 138050 2023-11-20 03:22:31,823 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 5800, loss[loss=0.09481, simple_loss=0.1219, pruned_loss=0.026, audio_tagging_loss=0.007847, over 15252.00 frames. ], tot_loss[loss=0.08178, simple_loss=0.1017, pruned_loss=0.02082, audio_tagging_loss=0.01011, over 3047774.01 frames. 
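The four numbers in each tot_loss record are not independent: every record in this section satisfies tot = 0.5 * simple + pruned + 1.0 * audio_tagging, with the weights inferred purely from the logged values rather than from any source. A quick check against the batch 5800 aggregate just above:

```python
# Weights inferred from the logged numbers themselves: the simple (linear
# joiner) transducer loss enters at half weight, the pruned and
# audio-tagging terms at full weight.
simple, pruned, audio = 0.1017, 0.02082, 0.01011  # batch 5800 aggregate above
assert abs(0.5 * simple + pruned + audio - 0.08178) < 1e-5
```

Down-weighting the simple term relative to the pruned term is consistent with the usual pruned-RNN-T arrangement, with the audio-tagging distillation loss added on top at full weight.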
], batch size: 56, lr: 5.77e-03, grad_scale: 32.0 2023-11-20 03:22:52,123 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.779e+01 8.315e+01 8.891e+01 9.653e+01 1.172e+02, threshold=1.778e+02, percent-clipped=0.0 2023-11-20 03:22:54,805 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=920453.3333333334, ans=0.0 2023-11-20 03:23:02,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=920520.0, ans=0.0 2023-11-20 03:23:02,056 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=920520.0, ans=0.125 2023-11-20 03:23:03,450 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=920520.0, ans=0.125 2023-11-20 03:23:05,048 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=920520.0, ans=0.2 2023-11-20 03:23:08,966 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=920520.0, ans=0.0 2023-11-20 03:23:23,086 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 138100 2023-11-20 03:23:36,488 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 5850, loss[loss=0.05171, simple_loss=0.05137, pruned_loss=0.01225, audio_tagging_loss=0.01378, over 16472.00 frames. ], tot_loss[loss=0.08164, simple_loss=0.1017, pruned_loss=0.02063, audio_tagging_loss=0.01016, over 3049476.09 frames. ], batch size: 66, lr: 5.77e-03, grad_scale: 32.0 2023-11-20 03:23:39,117 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=920720.0, ans=0.125 2023-11-20 03:23:39,172 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=920720.0, ans=0.0 2023-11-20 03:23:58,998 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=920786.6666666666, ans=0.1 2023-11-20 03:24:00,276 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 03:24:04,293 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.87 vs. limit=12.0 2023-11-20 03:24:05,184 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=920853.3333333334, ans=0.125 2023-11-20 03:24:27,839 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 138150 2023-11-20 03:24:30,259 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=920986.6666666666, ans=0.1 2023-11-20 03:24:37,123 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.38 vs. limit=15.0 2023-11-20 03:24:39,934 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 5900, loss[loss=0.08411, simple_loss=0.1094, pruned_loss=0.02161, audio_tagging_loss=0.007813, over 14290.00 frames. ], tot_loss[loss=0.08195, simple_loss=0.1021, pruned_loss=0.02082, audio_tagging_loss=0.01006, over 3046213.69 frames. 
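grad_scale in the batch records is the fp16 dynamic loss scale: it sits at 32.0 through most of this stretch, halves to 16.0 right after this point (batch 5900 onward), and later recovers to 32.0, the signature of a scaler that backs off on overflow and grows back after a run of stable steps. A sketch of that mechanism with torch.cuda.amp; model, optimizer, and batch are placeholders, and the init_scale/growth_interval values are illustrative:

```python
import torch

# Hedged sketch of why "grad_scale" flips between 32.0 and 16.0 above:
# dynamic loss scaling halves the scale when fp16 grads overflow and
# doubles it again after a stretch of stable steps.
scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_factor=2.0,
                                   backoff_factor=0.5, growth_interval=2000)

def training_step(model, optimizer, batch):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = model(batch)          # placeholder: returns a scalar loss
    scaler.scale(loss).backward()
    scaler.step(optimizer)           # skips the step if grads overflowed
    scaler.update()                  # adjust scale; log scaler.get_scale()
```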
], batch size: 55, lr: 5.77e-03, grad_scale: 16.0 2023-11-20 03:24:40,235 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=921053.3333333334, ans=0.1 2023-11-20 03:24:44,710 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=8.22 vs. limit=12.0 2023-11-20 03:24:47,006 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=921053.3333333334, ans=0.2 2023-11-20 03:24:52,127 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.45 vs. limit=15.0 2023-11-20 03:24:59,812 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.588e+01 8.195e+01 8.943e+01 1.006e+02 1.652e+02, threshold=1.789e+02, percent-clipped=0.0 2023-11-20 03:25:03,683 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=921186.6666666666, ans=0.1 2023-11-20 03:25:03,869 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=921186.6666666666, ans=0.2 2023-11-20 03:25:17,745 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=921253.3333333334, ans=0.125 2023-11-20 03:25:31,433 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 138200 2023-11-20 03:25:38,080 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=921320.0, ans=0.125 2023-11-20 03:25:43,841 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 5950, loss[loss=0.09201, simple_loss=0.1188, pruned_loss=0.02466, audio_tagging_loss=0.007943, over 14666.00 frames. ], tot_loss[loss=0.08166, simple_loss=0.102, pruned_loss=0.02066, audio_tagging_loss=0.01002, over 3042267.79 frames. ], batch size: 53, lr: 5.77e-03, grad_scale: 16.0 2023-11-20 03:25:49,225 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.77 vs. limit=22.5 2023-11-20 03:25:52,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=921386.6666666666, ans=0.125 2023-11-20 03:26:01,509 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.95 vs. limit=22.5 2023-11-20 03:26:03,407 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 03:26:11,530 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=921520.0, ans=0.0 2023-11-20 03:26:34,904 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 138250 2023-11-20 03:26:36,297 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=921653.3333333334, ans=0.125 2023-11-20 03:26:47,498 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 6000, loss[loss=0.06478, simple_loss=0.08475, pruned_loss=0.01375, audio_tagging_loss=0.008651, over 14372.00 frames. ], tot_loss[loss=0.0817, simple_loss=0.102, pruned_loss=0.02072, audio_tagging_loss=0.009964, over 3039911.79 frames. 
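The scaling.py:1118 WithLoss records report an accumulated auxiliary penalty attached to attention-weight tensors; it is usually 0.000e+00 and occasionally a small positive value (2.687e-03 and 1.061e-02 elsewhere in this log), i.e. a regularizer that only fires when the weights drift out of the desired regime. A logging-only sketch of the pattern; the real implementation presumably also feeds gradients, and the stand-in penalty function below is invented for illustration:

```python
import torch

# Hedged sketch of the "WithLoss" pattern the log lines suggest: the tensor
# passes through unchanged while a small auxiliary penalty accumulates and
# is periodically reported as "loss-sum". Penalty below is a stand-in.
class WithLoss(torch.nn.Module):
    def __init__(self, name: str, penalty_fn):
        super().__init__()
        self.name = name
        self.penalty_fn = penalty_fn
        self.loss_sum = 0.0

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training:
            # detached here, so this sketch only logs; it does not backprop
            self.loss_sum += float(self.penalty_fn(x.detach()))
        return x  # identity in the computation graph

attn = WithLoss("self_attn_weights",
                lambda w: torch.relu(w.abs().mean() - 0.5))  # stand-in penalty
_ = attn(torch.softmax(torch.randn(4, 16, 16), dim=-1))
print(f"WithLoss: name={attn.name}, loss-sum={attn.loss_sum:.3e}")  # 0.000e+00
```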
], batch size: 54, lr: 5.77e-03, grad_scale: 32.0 2023-11-20 03:26:47,499 INFO [train_asr.py:1285] (3/4) Computing validation loss 2023-11-20 03:27:28,651 INFO [train_asr.py:1294] (3/4) Epoch 12, validation: loss=0.06387, simple_loss=0.05435, pruned_loss=0.006012, audio_tagging_loss=0.03068, over 4681554.00 frames. 2023-11-20 03:27:28,653 INFO [train_asr.py:1295] (3/4) Maximum memory allocated so far is 25886MB 2023-11-20 03:27:31,775 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.59 vs. limit=15.0 2023-11-20 03:27:34,817 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=921720.0, ans=0.05 2023-11-20 03:27:34,925 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=921720.0, ans=0.125 2023-11-20 03:27:35,306 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.06 vs. limit=22.5 2023-11-20 03:27:39,873 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=921786.6666666666, ans=0.1 2023-11-20 03:27:44,996 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.80 vs. limit=12.0 2023-11-20 03:27:48,134 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.723e+01 8.205e+01 8.900e+01 1.006e+02 1.555e+02, threshold=1.780e+02, percent-clipped=0.0 2023-11-20 03:28:02,284 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.53 vs. limit=12.0 2023-11-20 03:28:08,046 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 03:28:16,477 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 03:28:20,232 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 138300 2023-11-20 03:28:20,456 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=921986.6666666666, ans=0.0 2023-11-20 03:28:32,678 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 6050, loss[loss=0.08245, simple_loss=0.1078, pruned_loss=0.01791, audio_tagging_loss=0.01064, over 16573.00 frames. ], tot_loss[loss=0.08125, simple_loss=0.1015, pruned_loss=0.02044, audio_tagging_loss=0.01006, over 3047042.99 frames. 
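The train_asr.py:1285/1294 records above show the periodic validation pass: it lands on the round batch index 6000, computes a frame-weighted loss over the whole validation set (~4.7M frames), and then reports peak CUDA memory. A hedged sketch of that loop; compute_loss and valid_interval are hypothetical stand-ins:

```python
import torch

# Hedged sketch of the periodic validation visible above: every
# valid_interval batches, run the full validation set and report peak memory.
def maybe_validate(batch_idx, valid_interval, model, valid_dl, device):
    if batch_idx == 0 or batch_idx % valid_interval != 0:
        return
    model.eval()
    with torch.no_grad():
        total, frames = 0.0, 0
        for batch in valid_dl:
            loss, num_frames = compute_loss(model, batch)  # hypothetical helper
            total += loss.item() * num_frames
            frames += num_frames
    model.train()
    print(f"validation: loss={total / frames:.5f}, over {frames:.2f} frames.")
    mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    print(f"Maximum memory allocated so far is {mb}MB")
```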
], batch size: 59, lr: 5.77e-03, grad_scale: 16.0 2023-11-20 03:29:12,401 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=922253.3333333334, ans=0.0 2023-11-20 03:29:17,287 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=922253.3333333334, ans=0.125 2023-11-20 03:29:17,388 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=922253.3333333334, ans=0.95 2023-11-20 03:29:24,345 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 138350 2023-11-20 03:29:33,718 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=922320.0, ans=0.1 2023-11-20 03:29:37,770 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 6100, loss[loss=0.08666, simple_loss=0.1081, pruned_loss=0.02169, audio_tagging_loss=0.01091, over 15804.00 frames. ], tot_loss[loss=0.08147, simple_loss=0.1016, pruned_loss=0.02057, audio_tagging_loss=0.01012, over 3047345.80 frames. ], batch size: 61, lr: 5.76e-03, grad_scale: 16.0 2023-11-20 03:29:43,686 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.26 vs. limit=22.5 2023-11-20 03:29:45,373 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=922386.6666666666, ans=0.0 2023-11-20 03:30:00,020 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.566e+01 7.861e+01 8.501e+01 9.321e+01 2.317e+02, threshold=1.700e+02, percent-clipped=1.0 2023-11-20 03:30:30,142 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 138400 2023-11-20 03:30:43,084 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 6150, loss[loss=0.08286, simple_loss=0.1031, pruned_loss=0.0191, audio_tagging_loss=0.01221, over 15441.00 frames. ], tot_loss[loss=0.08076, simple_loss=0.1008, pruned_loss=0.02021, audio_tagging_loss=0.01016, over 3054521.81 frames. ], batch size: 57, lr: 5.76e-03, grad_scale: 16.0 2023-11-20 03:31:09,807 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=922853.3333333334, ans=0.0 2023-11-20 03:31:10,910 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=922853.3333333334, ans=0.05 2023-11-20 03:31:34,749 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 138450 2023-11-20 03:31:47,110 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 6200, loss[loss=0.07635, simple_loss=0.09736, pruned_loss=0.01819, audio_tagging_loss=0.009482, over 14473.00 frames. ], tot_loss[loss=0.08175, simple_loss=0.1021, pruned_loss=0.02057, audio_tagging_loss=0.01011, over 3051719.09 frames. 
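Each record's tot_loss is reported over roughly 3.0e6 frames even though individual batches hold only ~15k frames, so the aggregate is a decayed running sum whose effective window is about 200 batches (200 x 15k ~ 3.0e6). A minimal sketch of such a tracker; the decay constant is chosen to match that window, not taken from the code:

```python
# Hedged sketch of frame-weighted running loss statistics behind
# "tot_loss[...] over ~3.0e6 frames": per-batch sums decay geometrically,
# so the effective window stays near 1 / (1 - decay) = 200 batches.
class RunningLoss:
    def __init__(self, decay: float = 0.995):
        self.decay = decay
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss_sum: float, batch_frames: float):
        self.loss_sum = self.loss_sum * self.decay + batch_loss_sum
        self.frames = self.frames * self.decay + batch_frames

    @property
    def value(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)
```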
], batch size: 54, lr: 5.76e-03, grad_scale: 16.0 2023-11-20 03:32:06,099 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=923120.0, ans=0.025 2023-11-20 03:32:09,333 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.126e+01 8.412e+01 9.324e+01 1.012e+02 1.364e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-20 03:32:25,672 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=923253.3333333334, ans=0.0 2023-11-20 03:32:39,060 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 138500 2023-11-20 03:32:49,730 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=923320.0, ans=0.125 2023-11-20 03:32:51,991 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 6250, loss[loss=0.1105, simple_loss=0.1498, pruned_loss=0.0265, audio_tagging_loss=0.009148, over 15689.00 frames. ], tot_loss[loss=0.08201, simple_loss=0.1021, pruned_loss=0.02065, audio_tagging_loss=0.01031, over 3049481.08 frames. ], batch size: 57, lr: 5.76e-03, grad_scale: 16.0 2023-11-20 03:32:58,005 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.11 vs. limit=15.0 2023-11-20 03:33:07,568 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=923453.3333333334, ans=0.125 2023-11-20 03:33:07,962 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.31 vs. limit=15.0 2023-11-20 03:33:43,692 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 138550 2023-11-20 03:33:47,829 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.35 vs. limit=15.0 2023-11-20 03:33:55,110 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.65 vs. limit=15.0 2023-11-20 03:33:55,702 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 6300, loss[loss=0.08977, simple_loss=0.1167, pruned_loss=0.02241, audio_tagging_loss=0.009022, over 16173.00 frames. ], tot_loss[loss=0.08248, simple_loss=0.1028, pruned_loss=0.02079, audio_tagging_loss=0.0103, over 3051228.53 frames. 
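The learning rate drifts from 5.80e-03 to 5.75e-03 across this section and changes within a single epoch, so the schedule is a smooth function of the global batch count (and epoch) rather than a per-epoch step. One common form with exactly this slow late-training drift is an inverse-quartic-root decay in both counters (Eden-style); every constant below is a placeholder, not the recipe's:

```python
# Hedged sketch of a smooth Eden-like LR schedule consistent with the slow
# within-epoch drift above; all constants are illustrative placeholders.
def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float, lr_epochs: float) -> float:
    batch_factor = ((batch / lr_batches) ** 2 + 1) ** -0.25
    epoch_factor = ((epoch / lr_epochs) ** 2 + 1) ** -0.25
    return base_lr * batch_factor * epoch_factor

# Far into training both factors change very slowly, which is why the logged
# lr only moves in its third significant digit over ~2000 batches.
print(eden_lr(0.05, 137000, 12.3, 7500.0, 3.5))
```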
], batch size: 59, lr: 5.76e-03, grad_scale: 16.0 2023-11-20 03:34:01,618 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=923720.0, ans=0.1 2023-11-20 03:34:08,262 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=923786.6666666666, ans=0.125 2023-11-20 03:34:12,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=923786.6666666666, ans=0.0 2023-11-20 03:34:17,727 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.678e+01 8.557e+01 9.163e+01 1.006e+02 1.411e+02, threshold=1.833e+02, percent-clipped=0.0 2023-11-20 03:34:24,875 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=923853.3333333334, ans=0.0 2023-11-20 03:34:28,524 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 03:34:36,586 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=923920.0, ans=0.125 2023-11-20 03:34:39,492 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.55 vs. limit=6.0 2023-11-20 03:34:44,174 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=923920.0, ans=0.0 2023-11-20 03:34:44,430 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.42 vs. limit=22.5 2023-11-20 03:34:48,351 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 138600 2023-11-20 03:34:52,663 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=923986.6666666666, ans=0.125 2023-11-20 03:34:54,195 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.78 vs. limit=6.0 2023-11-20 03:35:01,709 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 6350, loss[loss=0.06867, simple_loss=0.07834, pruned_loss=0.01814, audio_tagging_loss=0.01136, over 14786.00 frames. ], tot_loss[loss=0.08191, simple_loss=0.102, pruned_loss=0.02056, audio_tagging_loss=0.01038, over 3048739.83 frames. ], batch size: 56, lr: 5.76e-03, grad_scale: 16.0 2023-11-20 03:35:03,859 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.79 vs. 
limit=6.0 2023-11-20 03:35:15,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=924120.0, ans=0.125 2023-11-20 03:35:23,161 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=924120.0, ans=0.125 2023-11-20 03:35:25,655 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=924120.0, ans=0.125 2023-11-20 03:35:53,793 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 138650 2023-11-20 03:36:04,419 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=924320.0, ans=0.125 2023-11-20 03:36:06,523 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 6400, loss[loss=0.06438, simple_loss=0.07562, pruned_loss=0.01589, audio_tagging_loss=0.01069, over 15968.00 frames. ], tot_loss[loss=0.08167, simple_loss=0.1015, pruned_loss=0.02046, audio_tagging_loss=0.01048, over 3049513.44 frames. ], batch size: 62, lr: 5.76e-03, grad_scale: 32.0 2023-11-20 03:36:11,187 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 03:36:27,650 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=924453.3333333334, ans=0.125 2023-11-20 03:36:27,725 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=924453.3333333334, ans=0.125 2023-11-20 03:36:28,542 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.492e+01 8.195e+01 8.907e+01 9.891e+01 1.303e+02, threshold=1.781e+02, percent-clipped=0.0 2023-11-20 03:36:35,486 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.19 vs. limit=15.0 2023-11-20 03:36:40,524 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=924520.0, ans=0.0 2023-11-20 03:36:55,080 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.33 vs. limit=12.0 2023-11-20 03:36:58,270 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 138700 2023-11-20 03:37:09,360 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=924653.3333333334, ans=15.0 2023-11-20 03:37:11,065 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 6450, loss[loss=0.08384, simple_loss=0.1083, pruned_loss=0.02217, audio_tagging_loss=0.007525, over 14225.00 frames. ], tot_loss[loss=0.08151, simple_loss=0.1009, pruned_loss=0.02049, audio_tagging_loss=0.01055, over 3042410.55 frames. ], batch size: 54, lr: 5.76e-03, grad_scale: 32.0 2023-11-20 03:37:29,560 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=924786.6666666666, ans=0.0 2023-11-20 03:37:44,829 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=924853.3333333334, ans=0.125 2023-11-20 03:37:45,266 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.34 vs. 
limit=15.0 2023-11-20 03:37:56,630 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=924920.0, ans=0.125 2023-11-20 03:37:57,642 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=924920.0, ans=0.0 2023-11-20 03:37:58,798 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=924920.0, ans=0.125 2023-11-20 03:38:02,396 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 138750 2023-11-20 03:38:15,267 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 6500, loss[loss=0.07331, simple_loss=0.08972, pruned_loss=0.01639, audio_tagging_loss=0.01207, over 15631.00 frames. ], tot_loss[loss=0.08176, simple_loss=0.1011, pruned_loss=0.02067, audio_tagging_loss=0.01054, over 3043169.58 frames. ], batch size: 59, lr: 5.76e-03, grad_scale: 32.0 2023-11-20 03:38:15,872 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.69 vs. limit=15.0 2023-11-20 03:38:16,822 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=925053.3333333334, ans=0.07 2023-11-20 03:38:19,861 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=925053.3333333334, ans=0.1 2023-11-20 03:38:22,156 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=925053.3333333334, ans=0.0 2023-11-20 03:38:25,796 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=925053.3333333334, ans=0.125 2023-11-20 03:38:25,819 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=925053.3333333334, ans=0.125 2023-11-20 03:38:31,715 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=925120.0, ans=0.125 2023-11-20 03:38:35,537 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.76 vs. limit=15.0 2023-11-20 03:38:37,926 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.023e+01 8.283e+01 9.037e+01 9.701e+01 1.555e+02, threshold=1.807e+02, percent-clipped=0.0 2023-11-20 03:38:58,263 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=925253.3333333334, ans=0.0 2023-11-20 03:39:04,330 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=925253.3333333334, ans=0.1 2023-11-20 03:39:06,622 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 138800 2023-11-20 03:39:20,288 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 6550, loss[loss=0.08948, simple_loss=0.1186, pruned_loss=0.02273, audio_tagging_loss=0.007451, over 15487.00 frames. ], tot_loss[loss=0.08201, simple_loss=0.1019, pruned_loss=0.02075, audio_tagging_loss=0.01029, over 3047671.86 frames. 
], batch size: 57, lr: 5.76e-03, grad_scale: 32.0 2023-11-20 03:39:27,498 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=925386.6666666666, ans=0.125 2023-11-20 03:39:28,801 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=925386.6666666666, ans=0.1 2023-11-20 03:39:46,948 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=925520.0, ans=0.0 2023-11-20 03:39:50,429 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=925520.0, ans=0.1 2023-11-20 03:40:12,009 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.19 vs. limit=12.0 2023-11-20 03:40:12,485 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 138850 2023-11-20 03:40:18,139 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=925653.3333333334, ans=0.0 2023-11-20 03:40:25,145 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 6600, loss[loss=0.08531, simple_loss=0.1106, pruned_loss=0.02313, audio_tagging_loss=0.006876, over 14450.00 frames. ], tot_loss[loss=0.08232, simple_loss=0.1026, pruned_loss=0.02094, audio_tagging_loss=0.01009, over 3054839.99 frames. ], batch size: 55, lr: 5.75e-03, grad_scale: 32.0 2023-11-20 03:40:27,785 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=925720.0, ans=0.125 2023-11-20 03:40:27,920 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=925720.0, ans=0.0 2023-11-20 03:40:44,453 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=925786.6666666666, ans=0.0 2023-11-20 03:40:46,612 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.736e+01 8.079e+01 8.710e+01 9.676e+01 1.211e+02, threshold=1.742e+02, percent-clipped=0.0 2023-11-20 03:40:56,197 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=925853.3333333334, ans=0.5 2023-11-20 03:41:11,204 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=925920.0, ans=0.0 2023-11-20 03:41:17,193 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 138900 2023-11-20 03:41:29,873 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 6650, loss[loss=0.08531, simple_loss=0.1031, pruned_loss=0.02365, audio_tagging_loss=0.01009, over 15431.00 frames. ], tot_loss[loss=0.08302, simple_loss=0.1037, pruned_loss=0.02125, audio_tagging_loss=0.009942, over 3049155.63 frames. ], batch size: 61, lr: 5.75e-03, grad_scale: 32.0 2023-11-20 03:41:48,502 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=926120.0, ans=0.035 2023-11-20 03:42:21,831 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 138950 2023-11-20 03:42:34,595 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 6700, loss[loss=0.09493, simple_loss=0.1277, pruned_loss=0.023, audio_tagging_loss=0.008105, over 15533.00 frames. ], tot_loss[loss=0.08273, simple_loss=0.1034, pruned_loss=0.02112, audio_tagging_loss=0.009933, over 3050593.97 frames. 
], batch size: 56, lr: 5.75e-03, grad_scale: 32.0 2023-11-20 03:42:38,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=926386.6666666666, ans=0.125 2023-11-20 03:42:41,789 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=926386.6666666666, ans=0.0 2023-11-20 03:42:46,201 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=926386.6666666666, ans=0.125 2023-11-20 03:42:55,751 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=1.061e-02 2023-11-20 03:42:56,601 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.827e+01 8.182e+01 8.627e+01 9.432e+01 1.236e+02, threshold=1.725e+02, percent-clipped=0.0 2023-11-20 03:42:58,203 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=926453.3333333334, ans=0.125 2023-11-20 03:43:08,499 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=926520.0, ans=0.1 2023-11-20 03:43:18,479 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=926586.6666666666, ans=0.125 2023-11-20 03:43:24,951 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.16 vs. limit=15.0 2023-11-20 03:43:26,347 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 139000 2023-11-20 03:43:35,910 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=926653.3333333334, ans=0.125 2023-11-20 03:43:39,256 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 6750, loss[loss=0.08177, simple_loss=0.09754, pruned_loss=0.01797, audio_tagging_loss=0.01503, over 15554.00 frames. ], tot_loss[loss=0.08284, simple_loss=0.1035, pruned_loss=0.02115, audio_tagging_loss=0.009956, over 3052581.53 frames. 
], batch size: 59, lr: 5.75e-03, grad_scale: 32.0 2023-11-20 03:43:47,979 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=926720.0, ans=0.2 2023-11-20 03:43:53,610 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 03:43:55,917 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=926786.6666666666, ans=0.2 2023-11-20 03:44:05,410 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=926853.3333333334, ans=0.0 2023-11-20 03:44:16,412 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=926920.0, ans=0.0 2023-11-20 03:44:28,107 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=926920.0, ans=0.125 2023-11-20 03:44:30,366 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 139050 2023-11-20 03:44:30,658 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=926986.6666666666, ans=0.125 2023-11-20 03:44:42,797 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 6800, loss[loss=0.06073, simple_loss=0.06394, pruned_loss=0.01562, audio_tagging_loss=0.01314, over 14934.00 frames. ], tot_loss[loss=0.08264, simple_loss=0.1029, pruned_loss=0.02112, audio_tagging_loss=0.01005, over 3046434.56 frames. ], batch size: 56, lr: 5.75e-03, grad_scale: 32.0 2023-11-20 03:44:56,788 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=927120.0, ans=10.0 2023-11-20 03:45:04,627 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=927120.0, ans=0.125 2023-11-20 03:45:06,178 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.197e+01 8.226e+01 8.844e+01 9.966e+01 1.208e+02, threshold=1.769e+02, percent-clipped=0.0 2023-11-20 03:45:08,025 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.66 vs. limit=15.0 2023-11-20 03:45:13,952 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.65 vs. limit=15.0 2023-11-20 03:45:34,556 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 139100 2023-11-20 03:45:35,155 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.39 vs. limit=22.5 2023-11-20 03:45:39,189 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.66 vs. limit=15.0 2023-11-20 03:45:46,933 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 6850, loss[loss=0.07694, simple_loss=0.103, pruned_loss=0.01365, audio_tagging_loss=0.01179, over 15661.00 frames. ], tot_loss[loss=0.08243, simple_loss=0.1028, pruned_loss=0.02099, audio_tagging_loss=0.01004, over 3042889.58 frames. 
], batch size: 57, lr: 5.75e-03, grad_scale: 32.0 2023-11-20 03:45:49,665 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=927386.6666666666, ans=0.125 2023-11-20 03:46:27,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=927586.6666666666, ans=0.1 2023-11-20 03:46:38,580 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 139150 2023-11-20 03:46:52,494 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 6900, loss[loss=0.06319, simple_loss=0.07174, pruned_loss=0.01099, audio_tagging_loss=0.01633, over 15031.00 frames. ], tot_loss[loss=0.08228, simple_loss=0.1026, pruned_loss=0.02091, audio_tagging_loss=0.01006, over 3045557.40 frames. ], batch size: 59, lr: 5.75e-03, grad_scale: 32.0 2023-11-20 03:47:06,822 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=927786.6666666666, ans=0.1 2023-11-20 03:47:16,178 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.998e+01 8.075e+01 8.808e+01 9.662e+01 1.235e+02, threshold=1.762e+02, percent-clipped=0.0 2023-11-20 03:47:40,462 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=927920.0, ans=0.125 2023-11-20 03:47:42,970 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 03:47:44,312 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 139200 2023-11-20 03:47:47,391 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=927986.6666666666, ans=0.0 2023-11-20 03:47:57,539 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 6950, loss[loss=0.08017, simple_loss=0.09398, pruned_loss=0.02128, audio_tagging_loss=0.0119, over 15128.00 frames. ], tot_loss[loss=0.08153, simple_loss=0.1016, pruned_loss=0.02059, audio_tagging_loss=0.01015, over 3042584.25 frames. ], batch size: 56, lr: 5.75e-03, grad_scale: 16.0 2023-11-20 03:48:00,792 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.94 vs. 
limit=15.0 2023-11-20 03:48:26,052 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=928186.6666666666, ans=0.125 2023-11-20 03:48:28,596 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=928186.6666666666, ans=0.07 2023-11-20 03:48:30,433 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=928186.6666666666, ans=0.0 2023-11-20 03:48:33,949 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=928186.6666666666, ans=0.05 2023-11-20 03:48:36,481 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=928253.3333333334, ans=0.125 2023-11-20 03:48:45,412 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=928253.3333333334, ans=0.125 2023-11-20 03:48:48,905 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 139250 2023-11-20 03:49:01,007 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 7000, loss[loss=0.08132, simple_loss=0.09553, pruned_loss=0.02273, audio_tagging_loss=0.01083, over 16151.00 frames. ], tot_loss[loss=0.08202, simple_loss=0.1021, pruned_loss=0.02076, audio_tagging_loss=0.01021, over 3051990.34 frames. ], batch size: 63, lr: 5.75e-03, grad_scale: 16.0 2023-11-20 03:49:03,738 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=928386.6666666666, ans=0.0 2023-11-20 03:49:07,656 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.71 vs. limit=6.0 2023-11-20 03:49:14,739 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=928453.3333333334, ans=0.125 2023-11-20 03:49:25,770 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.837e+01 8.169e+01 8.824e+01 9.776e+01 1.242e+02, threshold=1.765e+02, percent-clipped=0.0 2023-11-20 03:49:29,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=928520.0, ans=0.04949747468305833 2023-11-20 03:49:48,627 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.537e-03 2023-11-20 03:49:52,168 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 139300 2023-11-20 03:50:05,521 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 7050, loss[loss=0.07827, simple_loss=0.09223, pruned_loss=0.02307, audio_tagging_loss=0.009085, over 14309.00 frames. ], tot_loss[loss=0.08144, simple_loss=0.1012, pruned_loss=0.02055, audio_tagging_loss=0.0103, over 3050175.45 frames. 
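[Editor's note] The WARNING above is an AudioSet place-holder transcript being filtered out: a 1-second cut has 100 feature frames, the conv front-end reduces that to 23 (consistent with T' = ((T - 7) // 2 + 1) // 2 for subsampling_factor 4), and 23 encoder frames cannot align 24 BPE tokens, so the cut is excluded. A sketch of that validity check; the helper names are invented and the real criterion in train_asr.py may include extra margins:

```python
def frames_after_subsampling(num_frames: int) -> int:
    # Reproduces the logged 100 -> 23 mapping for subsampling_factor=4;
    # the exact formula depends on the conv front-end's padding.
    return ((num_frames - 7) // 2 + 1) // 2

def is_trainable(num_frames: int, num_tokens: int) -> bool:
    """True if the encoder output is long enough to align the tokens."""
    return frames_after_subsampling(num_frames) >= num_tokens

print(frames_after_subsampling(100))  # 23
print(is_trainable(100, 24))          # False -> the cut is excluded

# hypothetical wiring into a lhotse CutSet, with `sp` the BPE processor:
# cuts = cuts.filter(
#     lambda c: is_trainable(c.num_frames,
#                            len(sp.encode(c.supervisions[0].text)))
# )
```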
], batch size: 53, lr: 5.75e-03, grad_scale: 16.0 2023-11-20 03:50:39,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=928853.3333333334, ans=0.125 2023-11-20 03:50:40,206 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=928853.3333333334, ans=0.0 2023-11-20 03:50:57,625 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 139350 2023-11-20 03:51:02,181 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=928986.6666666666, ans=0.0 2023-11-20 03:51:10,695 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 7100, loss[loss=0.07486, simple_loss=0.09666, pruned_loss=0.01802, audio_tagging_loss=0.008516, over 14921.00 frames. ], tot_loss[loss=0.08165, simple_loss=0.1016, pruned_loss=0.02058, audio_tagging_loss=0.01025, over 3052287.13 frames. ], batch size: 58, lr: 5.74e-03, grad_scale: 16.0 2023-11-20 03:51:23,687 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=929120.0, ans=0.0 2023-11-20 03:51:26,836 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=15.06 vs. limit=15.0 2023-11-20 03:51:29,954 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=929120.0, ans=0.09899494936611666 2023-11-20 03:51:34,680 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.602e+01 8.104e+01 8.912e+01 9.574e+01 1.346e+02, threshold=1.782e+02, percent-clipped=0.0 2023-11-20 03:51:51,634 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=929253.3333333334, ans=15.0 2023-11-20 03:51:59,607 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.66 vs. limit=22.5 2023-11-20 03:52:03,494 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 139400 2023-11-20 03:52:15,903 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 7150, loss[loss=0.0867, simple_loss=0.1067, pruned_loss=0.02246, audio_tagging_loss=0.01088, over 14906.00 frames. ], tot_loss[loss=0.08191, simple_loss=0.1018, pruned_loss=0.02065, audio_tagging_loss=0.01034, over 3049095.98 frames. ], batch size: 59, lr: 5.74e-03, grad_scale: 16.0 2023-11-20 03:52:26,393 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=929386.6666666666, ans=0.1 2023-11-20 03:52:34,945 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=929453.3333333334, ans=0.125 2023-11-20 03:52:36,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=929453.3333333334, ans=0.125 2023-11-20 03:52:44,067 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=929520.0, ans=0.125 2023-11-20 03:53:05,869 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.89 vs. 
limit=15.0 2023-11-20 03:53:07,759 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 139450 2023-11-20 03:53:07,854 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=929653.3333333334, ans=0.0 2023-11-20 03:53:20,611 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 7200, loss[loss=0.08856, simple_loss=0.1147, pruned_loss=0.02123, audio_tagging_loss=0.00996, over 14462.00 frames. ], tot_loss[loss=0.08221, simple_loss=0.1023, pruned_loss=0.0207, audio_tagging_loss=0.01034, over 3045936.76 frames. ], batch size: 54, lr: 5.74e-03, grad_scale: 32.0 2023-11-20 03:53:22,876 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=929720.0, ans=0.125 2023-11-20 03:53:23,028 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.94 vs. limit=15.0 2023-11-20 03:53:45,895 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.207e+01 8.353e+01 8.991e+01 9.790e+01 1.410e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-20 03:53:55,931 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=929853.3333333334, ans=0.125 2023-11-20 03:54:06,275 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=929920.0, ans=0.125 2023-11-20 03:54:13,124 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 139500 2023-11-20 03:54:14,613 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=929986.6666666666, ans=0.125 2023-11-20 03:54:25,360 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 7250, loss[loss=0.08221, simple_loss=0.1067, pruned_loss=0.02159, audio_tagging_loss=0.007298, over 14800.00 frames. ], tot_loss[loss=0.08196, simple_loss=0.1019, pruned_loss=0.02063, audio_tagging_loss=0.01036, over 3046398.82 frames. ], batch size: 57, lr: 5.74e-03, grad_scale: 16.0 2023-11-20 03:54:42,980 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=930120.0, ans=0.04949747468305833 2023-11-20 03:55:15,308 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=930253.3333333334, ans=0.0 2023-11-20 03:55:17,501 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 139550 2023-11-20 03:55:17,602 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=930320.0, ans=0.95 2023-11-20 03:55:30,931 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 7300, loss[loss=0.06157, simple_loss=0.07027, pruned_loss=0.0135, audio_tagging_loss=0.01294, over 16092.00 frames. ], tot_loss[loss=0.08273, simple_loss=0.103, pruned_loss=0.02098, audio_tagging_loss=0.01025, over 3049096.67 frames. 
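[Editor's note] grad_scale in the batch lines is the dynamic loss scale of fp16 mixed-precision training: it halves when gradient overflows are detected (32.0 down to 16.0 around batch 6950 above) and grows back after a stretch of stable steps (back to 32.0 by batch 7200). The standard PyTorch pattern behind this looks roughly as follows; the init_scale and growth_interval values here are illustrative, not this run's:

```python
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)

def train_step(model, batch, optimizer, criterion):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = criterion(model(batch["inputs"]), batch["targets"])
    scaler.scale(loss).backward()   # backprop on the scaled loss
    scaler.step(optimizer)          # unscales, skips the step on inf/nan
    scaler.update()                 # halves scale on overflow, else grows it
    return loss.detach(), scaler.get_scale()  # the value logged as grad_scale
```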
], batch size: 61, lr: 5.74e-03, grad_scale: 16.0 2023-11-20 03:55:45,304 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=930453.3333333334, ans=0.0 2023-11-20 03:55:56,596 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 5.831e+01 8.147e+01 8.750e+01 9.433e+01 1.159e+02, threshold=1.750e+02, percent-clipped=0.0 2023-11-20 03:55:58,202 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=930520.0, ans=0.1 2023-11-20 03:56:10,287 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.93 vs. limit=12.0 2023-11-20 03:56:11,138 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 03:56:18,901 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.78 vs. limit=15.0 2023-11-20 03:56:22,020 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 139600 2023-11-20 03:56:24,886 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=930653.3333333334, ans=0.125 2023-11-20 03:56:35,320 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 7350, loss[loss=0.06711, simple_loss=0.08472, pruned_loss=0.01574, audio_tagging_loss=0.009013, over 15183.00 frames. ], tot_loss[loss=0.08219, simple_loss=0.1022, pruned_loss=0.02095, audio_tagging_loss=0.01013, over 3047176.13 frames. ], batch size: 60, lr: 5.74e-03, grad_scale: 16.0 2023-11-20 03:56:35,901 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.34 vs. limit=22.5 2023-11-20 03:57:19,165 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=930920.0, ans=0.0 2023-11-20 03:57:19,432 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.63 vs. limit=22.5 2023-11-20 03:57:27,008 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 139650 2023-11-20 03:57:33,932 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=930986.6666666666, ans=0.0 2023-11-20 03:57:39,581 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 7400, loss[loss=0.08446, simple_loss=0.1098, pruned_loss=0.02112, audio_tagging_loss=0.008435, over 14593.00 frames. ], tot_loss[loss=0.08102, simple_loss=0.1007, pruned_loss=0.0206, audio_tagging_loss=0.01007, over 3043714.68 frames. 
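[Editor's note] The Whitening lines fire when a Whiten module's measured metric approaches its limit; the metric is a scale-invariant measure of how far the (grouped) feature covariance is from a multiple of the identity, equal to 1.0 for perfectly white features. A plausible form of that metric — icefall's exact estimator may differ — is the ratio of the mean squared covariance eigenvalue to the squared mean eigenvalue:

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> float:
    """x: (N, num_channels). Returns E[lambda^2] / E[lambda]^2 over the
    eigenvalues of the per-group feature covariance; 1.0 iff white."""
    num_channels = x.shape[-1]
    x = x.reshape(-1, num_groups, num_channels // num_groups).transpose(0, 1)
    x = x - x.mean(dim=1, keepdim=True)
    cov = torch.matmul(x.transpose(1, 2), x) / x.shape[1]  # (groups, c, c)
    # trace(cov @ cov) / c  vs  (trace(cov) / c)^2, averaged over groups
    num = (cov * cov).sum(dim=(1, 2)) / cov.shape[-1]
    den = (torch.diagonal(cov, dim1=1, dim2=2).sum(dim=1) / cov.shape[-1]) ** 2
    return (num / den).mean().item()

# A reading like metric=13.78 vs. limit=15.0 above means the features are
# close to the limit, where the module starts applying a whitening gradient.
```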
], batch size: 56, lr: 5.74e-03, grad_scale: 16.0 2023-11-20 03:57:49,076 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=931053.3333333334, ans=0.125 2023-11-20 03:57:50,249 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=931053.3333333334, ans=0.125 2023-11-20 03:57:54,529 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=931120.0, ans=0.125 2023-11-20 03:58:00,665 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=931120.0, ans=0.2 2023-11-20 03:58:04,067 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=931186.6666666666, ans=0.0 2023-11-20 03:58:05,076 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.423e+01 7.809e+01 8.517e+01 9.487e+01 1.228e+02, threshold=1.703e+02, percent-clipped=0.0 2023-11-20 03:58:10,584 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=931186.6666666666, ans=0.125 2023-11-20 03:58:12,280 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=931186.6666666666, ans=0.2 2023-11-20 03:58:30,965 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 139700 2023-11-20 03:58:44,226 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 7450, loss[loss=0.07457, simple_loss=0.08735, pruned_loss=0.01606, audio_tagging_loss=0.01484, over 15153.00 frames. ], tot_loss[loss=0.08123, simple_loss=0.1012, pruned_loss=0.02059, audio_tagging_loss=0.01005, over 3040610.85 frames. ], batch size: 56, lr: 5.74e-03, grad_scale: 16.0 2023-11-20 03:58:53,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=931386.6666666666, ans=0.125 2023-11-20 03:59:24,562 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.90 vs. limit=12.0 2023-11-20 03:59:27,851 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=931586.6666666666, ans=0.2 2023-11-20 03:59:35,535 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 139750 2023-11-20 03:59:43,219 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=931653.3333333334, ans=0.125 2023-11-20 03:59:47,764 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 7500, loss[loss=0.08806, simple_loss=0.1124, pruned_loss=0.02462, audio_tagging_loss=0.007247, over 14494.00 frames. ], tot_loss[loss=0.08184, simple_loss=0.1021, pruned_loss=0.02081, audio_tagging_loss=0.009983, over 3045161.68 frames. 
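[Editor's note] The many balancer*.prob / min_positive / max_abs entries come from the zipformer's Balancer modules, which leave the forward pass untouched and, with the scheduled probability, modify gradients so that per-channel statistics (fraction of positive activations, typical magnitude) stay inside a target range. A deliberately simplified toy version of the idea — the real Balancer in scaling.py is considerably more careful:

```python
import torch

class ToyBalancer(torch.autograd.Function):
    """Identity in forward; in backward, nudges channels whose fraction of
    positive activations is outside [min_positive, max_positive].
    Assumes x has shape (N, num_channels)."""

    @staticmethod
    def forward(ctx, x, min_positive=0.05, max_positive=0.95, grad_scale=0.01):
        ctx.save_for_backward(x)
        ctx.cfg = (min_positive, max_positive, grad_scale)
        return x

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        min_pos, max_pos, scale = ctx.cfg
        frac_pos = (x > 0).float().mean(dim=0, keepdim=True)  # per channel
        # push activations up where too few are positive, down where too many
        direction = (frac_pos < min_pos).float() - (frac_pos > max_pos).float()
        penalty_grad = -direction * scale * grad_output.abs().mean()
        return grad_output + penalty_grad, None, None, None

# usage: y = ToyBalancer.apply(x)
```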
], batch size: 56, lr: 5.74e-03, grad_scale: 16.0 2023-11-20 04:00:01,489 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=931786.6666666666, ans=0.125 2023-11-20 04:00:06,577 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=931786.6666666666, ans=0.1 2023-11-20 04:00:10,610 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.54 vs. limit=10.0 2023-11-20 04:00:14,142 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.250e+01 8.176e+01 8.772e+01 9.671e+01 2.176e+02, threshold=1.754e+02, percent-clipped=1.0 2023-11-20 04:00:18,071 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=931853.3333333334, ans=0.0 2023-11-20 04:00:23,618 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=931853.3333333334, ans=0.125 2023-11-20 04:00:28,516 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=931920.0, ans=0.125 2023-11-20 04:00:35,725 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=931920.0, ans=0.125 2023-11-20 04:00:39,807 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 139800 2023-11-20 04:00:40,078 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=931986.6666666666, ans=0.05 2023-11-20 04:00:42,621 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=931986.6666666666, ans=0.2 2023-11-20 04:00:42,787 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=931986.6666666666, ans=0.0 2023-11-20 04:00:44,691 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=931986.6666666666, ans=0.0 2023-11-20 04:00:52,961 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 7550, loss[loss=0.09619, simple_loss=0.1231, pruned_loss=0.02623, audio_tagging_loss=0.008396, over 16316.00 frames. ], tot_loss[loss=0.08225, simple_loss=0.1024, pruned_loss=0.02111, audio_tagging_loss=0.009919, over 3047581.11 frames. ], batch size: 58, lr: 5.73e-03, grad_scale: 16.0 2023-11-20 04:00:53,231 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=932053.3333333334, ans=0.2 2023-11-20 04:00:53,811 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.43 vs. limit=12.0 2023-11-20 04:01:02,392 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=932053.3333333334, ans=0.0 2023-11-20 04:01:44,692 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 139850 2023-11-20 04:01:45,259 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.00 vs. 
limit=15.0 2023-11-20 04:01:57,653 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 7600, loss[loss=0.0553, simple_loss=0.05949, pruned_loss=0.01391, audio_tagging_loss=0.01164, over 13955.00 frames. ], tot_loss[loss=0.08177, simple_loss=0.1019, pruned_loss=0.02096, audio_tagging_loss=0.009844, over 3049701.67 frames. ], batch size: 57, lr: 5.73e-03, grad_scale: 32.0 2023-11-20 04:02:23,913 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.765e+01 8.108e+01 8.753e+01 9.539e+01 1.294e+02, threshold=1.751e+02, percent-clipped=0.0 2023-11-20 04:02:30,323 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=932520.0, ans=10.0 2023-11-20 04:02:49,273 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 139900 2023-11-20 04:02:54,870 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=932653.3333333334, ans=0.2 2023-11-20 04:03:02,139 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 7650, loss[loss=0.09089, simple_loss=0.1229, pruned_loss=0.02282, audio_tagging_loss=0.006645, over 16772.00 frames. ], tot_loss[loss=0.08133, simple_loss=0.1015, pruned_loss=0.02069, audio_tagging_loss=0.009915, over 3051066.06 frames. ], batch size: 62, lr: 5.73e-03, grad_scale: 32.0 2023-11-20 04:03:21,919 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=932786.6666666666, ans=0.1 2023-11-20 04:03:24,393 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=932786.6666666666, ans=0.0 2023-11-20 04:03:52,551 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=932986.6666666666, ans=0.0 2023-11-20 04:03:53,585 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 139950 2023-11-20 04:03:55,839 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=932986.6666666666, ans=0.0 2023-11-20 04:04:06,034 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=933053.3333333334, ans=0.125 2023-11-20 04:04:07,043 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 7700, loss[loss=0.09476, simple_loss=0.1209, pruned_loss=0.02634, audio_tagging_loss=0.00798, over 13958.00 frames. ], tot_loss[loss=0.08078, simple_loss=0.1004, pruned_loss=0.02052, audio_tagging_loss=0.01004, over 3043719.71 frames. 
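[Editor's note] The scaling.py:1118 WithLoss lines scattered through this log (e.g. loss-sum=2.537e-03 a few records back, loss-sum=0.000e+00 when inactive) report the accumulated value of a small auxiliary penalty attached to the self-attention weights. A sketch of the attach-a-loss-to-an-activation pattern they suggest; the names are invented, and icefall injects the gradient via a custom autograd function rather than returning a scalar:

```python
import torch

class AttachLoss(torch.nn.Module):
    """Pass activations through unchanged while recording a penalty on them,
    to be added to the training loss and logged as `loss-sum`."""

    def __init__(self, name: str, penalty_fn):
        super().__init__()
        self.name = name
        self.penalty_fn = penalty_fn
        self.loss_sum = torch.tensor(0.0)

    def forward(self, x):
        if self.training:
            self.loss_sum = self.penalty_fn(x)  # e.g. an entropy penalty
        return x  # the forward value is untouched
```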
], batch size: 52, lr: 5.73e-03, grad_scale: 32.0 2023-11-20 04:04:17,031 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=933053.3333333334, ans=0.125 2023-11-20 04:04:17,104 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=933053.3333333334, ans=0.2 2023-11-20 04:04:31,239 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=933186.6666666666, ans=0.125 2023-11-20 04:04:32,001 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.410e+01 8.189e+01 8.877e+01 9.506e+01 1.213e+02, threshold=1.775e+02, percent-clipped=0.0 2023-11-20 04:04:46,159 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.65 vs. limit=15.0 2023-11-20 04:04:58,462 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 140000 2023-11-20 04:05:15,108 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 7750, loss[loss=0.1198, simple_loss=0.1543, pruned_loss=0.03432, audio_tagging_loss=0.008336, over 15590.00 frames. ], tot_loss[loss=0.08102, simple_loss=0.1006, pruned_loss=0.02061, audio_tagging_loss=0.01012, over 3042504.24 frames. ], batch size: 56, lr: 5.73e-03, grad_scale: 32.0 2023-11-20 04:05:17,054 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.73 vs. limit=6.0 2023-11-20 04:05:26,810 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.04 vs. limit=15.0 2023-11-20 04:05:43,987 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=933520.0, ans=0.0 2023-11-20 04:05:51,407 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=933520.0, ans=0.0 2023-11-20 04:05:55,427 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.98 vs. limit=22.5 2023-11-20 04:06:06,669 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 140050 2023-11-20 04:06:10,753 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=933653.3333333334, ans=0.125 2023-11-20 04:06:19,737 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 7800, loss[loss=0.06317, simple_loss=0.07697, pruned_loss=0.01481, audio_tagging_loss=0.009875, over 15093.00 frames. ], tot_loss[loss=0.08136, simple_loss=0.1012, pruned_loss=0.02072, audio_tagging_loss=0.01005, over 3040291.76 frames. ], batch size: 57, lr: 5.73e-03, grad_scale: 16.0 2023-11-20 04:06:34,054 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.96 vs. 
limit=15.0 2023-11-20 04:06:46,648 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.019e+01 8.169e+01 8.821e+01 9.790e+01 1.228e+02, threshold=1.764e+02, percent-clipped=0.0 2023-11-20 04:07:11,507 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 140100 2023-11-20 04:07:18,911 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.95 vs. limit=15.0 2023-11-20 04:07:24,822 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 7850, loss[loss=0.08029, simple_loss=0.1042, pruned_loss=0.01931, audio_tagging_loss=0.008884, over 14531.00 frames. ], tot_loss[loss=0.08077, simple_loss=0.1003, pruned_loss=0.02039, audio_tagging_loss=0.01023, over 3043392.30 frames. ], batch size: 56, lr: 5.73e-03, grad_scale: 16.0 2023-11-20 04:07:37,441 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=934120.0, ans=0.125 2023-11-20 04:07:37,646 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 04:07:40,829 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=934120.0, ans=0.0 2023-11-20 04:07:55,280 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=934186.6666666666, ans=0.125 2023-11-20 04:07:57,942 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=934186.6666666666, ans=0.125 2023-11-20 04:07:59,039 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=934186.6666666666, ans=0.2 2023-11-20 04:08:02,156 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=934253.3333333334, ans=0.125 2023-11-20 04:08:16,407 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 140150 2023-11-20 04:08:26,403 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=934320.0, ans=0.1 2023-11-20 04:08:28,683 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 7900, loss[loss=0.08829, simple_loss=0.1047, pruned_loss=0.02459, audio_tagging_loss=0.01135, over 14919.00 frames. ], tot_loss[loss=0.08259, simple_loss=0.1026, pruned_loss=0.02105, audio_tagging_loss=0.01025, over 3044811.21 frames. ], batch size: 56, lr: 5.73e-03, grad_scale: 16.0 2023-11-20 04:08:32,229 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=934386.6666666666, ans=0.0 2023-11-20 04:08:56,547 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.338e+01 8.229e+01 9.116e+01 9.916e+01 1.318e+02, threshold=1.823e+02, percent-clipped=0.0 2023-11-20 04:08:59,264 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=934520.0, ans=0.125 2023-11-20 04:09:11,367 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.96 vs. 
limit=15.0 2023-11-20 04:09:15,896 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=934586.6666666666, ans=0.0 2023-11-20 04:09:19,059 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=934586.6666666666, ans=0.125 2023-11-20 04:09:21,245 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 140200 2023-11-20 04:09:26,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=934653.3333333334, ans=0.1 2023-11-20 04:09:33,523 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 7950, loss[loss=0.08704, simple_loss=0.1039, pruned_loss=0.02504, audio_tagging_loss=0.01003, over 16329.00 frames. ], tot_loss[loss=0.08192, simple_loss=0.1015, pruned_loss=0.0208, audio_tagging_loss=0.01034, over 3047141.02 frames. ], batch size: 61, lr: 5.73e-03, grad_scale: 16.0 2023-11-20 04:09:50,709 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 04:10:24,395 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 140250 2023-11-20 04:10:38,710 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 8000, loss[loss=0.08839, simple_loss=0.1098, pruned_loss=0.02085, audio_tagging_loss=0.01261, over 15092.00 frames. ], tot_loss[loss=0.08171, simple_loss=0.1012, pruned_loss=0.02068, audio_tagging_loss=0.01042, over 3052201.36 frames. ], batch size: 57, lr: 5.73e-03, grad_scale: 32.0 2023-11-20 04:11:05,196 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.597e+01 8.205e+01 8.859e+01 1.003e+02 1.525e+02, threshold=1.772e+02, percent-clipped=0.0 2023-11-20 04:11:28,829 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.32 vs. limit=15.0 2023-11-20 04:11:30,820 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 140300 2023-11-20 04:11:34,545 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=935320.0, ans=0.125 2023-11-20 04:11:42,829 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 8050, loss[loss=0.08807, simple_loss=0.1078, pruned_loss=0.02509, audio_tagging_loss=0.009081, over 15742.00 frames. ], tot_loss[loss=0.0819, simple_loss=0.1013, pruned_loss=0.02079, audio_tagging_loss=0.01045, over 3049656.57 frames. 
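[Editor's note] The slowly decaying lr (5.76e-03 at the top of this section, 5.72e-03 by batch ~8100) is consistent with icefall's Eden schedule, which discounts base_lr by quarter-power factors in both the global batch index and the epoch: lr = base_lr * ((step^2 + lr_batches^2) / lr_batches^2)^(-1/4) * ((epoch^2 + lr_epochs^2) / lr_epochs^2)^(-1/4). With base_lr=0.045, lr_batches=7500, lr_epochs=3.5, step≈138650 and epoch=11 (completed epochs — the exact epoch convention is an inference from the numbers), this reproduces the logged 5.76e-03:

```python
def eden_lr(base_lr: float, step: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    batch_factor = ((step ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

print(eden_lr(0.045, step=138650, epoch=11))  # ~5.76e-03, as logged above
```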
], batch size: 57, lr: 5.72e-03, grad_scale: 32.0 2023-11-20 04:11:45,574 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=935386.6666666666, ans=0.125 2023-11-20 04:11:46,355 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=935386.6666666666, ans=0.0 2023-11-20 04:11:47,503 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=935386.6666666666, ans=0.125 2023-11-20 04:12:33,883 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 140350 2023-11-20 04:12:46,659 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 8100, loss[loss=0.07773, simple_loss=0.09424, pruned_loss=0.01937, audio_tagging_loss=0.01125, over 16034.00 frames. ], tot_loss[loss=0.08241, simple_loss=0.1019, pruned_loss=0.0211, audio_tagging_loss=0.01038, over 3049622.17 frames. ], batch size: 60, lr: 5.72e-03, grad_scale: 32.0 2023-11-20 04:12:48,484 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.19 vs. limit=15.0 2023-11-20 04:12:55,493 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=935720.0, ans=0.125 2023-11-20 04:13:05,794 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=11.02 vs. limit=12.0 2023-11-20 04:13:14,180 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.312e+01 8.397e+01 8.892e+01 9.736e+01 1.286e+02, threshold=1.778e+02, percent-clipped=0.0 2023-11-20 04:13:14,582 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=935853.3333333334, ans=0.0 2023-11-20 04:13:22,861 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=935853.3333333334, ans=0.0 2023-11-20 04:13:38,145 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 140400 2023-11-20 04:13:51,052 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 8150, loss[loss=0.08809, simple_loss=0.1061, pruned_loss=0.02535, audio_tagging_loss=0.009717, over 15397.00 frames. ], tot_loss[loss=0.08272, simple_loss=0.1027, pruned_loss=0.02116, audio_tagging_loss=0.01019, over 3047142.85 frames. ], batch size: 57, lr: 5.72e-03, grad_scale: 32.0 2023-11-20 04:13:57,840 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.69 vs. limit=6.0 2023-11-20 04:14:08,097 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.21 vs. limit=10.0 2023-11-20 04:14:11,079 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=936120.0, ans=0.1 2023-11-20 04:14:43,644 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 140450 2023-11-20 04:14:56,561 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 8200, loss[loss=0.06407, simple_loss=0.08257, pruned_loss=0.01375, audio_tagging_loss=0.009038, over 15273.00 frames. ], tot_loss[loss=0.08309, simple_loss=0.1036, pruned_loss=0.02121, audio_tagging_loss=0.01009, over 3049773.28 frames. 
], batch size: 57, lr: 5.72e-03, grad_scale: 32.0 2023-11-20 04:14:57,852 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 04:14:58,099 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=936386.6666666666, ans=0.0 2023-11-20 04:15:03,534 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=936386.6666666666, ans=0.1 2023-11-20 04:15:20,113 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=11.28 vs. limit=12.0 2023-11-20 04:15:23,157 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.922e+01 8.334e+01 8.915e+01 9.605e+01 1.213e+02, threshold=1.783e+02, percent-clipped=0.0 2023-11-20 04:15:36,984 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=936586.6666666666, ans=0.2 2023-11-20 04:15:40,435 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=936586.6666666666, ans=0.0 2023-11-20 04:15:48,572 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 140500 2023-11-20 04:15:48,869 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=936653.3333333334, ans=0.07 2023-11-20 04:15:49,974 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=936653.3333333334, ans=0.2 2023-11-20 04:15:54,528 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=936653.3333333334, ans=0.1 2023-11-20 04:16:01,667 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 8250, loss[loss=0.07339, simple_loss=0.08745, pruned_loss=0.01826, audio_tagging_loss=0.0114, over 15124.00 frames. ], tot_loss[loss=0.0821, simple_loss=0.1022, pruned_loss=0.02087, audio_tagging_loss=0.01013, over 3047642.25 frames. ], batch size: 55, lr: 5.72e-03, grad_scale: 32.0 2023-11-20 04:16:04,712 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.52 vs. limit=15.0 2023-11-20 04:16:11,703 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=936720.0, ans=0.07 2023-11-20 04:16:20,800 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=936786.6666666666, ans=0.0 2023-11-20 04:16:52,906 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 140550 2023-11-20 04:16:55,803 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.93 vs. limit=22.5 2023-11-20 04:17:05,471 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 8300, loss[loss=0.04482, simple_loss=0.04425, pruned_loss=0.008058, audio_tagging_loss=0.01463, over 15197.00 frames. 
], tot_loss[loss=0.08223, simple_loss=0.1023, pruned_loss=0.02091, audio_tagging_loss=0.01016, over 3051929.34 frames. ], batch size: 61, lr: 5.72e-03, grad_scale: 16.0 2023-11-20 04:17:17,532 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.71 vs. limit=15.0 2023-11-20 04:17:33,792 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.558e+01 8.113e+01 8.924e+01 9.899e+01 1.160e+02, threshold=1.785e+02, percent-clipped=0.0 2023-11-20 04:17:36,979 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.53 vs. limit=10.0 2023-11-20 04:17:45,184 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 04:17:53,107 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=937253.3333333334, ans=0.2 2023-11-20 04:17:56,388 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 140600 2023-11-20 04:18:09,916 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 8350, loss[loss=0.07486, simple_loss=0.09456, pruned_loss=0.01853, audio_tagging_loss=0.009047, over 14588.00 frames. ], tot_loss[loss=0.08175, simple_loss=0.1018, pruned_loss=0.02065, audio_tagging_loss=0.01019, over 3046607.41 frames. ], batch size: 55, lr: 5.72e-03, grad_scale: 16.0 2023-11-20 04:18:11,320 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=937386.6666666666, ans=0.0 2023-11-20 04:18:11,485 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=937386.6666666666, ans=0.0 2023-11-20 04:18:11,884 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.12 vs. limit=22.5 2023-11-20 04:18:12,616 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=937386.6666666666, ans=0.2 2023-11-20 04:18:29,343 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=937453.3333333334, ans=0.1 2023-11-20 04:18:29,451 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=937453.3333333334, ans=0.125 2023-11-20 04:18:30,538 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=937453.3333333334, ans=0.125 2023-11-20 04:18:42,199 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=937520.0, ans=0.125 2023-11-20 04:18:56,912 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=937586.6666666666, ans=0.07 2023-11-20 04:18:58,140 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=937586.6666666666, ans=0.07 2023-11-20 04:19:02,272 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 140650 2023-11-20 04:19:02,858 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.10 vs. 
limit=15.0 2023-11-20 04:19:05,035 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=937653.3333333334, ans=0.125 2023-11-20 04:19:15,090 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 8400, loss[loss=0.07449, simple_loss=0.08822, pruned_loss=0.0184, audio_tagging_loss=0.01198, over 15479.00 frames. ], tot_loss[loss=0.08163, simple_loss=0.1015, pruned_loss=0.02067, audio_tagging_loss=0.01022, over 3044205.31 frames. ], batch size: 60, lr: 5.72e-03, grad_scale: 32.0 2023-11-20 04:19:17,930 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=937720.0, ans=0.0 2023-11-20 04:19:24,549 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.99 vs. limit=15.0 2023-11-20 04:19:32,764 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.91 vs. limit=10.0 2023-11-20 04:19:43,612 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.796e+01 7.950e+01 8.532e+01 9.547e+01 1.131e+02, threshold=1.706e+02, percent-clipped=0.0 2023-11-20 04:20:00,105 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=937920.0, ans=0.125 2023-11-20 04:20:07,359 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 140700 2023-11-20 04:20:10,261 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=937986.6666666666, ans=0.125 2023-11-20 04:20:19,662 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 8450, loss[loss=0.08783, simple_loss=0.1091, pruned_loss=0.02585, audio_tagging_loss=0.007433, over 14956.00 frames. ], tot_loss[loss=0.08203, simple_loss=0.1021, pruned_loss=0.02085, audio_tagging_loss=0.01014, over 3040207.04 frames. ], batch size: 56, lr: 5.72e-03, grad_scale: 32.0 2023-11-20 04:20:45,406 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=938186.6666666666, ans=0.0 2023-11-20 04:20:50,843 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=938186.6666666666, ans=0.125 2023-11-20 04:21:12,269 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 140750 2023-11-20 04:21:21,625 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=938320.0, ans=0.2 2023-11-20 04:21:22,793 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=938320.0, ans=0.125 2023-11-20 04:21:24,930 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 8500, loss[loss=0.07017, simple_loss=0.09446, pruned_loss=0.01354, audio_tagging_loss=0.009396, over 14821.00 frames. ], tot_loss[loss=0.08197, simple_loss=0.1023, pruned_loss=0.02071, audio_tagging_loss=0.01011, over 3048663.28 frames. 
], batch size: 56, lr: 5.72e-03, grad_scale: 32.0 2023-11-20 04:21:32,359 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=938386.6666666666, ans=0.1 2023-11-20 04:21:53,276 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.458e+01 8.139e+01 8.950e+01 9.720e+01 1.235e+02, threshold=1.790e+02, percent-clipped=0.0 2023-11-20 04:22:14,620 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=938586.6666666666, ans=0.125 2023-11-20 04:22:14,780 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.62 vs. limit=15.0 2023-11-20 04:22:15,791 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=938653.3333333334, ans=0.125 2023-11-20 04:22:16,779 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 140800 2023-11-20 04:22:22,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=938653.3333333334, ans=0.125 2023-11-20 04:22:30,066 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 8550, loss[loss=0.07964, simple_loss=0.09278, pruned_loss=0.01886, audio_tagging_loss=0.01439, over 14841.00 frames. ], tot_loss[loss=0.08223, simple_loss=0.1025, pruned_loss=0.0207, audio_tagging_loss=0.01028, over 3050979.85 frames. ], batch size: 56, lr: 5.71e-03, grad_scale: 32.0 2023-11-20 04:22:33,991 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=938720.0, ans=0.125 2023-11-20 04:22:44,886 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=938786.6666666666, ans=0.2 2023-11-20 04:22:54,464 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=938853.3333333334, ans=0.125 2023-11-20 04:23:06,216 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=938853.3333333334, ans=0.125 2023-11-20 04:23:09,901 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=938920.0, ans=0.125 2023-11-20 04:23:21,919 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 140850 2023-11-20 04:23:24,431 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=938986.6666666666, ans=0.125 2023-11-20 04:23:31,809 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=938986.6666666666, ans=0.0 2023-11-20 04:23:34,038 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 8600, loss[loss=0.08839, simple_loss=0.1141, pruned_loss=0.02246, audio_tagging_loss=0.008861, over 15376.00 frames. ], tot_loss[loss=0.08232, simple_loss=0.1025, pruned_loss=0.02083, audio_tagging_loss=0.01026, over 3049337.78 frames. ], batch size: 55, lr: 5.71e-03, grad_scale: 16.0 2023-11-20 04:23:43,878 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.88 vs. 
limit=22.5 2023-11-20 04:23:51,774 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=939120.0, ans=0.125 2023-11-20 04:24:04,333 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.979e+01 8.151e+01 8.812e+01 9.579e+01 1.857e+02, threshold=1.762e+02, percent-clipped=1.0 2023-11-20 04:24:19,232 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=939253.3333333334, ans=15.0 2023-11-20 04:24:24,294 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.56 vs. limit=15.0 2023-11-20 04:24:26,120 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 140900 2023-11-20 04:24:39,284 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 8650, loss[loss=0.1009, simple_loss=0.124, pruned_loss=0.02643, audio_tagging_loss=0.01247, over 14932.00 frames. ], tot_loss[loss=0.08311, simple_loss=0.1034, pruned_loss=0.02112, audio_tagging_loss=0.0103, over 3050559.08 frames. ], batch size: 60, lr: 5.71e-03, grad_scale: 16.0 2023-11-20 04:24:46,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=939386.6666666666, ans=0.0 2023-11-20 04:25:30,762 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 140950 2023-11-20 04:25:43,345 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 8700, loss[loss=0.0973, simple_loss=0.1321, pruned_loss=0.02489, audio_tagging_loss=0.006341, over 15426.00 frames. ], tot_loss[loss=0.08275, simple_loss=0.103, pruned_loss=0.02093, audio_tagging_loss=0.01033, over 3055073.92 frames. ], batch size: 56, lr: 5.71e-03, grad_scale: 16.0 2023-11-20 04:26:04,257 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.13 vs. limit=22.5 2023-11-20 04:26:13,592 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.908e+01 8.317e+01 9.149e+01 9.990e+01 1.361e+02, threshold=1.830e+02, percent-clipped=0.0 2023-11-20 04:26:25,022 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=939920.0, ans=0.1 2023-11-20 04:26:28,292 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=939920.0, ans=0.125 2023-11-20 04:26:35,171 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 141000 2023-11-20 04:26:39,987 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=939986.6666666666, ans=0.0 2023-11-20 04:26:41,314 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=939986.6666666666, ans=0.025 2023-11-20 04:26:48,338 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 8750, loss[loss=0.0928, simple_loss=0.1184, pruned_loss=0.0268, audio_tagging_loss=0.006812, over 14657.00 frames. ], tot_loss[loss=0.08285, simple_loss=0.103, pruned_loss=0.02106, audio_tagging_loss=0.01031, over 3048352.81 frames. 
], batch size: 57, lr: 5.71e-03, grad_scale: 16.0 2023-11-20 04:26:59,136 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=940053.3333333334, ans=0.125 2023-11-20 04:27:33,353 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.18 vs. limit=15.0 2023-11-20 04:27:35,873 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.41 vs. limit=15.0 2023-11-20 04:27:36,484 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=940253.3333333334, ans=0.015 2023-11-20 04:27:36,770 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=940253.3333333334, ans=0.0 2023-11-20 04:27:40,240 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 141050 2023-11-20 04:27:53,776 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 8800, loss[loss=0.08887, simple_loss=0.1162, pruned_loss=0.02028, audio_tagging_loss=0.01047, over 15438.00 frames. ], tot_loss[loss=0.08334, simple_loss=0.1033, pruned_loss=0.02128, audio_tagging_loss=0.0104, over 3054117.81 frames. ], batch size: 56, lr: 5.71e-03, grad_scale: 32.0 2023-11-20 04:27:59,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=940386.6666666666, ans=0.0 2023-11-20 04:28:22,053 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=940520.0, ans=0.2 2023-11-20 04:28:24,227 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.494e+01 8.276e+01 8.994e+01 9.890e+01 1.240e+02, threshold=1.799e+02, percent-clipped=0.0 2023-11-20 04:28:44,004 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=940653.3333333334, ans=0.125 2023-11-20 04:28:44,988 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 141100 2023-11-20 04:28:58,012 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 8850, loss[loss=0.09115, simple_loss=0.1175, pruned_loss=0.02411, audio_tagging_loss=0.008301, over 15180.00 frames. ], tot_loss[loss=0.08268, simple_loss=0.1025, pruned_loss=0.02101, audio_tagging_loss=0.0104, over 3054256.66 frames. ], batch size: 55, lr: 5.71e-03, grad_scale: 16.0 2023-11-20 04:28:59,598 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=940720.0, ans=0.125 2023-11-20 04:29:02,016 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=940720.0, ans=0.0 2023-11-20 04:29:03,244 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=940720.0, ans=0.125 2023-11-20 04:29:10,824 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
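Note on the WARNING record above: the trainer drops AudioSet placeholder cuts whose audio is too short for their dummy transcript. The logged numbers show why: a 100-frame cut keeps only 23 frames after the encoder front-end, and a pruned-transducer alignment needs at least as many encoder frames as tokens, so the 24 BPE tokens cannot fit. A minimal sketch of such a filter follows; the helper names and the exact frame arithmetic are assumptions chosen to reproduce the logged numbers, not the icefall source.

    def frames_after_subsampling(num_frames: int, factor: int = 4) -> int:
        # Assumed front-end arithmetic: a few boundary frames are trimmed
        # before downsampling; (100 - 7) // 4 == 23, matching the log.
        return (num_frames - 7) // factor

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # The pruned transducer loss needs at least one encoder frame per
        # token, so cuts with more tokens than frames are excluded.
        return frames_after_subsampling(num_frames) >= num_tokens

    print(keep_cut(100, 24))  # False, so the cut is excluded, as logged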
2023-11-20 04:29:11,079 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=940786.6666666666, ans=0.125 2023-11-20 04:29:15,789 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=940786.6666666666, ans=10.0 2023-11-20 04:29:20,678 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.070e-02 2023-11-20 04:29:37,730 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=940920.0, ans=0.125 2023-11-20 04:29:37,832 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=940920.0, ans=0.0 2023-11-20 04:29:49,280 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 141150 2023-11-20 04:29:54,372 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=940986.6666666666, ans=0.125 2023-11-20 04:30:01,930 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 8900, loss[loss=0.09085, simple_loss=0.1245, pruned_loss=0.02141, audio_tagging_loss=0.007204, over 16146.00 frames. ], tot_loss[loss=0.08235, simple_loss=0.1024, pruned_loss=0.02091, audio_tagging_loss=0.01024, over 3052556.86 frames. ], batch size: 59, lr: 5.71e-03, grad_scale: 8.0 2023-11-20 04:30:07,380 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.96 vs. limit=22.5 2023-11-20 04:30:21,717 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=941120.0, ans=0.0 2023-11-20 04:30:27,978 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=941186.6666666666, ans=0.1 2023-11-20 04:30:34,471 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.851e+01 8.057e+01 8.835e+01 9.972e+01 1.678e+02, threshold=1.767e+02, percent-clipped=0.0 2023-11-20 04:30:34,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=941186.6666666666, ans=0.1 2023-11-20 04:30:37,547 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=941186.6666666666, ans=22.5 2023-11-20 04:30:51,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=941253.3333333334, ans=0.0 2023-11-20 04:30:53,995 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 141200 2023-11-20 04:31:07,523 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 8950, loss[loss=0.09209, simple_loss=0.1245, pruned_loss=0.02155, audio_tagging_loss=0.008299, over 16011.00 frames. ], tot_loss[loss=0.08201, simple_loss=0.1024, pruned_loss=0.0208, audio_tagging_loss=0.01002, over 3050443.35 frames. ], batch size: 58, lr: 5.71e-03, grad_scale: 8.0 2023-11-20 04:31:25,854 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.97 vs. limit=15.0
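Note on the grad_scale field in the tot_loss records: it steps down from 32.0 at batch 8800 through 16.0 at batch 8850 to 8.0 at batch 8900, and climbs back to 16.0 and 32.0 further down the log. That pattern is the usual signature of dynamic loss scaling in mixed-precision training: the scale is halved whenever a scaled gradient overflows and grown again after a run of clean steps. A minimal sketch with PyTorch's GradScaler follows; the model, data, and scaler hyperparameters are illustrative assumptions, not values taken from this run.

    import torch

    model = torch.nn.Linear(10, 10).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.045)
    # init_scale and growth_interval are illustrative, not this run's settings.
    scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)

    x = torch.randn(4, 10, device="cuda")
    with torch.cuda.amp.autocast():
        loss = model(x).square().mean()

    scaler.scale(loss).backward()  # gradients carry the current loss scale
    scaler.step(optimizer)         # skipped internally if an overflow occurred
    scaler.update()                # halves the scale on overflow, grows it later
    print(scaler.get_scale())      # the value a trainer would log as grad_scale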
2023-11-20 04:31:33,887 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=941520.0, ans=0.1 2023-11-20 04:31:50,653 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=941586.6666666666, ans=0.2 2023-11-20 04:31:57,506 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=941653.3333333334, ans=0.125 2023-11-20 04:31:58,585 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 141250 2023-11-20 04:32:05,991 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=941653.3333333334, ans=0.0 2023-11-20 04:32:10,783 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 9000, loss[loss=0.1033, simple_loss=0.1244, pruned_loss=0.02943, audio_tagging_loss=0.01169, over 15287.00 frames. ], tot_loss[loss=0.08205, simple_loss=0.1025, pruned_loss=0.02086, audio_tagging_loss=0.009963, over 3050264.91 frames. ], batch size: 57, lr: 5.71e-03, grad_scale: 8.0 2023-11-20 04:32:10,787 INFO [train_asr.py:1285] (3/4) Computing validation loss 2023-11-20 04:32:31,919 INFO [zipformer.py:1873] (3/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.4801, 2.5564, 3.7963, 2.3995], device='cuda:3') 2023-11-20 04:32:53,262 INFO [train_asr.py:1294] (3/4) Epoch 12, validation: loss=0.06397, simple_loss=0.05412, pruned_loss=0.005869, audio_tagging_loss=0.03104, over 4681554.00 frames. 2023-11-20 04:32:53,263 INFO [train_asr.py:1295] (3/4) Maximum memory allocated so far is 25886MB 2023-11-20 04:33:13,440 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=941786.6666666666, ans=0.09899494936611666 2023-11-20 04:33:25,860 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.562e+01 8.268e+01 8.688e+01 9.407e+01 1.162e+02, threshold=1.738e+02, percent-clipped=0.0 2023-11-20 04:33:38,840 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=941920.0, ans=0.0 2023-11-20 04:33:45,379 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 141300 2023-11-20 04:33:48,502 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=941986.6666666666, ans=0.2 2023-11-20 04:33:58,684 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 9050, loss[loss=0.09211, simple_loss=0.1248, pruned_loss=0.02312, audio_tagging_loss=0.006572, over 14663.00 frames. ], tot_loss[loss=0.08188, simple_loss=0.1023, pruned_loss=0.02081, audio_tagging_loss=0.009897, over 3053626.44 frames. ], batch size: 52, lr: 5.70e-03, grad_scale: 8.0
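Note on the validation block above: at batch 9000 the training loop pauses, computes the loss over the full validation set (loss=0.06397 over 4681554 frames), and reports the peak GPU allocation seen so far. The memory line is presumably read from PyTorch's allocator statistics; a small sketch follows, with the wiring around the real API assumed rather than taken from train_asr.py.

    import torch

    def log_peak_memory(device: torch.device) -> None:
        # torch.cuda.max_memory_allocated returns the peak allocation in
        # bytes for the device; the MB wording mirrors this log's format.
        mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
        print(f"Maximum memory allocated so far is {mb}MB")

    log_peak_memory(torch.device("cuda:0"))  # this run logged 25886MB on cuda:3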
2023-11-20 04:33:59,055 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=942053.3333333334, ans=0.125 2023-11-20 04:34:01,432 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=942053.3333333334, ans=0.125 2023-11-20 04:34:25,575 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=942186.6666666666, ans=0.0 2023-11-20 04:34:25,636 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=942186.6666666666, ans=0.025 2023-11-20 04:34:28,104 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=942186.6666666666, ans=0.1 2023-11-20 04:34:29,408 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=942186.6666666666, ans=0.125 2023-11-20 04:34:33,501 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=942186.6666666666, ans=0.125 2023-11-20 04:34:33,642 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=942186.6666666666, ans=0.0 2023-11-20 04:34:50,685 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 141350 2023-11-20 04:35:03,404 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 9100, loss[loss=0.08718, simple_loss=0.1101, pruned_loss=0.0229, audio_tagging_loss=0.009238, over 15022.00 frames. ], tot_loss[loss=0.0813, simple_loss=0.1014, pruned_loss=0.02066, audio_tagging_loss=0.009937, over 3049044.57 frames. ], batch size: 56, lr: 5.70e-03, grad_scale: 8.0 2023-11-20 04:35:06,568 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.98 vs. limit=15.0 2023-11-20 04:35:07,217 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 04:35:09,721 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=942386.6666666666, ans=0.125 2023-11-20 04:35:22,355 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.21 vs.
limit=15.0 2023-11-20 04:35:28,404 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=942520.0, ans=0.125 2023-11-20 04:35:36,016 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.481e+01 8.086e+01 8.915e+01 9.526e+01 1.275e+02, threshold=1.783e+02, percent-clipped=0.0 2023-11-20 04:35:41,332 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=942586.6666666666, ans=0.125 2023-11-20 04:35:49,495 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 04:35:55,813 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 141400 2023-11-20 04:35:55,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=942653.3333333334, ans=0.07 2023-11-20 04:36:01,322 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=942653.3333333334, ans=0.125 2023-11-20 04:36:08,570 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 9150, loss[loss=0.09162, simple_loss=0.1201, pruned_loss=0.02602, audio_tagging_loss=0.005546, over 15679.00 frames. ], tot_loss[loss=0.08194, simple_loss=0.1021, pruned_loss=0.02098, audio_tagging_loss=0.009919, over 3048645.64 frames. ], batch size: 56, lr: 5.70e-03, grad_scale: 8.0 2023-11-20 04:36:28,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=942786.6666666666, ans=0.1 2023-11-20 04:36:32,230 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.38 vs. limit=6.0 2023-11-20 04:36:35,845 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.32 vs. limit=15.0 2023-11-20 04:36:57,139 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=942920.0, ans=0.125 2023-11-20 04:37:00,701 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 141450 2023-11-20 04:37:07,790 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=942986.6666666666, ans=0.125 2023-11-20 04:37:10,031 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=942986.6666666666, ans=0.1 2023-11-20 04:37:14,204 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 9200, loss[loss=0.07788, simple_loss=0.09518, pruned_loss=0.0202, audio_tagging_loss=0.01009, over 14753.00 frames. ], tot_loss[loss=0.08158, simple_loss=0.1018, pruned_loss=0.02081, audio_tagging_loss=0.009898, over 3055632.81 frames. ], batch size: 55, lr: 5.70e-03, grad_scale: 16.0 2023-11-20 04:37:18,208 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=943053.3333333334, ans=0.0 2023-11-20 04:37:45,157 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.29 vs. 
limit=15.0 2023-11-20 04:37:45,716 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.820e+01 8.352e+01 9.147e+01 9.950e+01 1.226e+02, threshold=1.829e+02, percent-clipped=0.0 2023-11-20 04:37:47,448 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=943186.6666666666, ans=0.125 2023-11-20 04:37:48,323 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=943186.6666666666, ans=0.015 2023-11-20 04:37:50,320 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.64 vs. limit=6.0 2023-11-20 04:37:52,278 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=943253.3333333334, ans=0.125 2023-11-20 04:38:06,676 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 141500 2023-11-20 04:38:11,838 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=943320.0, ans=0.1 2023-11-20 04:38:19,626 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 9250, loss[loss=0.1104, simple_loss=0.1427, pruned_loss=0.0323, audio_tagging_loss=0.006746, over 15480.00 frames. ], tot_loss[loss=0.08154, simple_loss=0.1018, pruned_loss=0.02073, audio_tagging_loss=0.009913, over 3056988.43 frames. ], batch size: 55, lr: 5.70e-03, grad_scale: 16.0 2023-11-20 04:38:25,163 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.97 vs. limit=22.5 2023-11-20 04:38:26,015 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=943386.6666666666, ans=0.2 2023-11-20 04:38:34,531 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=943453.3333333334, ans=0.0 2023-11-20 04:38:42,394 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.39 vs. limit=15.0 2023-11-20 04:38:45,535 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.16 vs. limit=15.0 2023-11-20 04:39:00,995 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=943586.6666666666, ans=0.125 2023-11-20 04:39:11,441 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 141550 2023-11-20 04:39:11,954 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.51 vs. limit=15.0 2023-11-20 04:39:23,871 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 9300, loss[loss=0.07476, simple_loss=0.08877, pruned_loss=0.019, audio_tagging_loss=0.01138, over 15453.00 frames. ], tot_loss[loss=0.08092, simple_loss=0.1009, pruned_loss=0.02045, audio_tagging_loss=0.009997, over 3050946.84 frames. ], batch size: 58, lr: 5.70e-03, grad_scale: 16.0 2023-11-20 04:39:25,616 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.40 vs. 
limit=22.5 2023-11-20 04:39:30,110 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=943720.0, ans=0.025 2023-11-20 04:39:32,720 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=943720.0, ans=0.0 2023-11-20 04:39:32,987 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.82 vs. limit=15.0 2023-11-20 04:39:57,028 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.234e+01 8.210e+01 8.770e+01 9.348e+01 1.167e+02, threshold=1.754e+02, percent-clipped=0.0 2023-11-20 04:39:58,659 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 04:40:00,307 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.01 vs. limit=12.0 2023-11-20 04:40:15,589 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 141600 2023-11-20 04:40:24,052 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=943986.6666666666, ans=0.025 2023-11-20 04:40:28,813 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 9350, loss[loss=0.06514, simple_loss=0.07229, pruned_loss=0.01699, audio_tagging_loss=0.01201, over 16010.00 frames. ], tot_loss[loss=0.08108, simple_loss=0.1009, pruned_loss=0.02057, audio_tagging_loss=0.01006, over 3052299.02 frames. ], batch size: 62, lr: 5.70e-03, grad_scale: 16.0 2023-11-20 04:40:38,722 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=944053.3333333334, ans=0.125 2023-11-20 04:40:54,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=944186.6666666666, ans=0.125 2023-11-20 04:41:01,368 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=944186.6666666666, ans=0.125 2023-11-20 04:41:14,336 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=944253.3333333334, ans=0.0 2023-11-20 04:41:21,515 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 141650 2023-11-20 04:41:30,452 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=944320.0, ans=0.0 2023-11-20 04:41:33,795 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 9400, loss[loss=0.08386, simple_loss=0.1031, pruned_loss=0.02276, audio_tagging_loss=0.009581, over 14484.00 frames. ], tot_loss[loss=0.08115, simple_loss=0.1009, pruned_loss=0.02053, audio_tagging_loss=0.01016, over 3047039.82 frames. 
], batch size: 52, lr: 5.70e-03, grad_scale: 16.0 2023-11-20 04:41:58,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=944520.0, ans=0.2 2023-11-20 04:42:04,916 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=944520.0, ans=0.125 2023-11-20 04:42:05,747 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.961e+01 8.348e+01 8.869e+01 9.935e+01 1.327e+02, threshold=1.774e+02, percent-clipped=0.0 2023-11-20 04:42:06,517 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.55 vs. limit=10.0 2023-11-20 04:42:26,294 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 141700 2023-11-20 04:42:37,249 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 04:42:38,471 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 9450, loss[loss=0.09292, simple_loss=0.1236, pruned_loss=0.02549, audio_tagging_loss=0.005637, over 14457.00 frames. ], tot_loss[loss=0.08148, simple_loss=0.101, pruned_loss=0.02068, audio_tagging_loss=0.01029, over 3046200.53 frames. ], batch size: 53, lr: 5.70e-03, grad_scale: 16.0 2023-11-20 04:43:03,065 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=944786.6666666666, ans=0.0 2023-11-20 04:43:19,284 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=944920.0, ans=0.0 2023-11-20 04:43:30,207 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 141750 2023-11-20 04:43:42,770 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 9500, loss[loss=0.08864, simple_loss=0.106, pruned_loss=0.02489, audio_tagging_loss=0.01074, over 14375.00 frames. ], tot_loss[loss=0.08126, simple_loss=0.1008, pruned_loss=0.02052, audio_tagging_loss=0.01033, over 3043787.25 frames. 
], batch size: 52, lr: 5.70e-03, grad_scale: 16.0 2023-11-20 04:43:45,559 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=945053.3333333334, ans=0.1 2023-11-20 04:43:48,527 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=945053.3333333334, ans=0.125 2023-11-20 04:43:56,545 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=945120.0, ans=0.0 2023-11-20 04:44:15,579 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.845e+01 8.327e+01 9.041e+01 9.892e+01 1.668e+02, threshold=1.808e+02, percent-clipped=0.0 2023-11-20 04:44:23,251 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=945253.3333333334, ans=0.0 2023-11-20 04:44:26,069 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=945253.3333333334, ans=0.0 2023-11-20 04:44:28,304 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=945253.3333333334, ans=0.0 2023-11-20 04:44:34,995 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 141800 2023-11-20 04:44:38,610 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=945320.0, ans=0.125 2023-11-20 04:44:39,803 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=945320.0, ans=0.2 2023-11-20 04:44:48,726 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 9550, loss[loss=0.08883, simple_loss=0.1174, pruned_loss=0.01793, audio_tagging_loss=0.0122, over 16687.00 frames. ], tot_loss[loss=0.08103, simple_loss=0.1005, pruned_loss=0.0204, audio_tagging_loss=0.01036, over 3044023.22 frames. ], batch size: 61, lr: 5.69e-03, grad_scale: 16.0 2023-11-20 04:45:09,703 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=945453.3333333334, ans=0.125 2023-11-20 04:45:40,910 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 141850 2023-11-20 04:45:53,420 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 04:45:54,340 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 9600, loss[loss=0.08349, simple_loss=0.109, pruned_loss=0.02211, audio_tagging_loss=0.006873, over 16355.00 frames. ], tot_loss[loss=0.08132, simple_loss=0.1007, pruned_loss=0.02059, audio_tagging_loss=0.01039, over 3041509.93 frames. 
], batch size: 59, lr: 5.69e-03, grad_scale: 32.0 2023-11-20 04:46:19,446 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=945853.3333333334, ans=0.0 2023-11-20 04:46:20,717 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=945853.3333333334, ans=0.125 2023-11-20 04:46:26,500 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.916e+01 8.231e+01 8.901e+01 9.790e+01 1.400e+02, threshold=1.780e+02, percent-clipped=0.0 2023-11-20 04:46:42,783 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=945920.0, ans=0.125 2023-11-20 04:46:46,224 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 141900 2023-11-20 04:46:47,841 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=945986.6666666666, ans=0.0 2023-11-20 04:46:57,780 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.55 vs. limit=22.5 2023-11-20 04:46:58,316 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 9650, loss[loss=0.06579, simple_loss=0.07404, pruned_loss=0.01703, audio_tagging_loss=0.01175, over 14327.00 frames. ], tot_loss[loss=0.08191, simple_loss=0.1016, pruned_loss=0.02085, audio_tagging_loss=0.01029, over 3042987.78 frames. ], batch size: 55, lr: 5.69e-03, grad_scale: 16.0 2023-11-20 04:47:13,386 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=946120.0, ans=0.125 2023-11-20 04:47:43,809 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=946253.3333333334, ans=0.125 2023-11-20 04:47:50,348 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 141950 2023-11-20 04:48:00,453 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.04 vs. limit=8.0 2023-11-20 04:48:03,341 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 9700, loss[loss=0.06305, simple_loss=0.06644, pruned_loss=0.01827, audio_tagging_loss=0.01156, over 15448.00 frames. ], tot_loss[loss=0.08197, simple_loss=0.1018, pruned_loss=0.02087, audio_tagging_loss=0.01019, over 3042609.84 frames. ], batch size: 60, lr: 5.69e-03, grad_scale: 16.0 2023-11-20 04:48:10,021 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=946386.6666666666, ans=0.125 2023-11-20 04:48:19,931 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=946453.3333333334, ans=0.2 2023-11-20 04:48:29,872 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=946520.0, ans=0.125 2023-11-20 04:48:36,133 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=946520.0, ans=0.0 2023-11-20 04:48:36,944 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.885e+01 8.155e+01 8.941e+01 9.505e+01 1.207e+02, threshold=1.788e+02, percent-clipped=0.0 2023-11-20 04:48:40,965 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.34 vs. 
limit=12.0 2023-11-20 04:48:43,019 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=946586.6666666666, ans=0.125 2023-11-20 04:48:44,673 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=946586.6666666666, ans=0.125 2023-11-20 04:48:55,406 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 142000 2023-11-20 04:49:02,518 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=946653.3333333334, ans=0.0 2023-11-20 04:49:08,460 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 9750, loss[loss=0.08138, simple_loss=0.09429, pruned_loss=0.02227, audio_tagging_loss=0.01196, over 14620.00 frames. ], tot_loss[loss=0.0821, simple_loss=0.1022, pruned_loss=0.02093, audio_tagging_loss=0.01004, over 3041969.52 frames. ], batch size: 56, lr: 5.69e-03, grad_scale: 16.0 2023-11-20 04:49:29,269 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 04:49:46,001 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=946920.0, ans=0.125 2023-11-20 04:49:49,061 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=946920.0, ans=0.0 2023-11-20 04:49:51,985 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=946920.0, ans=0.0 2023-11-20 04:50:00,575 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 142050 2023-11-20 04:50:12,912 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 9800, loss[loss=0.07573, simple_loss=0.09293, pruned_loss=0.01874, audio_tagging_loss=0.01053, over 14616.00 frames. ], tot_loss[loss=0.08196, simple_loss=0.1023, pruned_loss=0.02084, audio_tagging_loss=0.009993, over 3040393.55 frames. ], batch size: 57, lr: 5.69e-03, grad_scale: 16.0 2023-11-20 04:50:28,079 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=947120.0, ans=0.125 2023-11-20 04:50:28,127 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=947120.0, ans=0.125 2023-11-20 04:50:35,417 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=947120.0, ans=10.0 2023-11-20 04:50:47,333 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.379e+01 8.314e+01 8.707e+01 9.702e+01 1.155e+02, threshold=1.741e+02, percent-clipped=0.0 2023-11-20 04:50:47,742 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=947186.6666666666, ans=0.2 2023-11-20 04:50:59,826 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=947253.3333333334, ans=0.0 2023-11-20 04:51:01,143 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=947253.3333333334, ans=0.1 2023-11-20 04:51:04,548 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 142100 2023-11-20 04:51:11,768 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. 
Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 04:51:17,965 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 9850, loss[loss=0.1052, simple_loss=0.1376, pruned_loss=0.02934, audio_tagging_loss=0.006999, over 15890.00 frames. ], tot_loss[loss=0.08234, simple_loss=0.1028, pruned_loss=0.02107, audio_tagging_loss=0.009878, over 3048589.42 frames. ], batch size: 57, lr: 5.69e-03, grad_scale: 16.0 2023-11-20 04:51:29,954 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.98 vs. limit=12.0 2023-11-20 04:51:34,004 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=947453.3333333334, ans=0.1 2023-11-20 04:51:40,195 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=947453.3333333334, ans=0.95 2023-11-20 04:51:40,421 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=947453.3333333334, ans=0.125 2023-11-20 04:51:45,513 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.63 vs. limit=10.0 2023-11-20 04:51:46,490 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=947520.0, ans=0.125 2023-11-20 04:51:54,031 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=947520.0, ans=0.1 2023-11-20 04:52:08,558 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=947653.3333333334, ans=0.0 2023-11-20 04:52:09,660 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 142150 2023-11-20 04:52:13,646 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=947653.3333333334, ans=0.0 2023-11-20 04:52:19,055 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=947653.3333333334, ans=0.125 2023-11-20 04:52:22,403 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 9900, loss[loss=0.08812, simple_loss=0.1083, pruned_loss=0.024, audio_tagging_loss=0.009951, over 15552.00 frames. ], tot_loss[loss=0.08254, simple_loss=0.1033, pruned_loss=0.02108, audio_tagging_loss=0.009826, over 3052963.49 frames. ], batch size: 57, lr: 5.69e-03, grad_scale: 16.0 2023-11-20 04:52:27,306 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.30 vs. limit=15.0 2023-11-20 04:52:35,902 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.92 vs. 
limit=15.0 2023-11-20 04:52:36,717 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=947786.6666666666, ans=0.0 2023-11-20 04:52:56,049 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.295e+01 8.098e+01 8.892e+01 9.710e+01 1.368e+02, threshold=1.778e+02, percent-clipped=0.0 2023-11-20 04:53:11,575 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=947920.0, ans=0.2 2023-11-20 04:53:14,553 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 142200 2023-11-20 04:53:18,118 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.10 vs. limit=15.0 2023-11-20 04:53:27,279 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 9950, loss[loss=0.06183, simple_loss=0.07484, pruned_loss=0.01126, audio_tagging_loss=0.01315, over 14083.00 frames. ], tot_loss[loss=0.08289, simple_loss=0.1036, pruned_loss=0.02122, audio_tagging_loss=0.009857, over 3056989.73 frames. ], batch size: 55, lr: 5.69e-03, grad_scale: 16.0 2023-11-20 04:53:47,694 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.15 vs. limit=15.0 2023-11-20 04:53:48,309 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 04:54:02,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=948186.6666666666, ans=0.125 2023-11-20 04:54:09,228 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=948253.3333333334, ans=0.125 2023-11-20 04:54:11,771 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=948253.3333333334, ans=0.125 2023-11-20 04:54:18,898 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 142250 2023-11-20 04:54:31,969 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.39 vs. limit=15.0 2023-11-20 04:54:32,624 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 10000, loss[loss=0.1187, simple_loss=0.1556, pruned_loss=0.0335, audio_tagging_loss=0.00738, over 15296.00 frames. ], tot_loss[loss=0.08218, simple_loss=0.1027, pruned_loss=0.02095, audio_tagging_loss=0.00986, over 3059497.77 frames. 
], batch size: 54, lr: 5.69e-03, grad_scale: 32.0 2023-11-20 04:54:39,014 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=948386.6666666666, ans=0.125 2023-11-20 04:55:03,784 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=948520.0, ans=0.2 2023-11-20 04:55:05,757 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.530e+01 8.242e+01 9.183e+01 1.026e+02 1.433e+02, threshold=1.837e+02, percent-clipped=0.0 2023-11-20 04:55:17,811 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=948586.6666666666, ans=0.125 2023-11-20 04:55:24,373 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 142300 2023-11-20 04:55:37,135 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 10050, loss[loss=0.09705, simple_loss=0.1236, pruned_loss=0.02428, audio_tagging_loss=0.01099, over 14800.00 frames. ], tot_loss[loss=0.08178, simple_loss=0.1026, pruned_loss=0.02065, audio_tagging_loss=0.009851, over 3057982.28 frames. ], batch size: 54, lr: 5.68e-03, grad_scale: 32.0 2023-11-20 04:55:47,604 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=948720.0, ans=0.125 2023-11-20 04:55:47,621 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=948720.0, ans=0.0 2023-11-20 04:55:47,753 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=948720.0, ans=0.125 2023-11-20 04:55:52,867 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.37 vs. limit=15.0 2023-11-20 04:56:07,110 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=948853.3333333334, ans=0.0 2023-11-20 04:56:28,241 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 142350 2023-11-20 04:56:32,210 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=948986.6666666666, ans=0.1 2023-11-20 04:56:32,394 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.82 vs. limit=15.0 2023-11-20 04:56:37,027 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=948986.6666666666, ans=0.125 2023-11-20 04:56:40,022 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=949053.3333333334, ans=0.125 2023-11-20 04:56:41,016 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 10100, loss[loss=0.08507, simple_loss=0.1103, pruned_loss=0.02046, audio_tagging_loss=0.009455, over 15821.00 frames. ], tot_loss[loss=0.08195, simple_loss=0.1028, pruned_loss=0.02066, audio_tagging_loss=0.009905, over 3056494.47 frames. 
], batch size: 59, lr: 5.68e-03, grad_scale: 16.0 2023-11-20 04:56:57,382 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=949120.0, ans=0.0 2023-11-20 04:57:16,170 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.858e+01 8.344e+01 8.796e+01 9.668e+01 1.145e+02, threshold=1.759e+02, percent-clipped=0.0 2023-11-20 04:57:25,133 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=949253.3333333334, ans=0.125 2023-11-20 04:57:29,497 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=949253.3333333334, ans=0.2 2023-11-20 04:57:32,845 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 04:57:32,880 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 142400 2023-11-20 04:57:42,559 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=949320.0, ans=0.0 2023-11-20 04:57:46,699 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 10150, loss[loss=0.09088, simple_loss=0.1094, pruned_loss=0.02455, audio_tagging_loss=0.01162, over 15580.00 frames. ], tot_loss[loss=0.08214, simple_loss=0.1026, pruned_loss=0.0207, audio_tagging_loss=0.01012, over 3056055.73 frames. ], batch size: 57, lr: 5.68e-03, grad_scale: 16.0 2023-11-20 04:58:00,840 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 04:58:07,201 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=949453.3333333334, ans=0.0 2023-11-20 04:58:16,882 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-20 04:58:28,801 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=949586.6666666666, ans=0.0 2023-11-20 04:58:38,515 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 142450 2023-11-20 04:58:39,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=949653.3333333334, ans=0.1 2023-11-20 04:58:41,074 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=949653.3333333334, ans=0.125 2023-11-20 04:58:50,137 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=949720.0, ans=0.1 2023-11-20 04:58:51,110 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 10200, loss[loss=0.08804, simple_loss=0.1166, pruned_loss=0.02196, audio_tagging_loss=0.007788, over 15671.00 frames. ], tot_loss[loss=0.08145, simple_loss=0.1017, pruned_loss=0.02043, audio_tagging_loss=0.01019, over 3064410.53 frames. ], batch size: 57, lr: 5.68e-03, grad_scale: 16.0 2023-11-20 04:59:10,870 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=949786.6666666666, ans=0.2 2023-11-20 04:59:13,370 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=949786.6666666666, ans=0.09899494936611666 2023-11-20 04:59:15,459 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 04:59:26,223 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.455e+01 8.468e+01 8.830e+01 9.466e+01 1.234e+02, threshold=1.766e+02, percent-clipped=0.0 2023-11-20 04:59:35,022 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=949920.0, ans=0.125 2023-11-20 04:59:40,727 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=949920.0, ans=0.07 2023-11-20 04:59:42,868 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 142500 2023-11-20 04:59:45,555 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=949986.6666666666, ans=0.2 2023-11-20 04:59:54,937 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 10250, loss[loss=0.09079, simple_loss=0.1161, pruned_loss=0.02216, audio_tagging_loss=0.01057, over 15297.00 frames. ], tot_loss[loss=0.0814, simple_loss=0.1015, pruned_loss=0.02041, audio_tagging_loss=0.01022, over 3063500.71 frames. ], batch size: 56, lr: 5.68e-03, grad_scale: 16.0 2023-11-20 05:00:03,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=950053.3333333334, ans=0.2 2023-11-20 05:00:04,960 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.78 vs. 
limit=10.0 2023-11-20 05:00:17,712 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=950120.0, ans=0.2 2023-11-20 05:00:35,642 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=950253.3333333334, ans=0.0 2023-11-20 05:00:46,644 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 142550 2023-11-20 05:01:00,105 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 10300, loss[loss=0.1065, simple_loss=0.1346, pruned_loss=0.03267, audio_tagging_loss=0.006506, over 14511.00 frames. ], tot_loss[loss=0.08141, simple_loss=0.1015, pruned_loss=0.02041, audio_tagging_loss=0.01024, over 3055837.48 frames. ], batch size: 54, lr: 5.68e-03, grad_scale: 16.0 2023-11-20 05:01:21,185 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=950453.3333333334, ans=0.0 2023-11-20 05:01:33,568 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=950520.0, ans=0.125 2023-11-20 05:01:34,328 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.811e+01 8.101e+01 8.687e+01 9.429e+01 1.201e+02, threshold=1.737e+02, percent-clipped=0.0 2023-11-20 05:01:39,513 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=950586.6666666666, ans=0.125 2023-11-20 05:01:42,768 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=950586.6666666666, ans=0.1 2023-11-20 05:01:49,702 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=950586.6666666666, ans=0.1 2023-11-20 05:01:52,448 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 142600 2023-11-20 05:02:04,852 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 10350, loss[loss=0.07637, simple_loss=0.0953, pruned_loss=0.01838, audio_tagging_loss=0.01034, over 14841.00 frames. ], tot_loss[loss=0.08163, simple_loss=0.1018, pruned_loss=0.02044, audio_tagging_loss=0.01028, over 3046999.01 frames. ], batch size: 56, lr: 5.68e-03, grad_scale: 16.0 2023-11-20 05:02:09,616 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.53 vs. limit=22.5 2023-11-20 05:02:11,078 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.48 vs. limit=15.0 2023-11-20 05:02:14,467 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=950720.0, ans=0.125 2023-11-20 05:02:36,533 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=950853.3333333334, ans=0.2 2023-11-20 05:02:41,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=950853.3333333334, ans=0.0 2023-11-20 05:02:47,243 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=950920.0, ans=0.0 2023-11-20 05:02:47,792 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.00 vs. 
limit=15.0 2023-11-20 05:02:57,437 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 142650 2023-11-20 05:03:09,693 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 10400, loss[loss=0.07282, simple_loss=0.1002, pruned_loss=0.01516, audio_tagging_loss=0.00758, over 14771.00 frames. ], tot_loss[loss=0.08225, simple_loss=0.1026, pruned_loss=0.02065, audio_tagging_loss=0.01029, over 3044351.54 frames. ], batch size: 56, lr: 5.68e-03, grad_scale: 32.0 2023-11-20 05:03:42,660 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=951186.6666666666, ans=0.0 2023-11-20 05:03:45,417 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.768e+01 8.746e+01 9.185e+01 9.994e+01 1.378e+02, threshold=1.837e+02, percent-clipped=0.0 2023-11-20 05:03:55,610 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=951253.3333333334, ans=0.125 2023-11-20 05:04:01,617 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 142700 2023-11-20 05:04:02,339 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.17 vs. limit=15.0 2023-11-20 05:04:14,418 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 10450, loss[loss=0.06688, simple_loss=0.07667, pruned_loss=0.0182, audio_tagging_loss=0.01035, over 15241.00 frames. ], tot_loss[loss=0.08171, simple_loss=0.1019, pruned_loss=0.0205, audio_tagging_loss=0.01023, over 3047880.04 frames. ], batch size: 60, lr: 5.68e-03, grad_scale: 32.0 2023-11-20 05:04:25,168 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=951386.6666666666, ans=0.1 2023-11-20 05:05:06,119 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 142750 2023-11-20 05:05:08,114 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=951653.3333333334, ans=0.1 2023-11-20 05:05:18,692 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 10500, loss[loss=0.08285, simple_loss=0.1095, pruned_loss=0.01925, audio_tagging_loss=0.008836, over 15805.00 frames. ], tot_loss[loss=0.08123, simple_loss=0.1013, pruned_loss=0.02037, audio_tagging_loss=0.01021, over 3044868.44 frames. ], batch size: 59, lr: 5.68e-03, grad_scale: 32.0 2023-11-20 05:05:25,782 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=951720.0, ans=0.2 2023-11-20 05:05:32,986 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=951786.6666666666, ans=0.2 2023-11-20 05:05:39,211 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=951786.6666666666, ans=0.125 2023-11-20 05:05:52,960 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.818e+01 8.079e+01 8.976e+01 9.766e+01 1.332e+02, threshold=1.795e+02, percent-clipped=0.0 2023-11-20 05:06:01,652 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.23 vs. 
limit=15.0 2023-11-20 05:06:10,124 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 142800 2023-11-20 05:06:22,647 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 10550, loss[loss=0.06452, simple_loss=0.08112, pruned_loss=0.01248, audio_tagging_loss=0.01148, over 16191.00 frames. ], tot_loss[loss=0.08093, simple_loss=0.1009, pruned_loss=0.02035, audio_tagging_loss=0.01013, over 3042732.09 frames. ], batch size: 61, lr: 5.67e-03, grad_scale: 32.0 2023-11-20 05:06:23,286 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.96 vs. limit=22.5 2023-11-20 05:06:25,347 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=952053.3333333334, ans=0.0 2023-11-20 05:06:26,415 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=952053.3333333334, ans=0.0 2023-11-20 05:06:33,721 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=952120.0, ans=0.125 2023-11-20 05:06:33,935 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=952120.0, ans=0.0 2023-11-20 05:06:33,943 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=952120.0, ans=0.125 2023-11-20 05:06:56,450 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=952186.6666666666, ans=0.0 2023-11-20 05:07:14,184 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 142850 2023-11-20 05:07:24,175 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=952320.0, ans=0.125 2023-11-20 05:07:26,291 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 10600, loss[loss=0.108, simple_loss=0.1375, pruned_loss=0.03117, audio_tagging_loss=0.008015, over 16161.00 frames. ], tot_loss[loss=0.08177, simple_loss=0.102, pruned_loss=0.02071, audio_tagging_loss=0.01008, over 3046300.02 frames. ], batch size: 56, lr: 5.67e-03, grad_scale: 32.0 2023-11-20 05:07:33,893 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=952386.6666666666, ans=0.1 2023-11-20 05:07:45,565 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=952453.3333333334, ans=0.0 2023-11-20 05:07:50,962 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=952453.3333333334, ans=0.125 2023-11-20 05:08:01,781 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.830e+01 8.408e+01 9.197e+01 1.017e+02 1.438e+02, threshold=1.839e+02, percent-clipped=0.0 2023-11-20 05:08:18,720 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 142900 2023-11-20 05:08:24,273 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=952653.3333333334, ans=0.125 2023-11-20 05:08:31,756 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 10650, loss[loss=0.07836, simple_loss=0.09581, pruned_loss=0.02129, audio_tagging_loss=0.009163, over 16260.00 frames. ], tot_loss[loss=0.08133, simple_loss=0.1018, pruned_loss=0.02036, audio_tagging_loss=0.01007, over 3051604.27 frames. 
], batch size: 61, lr: 5.67e-03, grad_scale: 32.0 2023-11-20 05:08:32,555 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.12 vs. limit=10.0 2023-11-20 05:08:34,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=952720.0, ans=0.0 2023-11-20 05:08:36,452 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=952720.0, ans=0.125 2023-11-20 05:09:03,554 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=952853.3333333334, ans=0.035 2023-11-20 05:09:11,716 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=952920.0, ans=0.125 2023-11-20 05:09:23,789 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 142950 2023-11-20 05:09:36,520 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 10700, loss[loss=0.1059, simple_loss=0.1331, pruned_loss=0.0297, audio_tagging_loss=0.009714, over 15206.00 frames. ], tot_loss[loss=0.08232, simple_loss=0.103, pruned_loss=0.02071, audio_tagging_loss=0.01011, over 3051198.31 frames. ], batch size: 56, lr: 5.67e-03, grad_scale: 32.0 2023-11-20 05:09:46,764 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=953053.3333333334, ans=0.0 2023-11-20 05:10:11,510 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.004e+01 8.248e+01 8.993e+01 9.641e+01 1.206e+02, threshold=1.799e+02, percent-clipped=0.0 2023-11-20 05:10:22,370 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=953253.3333333334, ans=0.09899494936611666 2023-11-20 05:10:22,742 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.21 vs. limit=15.0 2023-11-20 05:10:24,724 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=953253.3333333334, ans=0.1 2023-11-20 05:10:24,753 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=953253.3333333334, ans=0.125 2023-11-20 05:10:28,374 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 143000 2023-11-20 05:10:39,193 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.18 vs. limit=15.0 2023-11-20 05:10:40,856 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 10750, loss[loss=0.06658, simple_loss=0.07327, pruned_loss=0.01845, audio_tagging_loss=0.01149, over 16403.00 frames. ], tot_loss[loss=0.08227, simple_loss=0.1027, pruned_loss=0.0208, audio_tagging_loss=0.01012, over 3051555.90 frames. ], batch size: 65, lr: 5.67e-03, grad_scale: 32.0 2023-11-20 05:10:42,501 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=953386.6666666666, ans=0.04949747468305833 2023-11-20 05:10:52,417 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.73 vs. 
limit=22.5 2023-11-20 05:11:00,276 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=953453.3333333334, ans=0.125 2023-11-20 05:11:08,266 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=953520.0, ans=0.125 2023-11-20 05:11:08,734 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.59 vs. limit=6.0 2023-11-20 05:11:29,715 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.98 vs. limit=10.0 2023-11-20 05:11:32,691 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 143050 2023-11-20 05:11:42,562 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=953653.3333333334, ans=0.0 2023-11-20 05:11:45,909 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 10800, loss[loss=0.09037, simple_loss=0.1194, pruned_loss=0.02258, audio_tagging_loss=0.00808, over 14897.00 frames. ], tot_loss[loss=0.0819, simple_loss=0.1026, pruned_loss=0.02063, audio_tagging_loss=0.009994, over 3054257.65 frames. ], batch size: 55, lr: 5.67e-03, grad_scale: 32.0 2023-11-20 05:11:47,429 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=953720.0, ans=0.07 2023-11-20 05:11:59,882 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=953786.6666666666, ans=0.125 2023-11-20 05:12:09,585 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=953786.6666666666, ans=0.0 2023-11-20 05:12:10,247 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.32 vs. limit=6.0 2023-11-20 05:12:14,426 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=953853.3333333334, ans=0.5 2023-11-20 05:12:14,509 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=953853.3333333334, ans=0.125 2023-11-20 05:12:19,868 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.79 vs. limit=10.0 2023-11-20 05:12:20,205 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.594e+01 7.886e+01 8.544e+01 9.371e+01 1.667e+02, threshold=1.709e+02, percent-clipped=0.0 2023-11-20 05:12:28,897 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.20 vs. limit=15.0 2023-11-20 05:12:37,704 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 143100 2023-11-20 05:12:41,356 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.84 vs. limit=15.0 2023-11-20 05:12:41,484 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.10 vs. 
limit=12.0 2023-11-20 05:12:43,413 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=953986.6666666666, ans=0.0 2023-11-20 05:12:50,255 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 10850, loss[loss=0.08769, simple_loss=0.1103, pruned_loss=0.01972, audio_tagging_loss=0.01284, over 15032.00 frames. ], tot_loss[loss=0.08231, simple_loss=0.1028, pruned_loss=0.02087, audio_tagging_loss=0.01002, over 3057334.09 frames. ], batch size: 57, lr: 5.67e-03, grad_scale: 16.0 2023-11-20 05:12:59,127 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=954053.3333333334, ans=0.2 2023-11-20 05:13:02,820 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=954120.0, ans=0.0 2023-11-20 05:13:20,996 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=954186.6666666666, ans=0.1 2023-11-20 05:13:38,500 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.48 vs. limit=15.0 2023-11-20 05:13:40,611 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=954320.0, ans=0.2 2023-11-20 05:13:41,485 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 143150 2023-11-20 05:13:50,147 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 05:13:53,686 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 10900, loss[loss=0.1117, simple_loss=0.1381, pruned_loss=0.03571, audio_tagging_loss=0.006971, over 15114.00 frames. ], tot_loss[loss=0.08197, simple_loss=0.1025, pruned_loss=0.02068, audio_tagging_loss=0.01006, over 3056961.72 frames. ], batch size: 56, lr: 5.67e-03, grad_scale: 16.0 2023-11-20 05:13:55,482 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.92 vs. 
limit=22.5 2023-11-20 05:14:03,215 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=954386.6666666666, ans=0.125 2023-11-20 05:14:07,467 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=954453.3333333334, ans=0.125 2023-11-20 05:14:11,382 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=954453.3333333334, ans=0.2 2023-11-20 05:14:16,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=954453.3333333334, ans=0.125 2023-11-20 05:14:16,161 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff3.min_abs, batch_count=954453.3333333334, ans=0.2 2023-11-20 05:14:19,219 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=954520.0, ans=0.2 2023-11-20 05:14:30,509 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.839e+01 8.320e+01 9.175e+01 1.028e+02 1.481e+02, threshold=1.835e+02, percent-clipped=0.0 2023-11-20 05:14:33,492 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=954586.6666666666, ans=0.0 2023-11-20 05:14:45,534 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 143200 2023-11-20 05:14:55,584 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=954653.3333333334, ans=0.0 2023-11-20 05:14:59,425 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 10950, loss[loss=0.09313, simple_loss=0.1169, pruned_loss=0.02551, audio_tagging_loss=0.009192, over 14945.00 frames. ], tot_loss[loss=0.08237, simple_loss=0.1028, pruned_loss=0.02088, audio_tagging_loss=0.01008, over 3055327.50 frames. ], batch size: 56, lr: 5.67e-03, grad_scale: 16.0 2023-11-20 05:15:02,066 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=954720.0, ans=0.1 2023-11-20 05:15:04,487 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=954720.0, ans=0.0 2023-11-20 05:15:10,177 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=954720.0, ans=0.125 2023-11-20 05:15:17,728 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.54 vs. 
limit=15.0 2023-11-20 05:15:20,811 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=954786.6666666666, ans=0.1 2023-11-20 05:15:27,222 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=954853.3333333334, ans=0.125 2023-11-20 05:15:35,792 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=954853.3333333334, ans=0.125 2023-11-20 05:15:41,321 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=954920.0, ans=0.125 2023-11-20 05:15:43,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=954920.0, ans=0.0 2023-11-20 05:15:46,956 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=954920.0, ans=0.125 2023-11-20 05:15:49,339 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=954920.0, ans=0.2 2023-11-20 05:15:51,648 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 143250 2023-11-20 05:15:58,844 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.71 vs. limit=15.0 2023-11-20 05:16:04,523 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 11000, loss[loss=0.07414, simple_loss=0.09701, pruned_loss=0.01679, audio_tagging_loss=0.008849, over 15448.00 frames. ], tot_loss[loss=0.08221, simple_loss=0.1025, pruned_loss=0.02085, audio_tagging_loss=0.01012, over 3055049.63 frames. ], batch size: 55, lr: 5.67e-03, grad_scale: 16.0 2023-11-20 05:16:09,178 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=955053.3333333334, ans=0.125 2023-11-20 05:16:14,979 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 05:16:24,980 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=955120.0, ans=0.125 2023-11-20 05:16:40,609 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.137e+01 8.224e+01 8.941e+01 1.006e+02 1.362e+02, threshold=1.788e+02, percent-clipped=0.0 2023-11-20 05:16:52,685 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=955253.3333333334, ans=0.1 2023-11-20 05:16:56,674 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 143300 2023-11-20 05:17:08,885 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 11050, loss[loss=0.06911, simple_loss=0.08447, pruned_loss=0.01487, audio_tagging_loss=0.01201, over 15123.00 frames. ], tot_loss[loss=0.08206, simple_loss=0.102, pruned_loss=0.02079, audio_tagging_loss=0.01026, over 3049500.70 frames. 
], batch size: 58, lr: 5.66e-03, grad_scale: 16.0 2023-11-20 05:17:39,801 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=955520.0, ans=0.125 2023-11-20 05:17:44,841 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.88 vs. limit=22.5 2023-11-20 05:17:56,022 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=955586.6666666666, ans=0.1 2023-11-20 05:17:59,210 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.77 vs. limit=15.0 2023-11-20 05:18:00,885 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 143350 2023-11-20 05:18:02,220 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=955653.3333333334, ans=0.125 2023-11-20 05:18:11,457 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=955653.3333333334, ans=0.1 2023-11-20 05:18:14,393 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 11100, loss[loss=0.07312, simple_loss=0.09197, pruned_loss=0.01702, audio_tagging_loss=0.01011, over 15772.00 frames. ], tot_loss[loss=0.08199, simple_loss=0.1016, pruned_loss=0.02084, audio_tagging_loss=0.01034, over 3045158.52 frames. ], batch size: 62, lr: 5.66e-03, grad_scale: 16.0 2023-11-20 05:18:24,495 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=955720.0, ans=0.1 2023-11-20 05:18:39,430 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.94 vs. limit=22.5 2023-11-20 05:18:41,469 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=955853.3333333334, ans=0.07 2023-11-20 05:18:49,756 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.985e+01 8.290e+01 8.982e+01 9.769e+01 2.008e+02, threshold=1.796e+02, percent-clipped=1.0 2023-11-20 05:18:51,337 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=955920.0, ans=0.125 2023-11-20 05:18:59,417 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.17 vs. limit=15.0 2023-11-20 05:19:05,657 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 143400 2023-11-20 05:19:18,880 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 11150, loss[loss=0.06096, simple_loss=0.07238, pruned_loss=0.01154, audio_tagging_loss=0.01323, over 15413.00 frames. ], tot_loss[loss=0.08151, simple_loss=0.101, pruned_loss=0.02053, audio_tagging_loss=0.01048, over 3050497.12 frames. 
], batch size: 60, lr: 5.66e-03, grad_scale: 16.0 2023-11-20 05:19:26,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=956053.3333333334, ans=0.0 2023-11-20 05:19:30,557 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=956120.0, ans=0.125 2023-11-20 05:19:39,144 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=956120.0, ans=0.125 2023-11-20 05:19:48,240 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=956186.6666666666, ans=0.95 2023-11-20 05:19:48,471 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=956186.6666666666, ans=0.125 2023-11-20 05:19:50,780 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=956186.6666666666, ans=0.2 2023-11-20 05:19:53,789 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=956186.6666666666, ans=0.0 2023-11-20 05:19:53,836 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=956186.6666666666, ans=0.1 2023-11-20 05:19:56,333 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=956253.3333333334, ans=0.125 2023-11-20 05:19:58,681 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=956253.3333333334, ans=0.2 2023-11-20 05:20:10,313 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 143450 2023-11-20 05:20:23,239 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 11200, loss[loss=0.0574, simple_loss=0.06858, pruned_loss=0.01124, audio_tagging_loss=0.01188, over 15052.00 frames. ], tot_loss[loss=0.08106, simple_loss=0.1005, pruned_loss=0.02037, audio_tagging_loss=0.01046, over 3049652.40 frames. ], batch size: 58, lr: 5.66e-03, grad_scale: 32.0 2023-11-20 05:20:30,815 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=956386.6666666666, ans=0.125 2023-11-20 05:20:59,540 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.186e+01 8.107e+01 8.782e+01 9.452e+01 1.606e+02, threshold=1.756e+02, percent-clipped=0.0 2023-11-20 05:21:13,141 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.whiten.whitening_limit, batch_count=956586.6666666666, ans=12.0 2023-11-20 05:21:14,931 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 143500 2023-11-20 05:21:27,542 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 11250, loss[loss=0.1051, simple_loss=0.1428, pruned_loss=0.02621, audio_tagging_loss=0.007508, over 14396.00 frames. ], tot_loss[loss=0.08062, simple_loss=0.09971, pruned_loss=0.02024, audio_tagging_loss=0.01052, over 3046527.78 frames. ], batch size: 54, lr: 5.66e-03, grad_scale: 32.0 2023-11-20 05:21:45,540 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.86 vs. 
limit=22.5 2023-11-20 05:21:52,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=956853.3333333334, ans=0.1 2023-11-20 05:22:06,951 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=956920.0, ans=0.1 2023-11-20 05:22:13,606 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.28 vs. limit=12.0 2023-11-20 05:22:19,693 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 143550 2023-11-20 05:22:31,495 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=957053.3333333334, ans=0.0 2023-11-20 05:22:32,442 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 11300, loss[loss=0.07946, simple_loss=0.1067, pruned_loss=0.01922, audio_tagging_loss=0.006894, over 15093.00 frames. ], tot_loss[loss=0.08028, simple_loss=0.09959, pruned_loss=0.02012, audio_tagging_loss=0.01037, over 3042143.96 frames. ], batch size: 56, lr: 5.66e-03, grad_scale: 16.0 2023-11-20 05:22:41,924 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=957053.3333333334, ans=0.0 2023-11-20 05:22:45,488 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=957120.0, ans=0.125 2023-11-20 05:22:46,093 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.00 vs. limit=15.0 2023-11-20 05:22:52,805 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.74 vs. limit=22.5 2023-11-20 05:23:10,355 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.444e+01 8.144e+01 8.778e+01 9.554e+01 1.373e+02, threshold=1.756e+02, percent-clipped=0.0 2023-11-20 05:23:23,879 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=957320.0, ans=0.125 2023-11-20 05:23:24,894 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 143600 2023-11-20 05:23:26,591 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.40 vs. limit=15.0 2023-11-20 05:23:37,258 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 11350, loss[loss=0.07296, simple_loss=0.09976, pruned_loss=0.01414, audio_tagging_loss=0.008935, over 15426.00 frames. ], tot_loss[loss=0.08001, simple_loss=0.09948, pruned_loss=0.02004, audio_tagging_loss=0.01023, over 3041754.88 frames. ], batch size: 57, lr: 5.66e-03, grad_scale: 16.0 2023-11-20 05:23:38,767 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=957386.6666666666, ans=0.0 2023-11-20 05:23:46,658 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=957386.6666666666, ans=0.125 2023-11-20 05:23:53,767 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.62 vs. 
limit=15.0 2023-11-20 05:24:29,261 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 143650 2023-11-20 05:24:42,579 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 11400, loss[loss=0.08814, simple_loss=0.09434, pruned_loss=0.03057, audio_tagging_loss=0.0104, over 13601.00 frames. ], tot_loss[loss=0.08027, simple_loss=0.09994, pruned_loss=0.0202, audio_tagging_loss=0.0101, over 3041213.33 frames. ], batch size: 54, lr: 5.66e-03, grad_scale: 16.0 2023-11-20 05:24:59,444 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=957786.6666666666, ans=0.125 2023-11-20 05:24:59,527 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=957786.6666666666, ans=0.125 2023-11-20 05:25:19,460 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.590e+01 8.041e+01 8.720e+01 9.566e+01 1.309e+02, threshold=1.744e+02, percent-clipped=0.0 2023-11-20 05:25:32,432 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=957920.0, ans=0.5 2023-11-20 05:25:34,554 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 143700 2023-11-20 05:25:39,237 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=957986.6666666666, ans=0.125 2023-11-20 05:25:47,457 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 11450, loss[loss=0.06088, simple_loss=0.08246, pruned_loss=0.01218, audio_tagging_loss=0.007468, over 14876.00 frames. ], tot_loss[loss=0.08015, simple_loss=0.1, pruned_loss=0.02011, audio_tagging_loss=0.01002, over 3046398.59 frames. ], batch size: 57, lr: 5.66e-03, grad_scale: 16.0 2023-11-20 05:26:03,156 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=958120.0, ans=0.95 2023-11-20 05:26:03,215 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=958120.0, ans=0.125 2023-11-20 05:26:04,420 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=958120.0, ans=0.1 2023-11-20 05:26:38,712 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 143750 2023-11-20 05:26:51,457 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 11500, loss[loss=0.1012, simple_loss=0.1287, pruned_loss=0.02736, audio_tagging_loss=0.009515, over 15954.00 frames. ], tot_loss[loss=0.08045, simple_loss=0.1003, pruned_loss=0.0203, audio_tagging_loss=0.009974, over 3043251.14 frames. ], batch size: 58, lr: 5.66e-03, grad_scale: 16.0 2023-11-20 05:27:01,405 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=958386.6666666666, ans=0.0 2023-11-20 05:27:07,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=958453.3333333334, ans=0.125 2023-11-20 05:27:12,390 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=958453.3333333334, ans=0.125 2023-11-20 05:27:22,995 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.88 vs. 
limit=15.0 2023-11-20 05:27:29,101 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.118e+01 7.975e+01 8.821e+01 9.410e+01 1.240e+02, threshold=1.764e+02, percent-clipped=0.0 2023-11-20 05:27:29,463 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=958586.6666666666, ans=0.0 2023-11-20 05:27:42,784 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 143800 2023-11-20 05:27:56,041 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 11550, loss[loss=0.06555, simple_loss=0.07792, pruned_loss=0.01604, audio_tagging_loss=0.01055, over 15608.00 frames. ], tot_loss[loss=0.08087, simple_loss=0.1009, pruned_loss=0.02046, audio_tagging_loss=0.009952, over 3043466.66 frames. ], batch size: 59, lr: 5.65e-03, grad_scale: 16.0 2023-11-20 05:27:56,316 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=958720.0, ans=0.125 2023-11-20 05:27:57,475 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=958720.0, ans=0.0 2023-11-20 05:28:08,510 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=958786.6666666666, ans=0.2 2023-11-20 05:28:16,034 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=958786.6666666666, ans=0.2 2023-11-20 05:28:26,489 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=958853.3333333334, ans=0.5 2023-11-20 05:28:36,099 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 05:28:44,151 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=958920.0, ans=0.125 2023-11-20 05:28:48,358 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 143850 2023-11-20 05:28:48,752 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.37 vs. limit=15.0 2023-11-20 05:28:54,735 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=958986.6666666666, ans=0.125 2023-11-20 05:28:58,813 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=958986.6666666666, ans=0.125 2023-11-20 05:29:01,022 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 11600, loss[loss=0.0726, simple_loss=0.08918, pruned_loss=0.01759, audio_tagging_loss=0.01042, over 15442.00 frames. ], tot_loss[loss=0.08107, simple_loss=0.1013, pruned_loss=0.02042, audio_tagging_loss=0.009985, over 3043875.34 frames. 
], batch size: 57, lr: 5.65e-03, grad_scale: 32.0 2023-11-20 05:29:11,761 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=959053.3333333334, ans=0.1 2023-11-20 05:29:19,582 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.24 vs. limit=22.5 2023-11-20 05:29:38,415 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.843e+01 8.207e+01 8.809e+01 9.519e+01 1.148e+02, threshold=1.762e+02, percent-clipped=0.0 2023-11-20 05:29:52,820 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 143900 2023-11-20 05:30:05,598 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 11650, loss[loss=0.08623, simple_loss=0.1166, pruned_loss=0.01992, audio_tagging_loss=0.008015, over 15305.00 frames. ], tot_loss[loss=0.08171, simple_loss=0.1024, pruned_loss=0.02056, audio_tagging_loss=0.009939, over 3042285.71 frames. ], batch size: 55, lr: 5.65e-03, grad_scale: 32.0 2023-11-20 05:30:18,165 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=959453.3333333334, ans=0.025 2023-11-20 05:30:27,229 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.49 vs. limit=22.5 2023-11-20 05:30:57,358 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 143950 2023-11-20 05:30:59,332 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.95 vs. limit=10.0 2023-11-20 05:31:09,581 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 11700, loss[loss=0.07972, simple_loss=0.09205, pruned_loss=0.02339, audio_tagging_loss=0.0103, over 14323.00 frames. ], tot_loss[loss=0.0806, simple_loss=0.1007, pruned_loss=0.02019, audio_tagging_loss=0.01008, over 3034447.02 frames. ], batch size: 56, lr: 5.65e-03, grad_scale: 32.0 2023-11-20 05:31:10,977 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=959720.0, ans=0.0 2023-11-20 05:31:24,806 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.91 vs. limit=15.0 2023-11-20 05:31:25,858 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=959786.6666666666, ans=0.125 2023-11-20 05:31:26,931 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=959786.6666666666, ans=0.125 2023-11-20 05:31:30,670 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=959786.6666666666, ans=0.125 2023-11-20 05:31:46,954 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.116e+01 8.169e+01 8.905e+01 9.543e+01 1.392e+02, threshold=1.781e+02, percent-clipped=0.0 2023-11-20 05:31:53,686 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=11.05 vs. 
limit=12.0 2023-11-20 05:32:00,431 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 144000 2023-11-20 05:32:06,554 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.61 vs. limit=15.0 2023-11-20 05:32:12,900 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=959986.6666666666, ans=0.125 2023-11-20 05:32:17,066 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 11750, loss[loss=0.06155, simple_loss=0.0695, pruned_loss=0.01507, audio_tagging_loss=0.01173, over 15135.00 frames. ], tot_loss[loss=0.08032, simple_loss=0.1001, pruned_loss=0.02018, audio_tagging_loss=0.01008, over 3033692.40 frames. ], batch size: 57, lr: 5.65e-03, grad_scale: 32.0 2023-11-20 05:32:33,172 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.86 vs. limit=12.0 2023-11-20 05:32:37,606 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=960120.0, ans=0.125 2023-11-20 05:32:45,024 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=960186.6666666666, ans=0.125 2023-11-20 05:32:57,311 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.06 vs. limit=15.0 2023-11-20 05:33:08,893 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 144050 2023-11-20 05:33:09,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=960320.0, ans=0.125 2023-11-20 05:33:21,618 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 11800, loss[loss=0.06637, simple_loss=0.08129, pruned_loss=0.01628, audio_tagging_loss=0.009445, over 16557.00 frames. ], tot_loss[loss=0.07988, simple_loss=0.09922, pruned_loss=0.02003, audio_tagging_loss=0.01025, over 3029446.03 frames. ], batch size: 64, lr: 5.65e-03, grad_scale: 16.0 2023-11-20 05:33:59,959 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.293e+01 8.347e+01 8.824e+01 9.407e+01 1.316e+02, threshold=1.765e+02, percent-clipped=0.0 2023-11-20 05:34:00,321 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 05:34:04,515 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=960586.6666666666, ans=0.015 2023-11-20 05:34:08,396 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=960586.6666666666, ans=0.2 2023-11-20 05:34:12,991 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 144100 2023-11-20 05:34:13,203 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=960653.3333333334, ans=0.125 2023-11-20 05:34:14,824 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.24 vs. 
limit=15.0 2023-11-20 05:34:19,452 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=960653.3333333334, ans=0.05 2023-11-20 05:34:25,227 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 11850, loss[loss=0.08786, simple_loss=0.1148, pruned_loss=0.02196, audio_tagging_loss=0.008495, over 16591.00 frames. ], tot_loss[loss=0.08051, simple_loss=0.1001, pruned_loss=0.02015, audio_tagging_loss=0.01031, over 3034572.67 frames. ], batch size: 62, lr: 5.65e-03, grad_scale: 16.0 2023-11-20 05:34:29,044 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=960720.0, ans=0.1 2023-11-20 05:34:46,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=960786.6666666666, ans=0.125 2023-11-20 05:34:48,428 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.96 vs. limit=12.0 2023-11-20 05:35:16,589 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 144150 2023-11-20 05:35:29,876 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 11900, loss[loss=0.09934, simple_loss=0.1297, pruned_loss=0.02565, audio_tagging_loss=0.008838, over 15651.00 frames. ], tot_loss[loss=0.08124, simple_loss=0.1013, pruned_loss=0.02027, audio_tagging_loss=0.01033, over 3037904.35 frames. ], batch size: 56, lr: 5.65e-03, grad_scale: 16.0 2023-11-20 05:35:33,195 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=961053.3333333334, ans=0.125 2023-11-20 05:35:35,771 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.76 vs. limit=15.0 2023-11-20 05:35:46,007 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=961120.0, ans=0.1 2023-11-20 05:35:54,686 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=961186.6666666666, ans=0.125 2023-11-20 05:36:07,723 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.082e+01 8.202e+01 8.808e+01 9.616e+01 1.545e+02, threshold=1.762e+02, percent-clipped=0.0 2023-11-20 05:36:08,070 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=961253.3333333334, ans=0.125 2023-11-20 05:36:21,918 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 144200 2023-11-20 05:36:35,082 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 11950, loss[loss=0.1031, simple_loss=0.1302, pruned_loss=0.0302, audio_tagging_loss=0.007791, over 15448.00 frames. ], tot_loss[loss=0.08163, simple_loss=0.1015, pruned_loss=0.02048, audio_tagging_loss=0.01039, over 3039988.10 frames. 
], batch size: 58, lr: 5.65e-03, grad_scale: 16.0 2023-11-20 05:36:45,360 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=961386.6666666666, ans=0.1 2023-11-20 05:37:05,438 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 05:37:17,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=961586.6666666666, ans=0.125 2023-11-20 05:37:20,788 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=961586.6666666666, ans=0.125 2023-11-20 05:37:25,344 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 144250 2023-11-20 05:37:25,629 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=961653.3333333334, ans=0.125 2023-11-20 05:37:30,420 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=961653.3333333334, ans=0.125 2023-11-20 05:37:34,426 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=5.192e-03 2023-11-20 05:37:37,574 INFO [train_asr.py:1262] (3/4) Epoch 12, batch 12000, loss[loss=0.07416, simple_loss=0.08275, pruned_loss=0.02022, audio_tagging_loss=0.01257, over 15330.00 frames. ], tot_loss[loss=0.08189, simple_loss=0.1017, pruned_loss=0.02055, audio_tagging_loss=0.01049, over 3041539.21 frames. ], batch size: 57, lr: 5.65e-03, grad_scale: 32.0 2023-11-20 05:37:37,575 INFO [train_asr.py:1285] (3/4) Computing validation loss 2023-11-20 05:38:02,863 INFO [zipformer.py:1873] (3/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.5373, 3.6585, 4.2774, 3.4818], device='cuda:3') 2023-11-20 05:38:18,776 INFO [train_asr.py:1294] (3/4) Epoch 12, validation: loss=0.06309, simple_loss=0.0542, pruned_loss=0.005937, audio_tagging_loss=0.03005, over 4681554.00 frames. 2023-11-20 05:38:18,777 INFO [train_asr.py:1295] (3/4) Maximum memory allocated so far is 25886MB 2023-11-20 05:38:21,429 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=961720.0, ans=0.0 2023-11-20 05:38:23,727 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 05:38:39,389 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.41 vs. limit=12.0 2023-11-20 05:39:27,279 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 0, loss[loss=0.07051, simple_loss=0.07517, pruned_loss=0.01061, audio_tagging_loss=0.02232, over 14017.00 frames. ], tot_loss[loss=0.07051, simple_loss=0.07517, pruned_loss=0.01061, audio_tagging_loss=0.02232, over 14017.00 frames. ], batch size: 55, lr: 5.43e-03, grad_scale: 32.0 2023-11-20 05:39:27,279 INFO [train_asr.py:1285] (3/4) Computing validation loss 2023-11-20 05:40:04,320 INFO [train_asr.py:1294] (3/4) Epoch 13, validation: loss=0.06272, simple_loss=0.05429, pruned_loss=0.006071, audio_tagging_loss=0.02951, over 4681554.00 frames. 
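Two regularities in the entries above can be verified directly from the logged numbers. First, every tot_loss entry is consistent with a fixed weighting of its components, loss ≈ 0.5 * simple_loss + pruned_loss + audio_tagging_loss: at Epoch 12, batch 12000, 0.5 * 0.1017 + 0.02055 + 0.01049 = 0.08189, matching the logged total. Second, each optim.py line reports five grad-norm statistics (read here as min, Q1, median, Q3, max), and with Clipping_scale=2.0 the reported threshold equals twice the logged median, e.g. 2 * 8.808e+01 = 1.762e+02 for the entry at 05:36:07 above. The minimal sketch below checks both relationships; the 0.5 weight and the median-based clipping rule are inferences from this log, not code quoted from train_asr.py or optim.py.

    # Minimal sanity checks against the values logged above. The combination
    # rule and the clipping rule are inferred from this log, not the source.

    def tot_loss(simple_loss: float, pruned_loss: float,
                 audio_tagging_loss: float, simple_scale: float = 0.5,
                 tagging_scale: float = 1.0) -> float:
        # Assumed: loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss
        return (simple_scale * simple_loss + pruned_loss
                + tagging_scale * audio_tagging_loss)

    # Epoch 12, batch 12000: logged tot_loss is loss=0.08189
    assert abs(tot_loss(0.1017, 0.02055, 0.01049) - 0.08189) < 1e-4

    def clip_threshold(median_grad_norm: float,
                       clipping_scale: float = 2.0) -> float:
        # Assumed: threshold = clipping_scale * median of recent grad norms,
        # matching the "Clipping_scale=2.0 ... threshold=..." lines above.
        return clipping_scale * median_grad_norm

    # Logged quartiles 7.082e+01 8.202e+01 8.808e+01 9.616e+01 1.545e+02
    # came with threshold=1.762e+02 and percent-clipped=0.0.
    assert abs(clip_threshold(8.808e+01) - 1.762e+02) < 0.1

Note also that the two validation entries here (end of Epoch 12 and start of Epoch 13) report the same 4681554.00-frame total, indicating each validation pass runs over the full validation set rather than a moving window, and that the learning rate steps from 5.65e-03 to 5.43e-03 only at the epoch boundary, consistent with an epoch-based schedule.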
2023-11-20 05:40:04,321 INFO [train_asr.py:1295] (3/4) Maximum memory allocated so far is 25886MB 2023-11-20 05:40:04,560 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=961886.6666666666, ans=0.125 2023-11-20 05:40:10,454 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.395e+01 8.159e+01 8.856e+01 9.666e+01 1.294e+02, threshold=1.771e+02, percent-clipped=0.0 2023-11-20 05:40:12,062 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=961886.6666666666, ans=0.125 2023-11-20 05:40:23,131 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 144300 2023-11-20 05:40:42,112 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=962086.6666666666, ans=0.125 2023-11-20 05:40:54,411 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=962086.6666666666, ans=0.1 2023-11-20 05:41:09,223 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 50, loss[loss=0.06617, simple_loss=0.07222, pruned_loss=0.00955, audio_tagging_loss=0.02051, over 14966.00 frames. ], tot_loss[loss=0.09017, simple_loss=0.1007, pruned_loss=0.01988, audio_tagging_loss=0.01992, over 685457.44 frames. ], batch size: 56, lr: 5.42e-03, grad_scale: 16.0 2023-11-20 05:41:13,116 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.073e-01 2023-11-20 05:41:15,391 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=962220.0, ans=0.1 2023-11-20 05:41:19,472 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=962220.0, ans=0.0 2023-11-20 05:41:28,949 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 144350 2023-11-20 05:41:46,898 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=962420.0, ans=0.0 2023-11-20 05:41:51,829 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=962420.0, ans=0.2 2023-11-20 05:42:00,373 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=962486.6666666666, ans=0.1 2023-11-20 05:42:04,343 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=962486.6666666666, ans=0.0 2023-11-20 05:42:12,683 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 100, loss[loss=0.09913, simple_loss=0.1145, pruned_loss=0.026, audio_tagging_loss=0.0159, over 14052.00 frames. ], tot_loss[loss=0.08977, simple_loss=0.1012, pruned_loss=0.02004, audio_tagging_loss=0.01915, over 1205728.04 frames. 
], batch size: 57, lr: 5.42e-03, grad_scale: 16.0 2023-11-20 05:42:18,048 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=962553.3333333334, ans=0.0 2023-11-20 05:42:21,356 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.568e+01 8.954e+01 9.518e+01 1.027e+02 1.327e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-20 05:42:22,790 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=962553.3333333334, ans=0.125 2023-11-20 05:42:33,889 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 144400 2023-11-20 05:42:38,082 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=962620.0, ans=0.2 2023-11-20 05:42:44,358 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=962686.6666666666, ans=0.125 2023-11-20 05:42:45,514 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=962686.6666666666, ans=0.2 2023-11-20 05:42:45,694 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=962686.6666666666, ans=0.0 2023-11-20 05:42:49,243 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=962686.6666666666, ans=0.0 2023-11-20 05:42:55,611 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=962753.3333333334, ans=0.2 2023-11-20 05:43:00,673 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=962753.3333333334, ans=0.2 2023-11-20 05:43:03,764 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=962753.3333333334, ans=0.125 2023-11-20 05:43:04,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=962820.0, ans=0.125 2023-11-20 05:43:08,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=962820.0, ans=0.125 2023-11-20 05:43:11,803 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=962820.0, ans=0.125 2023-11-20 05:43:14,152 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=962820.0, ans=0.125 2023-11-20 05:43:18,753 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 150, loss[loss=0.08514, simple_loss=0.1009, pruned_loss=0.0211, audio_tagging_loss=0.01359, over 15484.00 frames. ], tot_loss[loss=0.0879, simple_loss=0.1014, pruned_loss=0.02033, audio_tagging_loss=0.01689, over 1613334.06 frames. 
], batch size: 60, lr: 5.42e-03, grad_scale: 16.0 2023-11-20 05:43:38,556 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 144450 2023-11-20 05:43:38,684 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=962953.3333333334, ans=0.05 2023-11-20 05:43:39,824 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=962953.3333333334, ans=0.125 2023-11-20 05:43:41,095 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=962953.3333333334, ans=0.1 2023-11-20 05:43:47,426 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=963020.0, ans=0.125 2023-11-20 05:44:12,756 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=963153.3333333334, ans=0.025 2023-11-20 05:44:19,792 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=963153.3333333334, ans=0.125 2023-11-20 05:44:24,344 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 200, loss[loss=0.09888, simple_loss=0.1375, pruned_loss=0.02339, audio_tagging_loss=0.00675, over 14667.00 frames. ], tot_loss[loss=0.08604, simple_loss=0.1014, pruned_loss=0.02045, audio_tagging_loss=0.01492, over 1927136.49 frames. ], batch size: 53, lr: 5.42e-03, grad_scale: 16.0 2023-11-20 05:44:28,246 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=963220.0, ans=0.125 2023-11-20 05:44:31,998 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.261e+01 8.317e+01 9.168e+01 9.939e+01 1.407e+02, threshold=1.834e+02, percent-clipped=0.0 2023-11-20 05:44:35,930 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=963286.6666666666, ans=0.2 2023-11-20 05:44:39,623 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 05:44:43,890 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 144500 2023-11-20 05:45:21,585 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=963486.6666666666, ans=0.1 2023-11-20 05:45:24,522 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.56 vs. limit=15.0 2023-11-20 05:45:27,999 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=963553.3333333334, ans=0.125 2023-11-20 05:45:28,836 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 250, loss[loss=0.05954, simple_loss=0.06366, pruned_loss=0.01397, audio_tagging_loss=0.01374, over 15931.00 frames. ], tot_loss[loss=0.0862, simple_loss=0.1034, pruned_loss=0.02108, audio_tagging_loss=0.0134, over 2179213.09 frames. 
2023-11-20 05:45:38,910 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=963553.3333333334, ans=0.125
2023-11-20 05:45:49,350 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 144550
2023-11-20 05:45:57,722 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=963686.6666666666, ans=0.125
2023-11-20 05:46:01,280 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=963686.6666666666, ans=0.0
2023-11-20 05:46:01,391 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=963686.6666666666, ans=0.1
2023-11-20 05:46:01,604 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.63 vs. limit=15.0
2023-11-20 05:46:10,056 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=963753.3333333334, ans=0.125
2023-11-20 05:46:24,944 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.11 vs. limit=6.0
2023-11-20 05:46:30,006 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=963820.0, ans=0.125
2023-11-20 05:46:34,392 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 300, loss[loss=0.08605, simple_loss=0.1118, pruned_loss=0.0195, audio_tagging_loss=0.01066, over 15209.00 frames. ], tot_loss[loss=0.08455, simple_loss=0.1024, pruned_loss=0.02088, audio_tagging_loss=0.01249, over 2371477.18 frames. ], batch size: 55, lr: 5.42e-03, grad_scale: 16.0
2023-11-20 05:46:42,583 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.680e+01 8.488e+01 9.150e+01 9.824e+01 1.478e+02, threshold=1.830e+02, percent-clipped=0.0
2023-11-20 05:46:54,342 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 144600
2023-11-20 05:46:54,770 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.34 vs. limit=10.0
2023-11-20 05:47:27,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=964153.3333333334, ans=0.1
2023-11-20 05:47:40,286 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 350, loss[loss=0.07914, simple_loss=0.1032, pruned_loss=0.01757, audio_tagging_loss=0.009946, over 15637.00 frames. ], tot_loss[loss=0.08356, simple_loss=0.1026, pruned_loss=0.02055, audio_tagging_loss=0.01173, over 2522024.28 frames. ], batch size: 58, lr: 5.42e-03, grad_scale: 16.0
2023-11-20 05:47:43,189 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=964220.0, ans=0.125
2023-11-20 05:47:46,981 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.88 vs. limit=15.0
2023-11-20 05:47:47,814 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=964220.0, ans=0.125
2023-11-20 05:47:58,737 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 144650
2023-11-20 05:48:11,830 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.57 vs. limit=15.0
2023-11-20 05:48:27,898 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=964420.0, ans=0.0
2023-11-20 05:48:41,467 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.26 vs. limit=15.0
2023-11-20 05:48:44,440 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 400, loss[loss=0.09299, simple_loss=0.1221, pruned_loss=0.02551, audio_tagging_loss=0.00645, over 15395.00 frames. ], tot_loss[loss=0.08395, simple_loss=0.1035, pruned_loss=0.02097, audio_tagging_loss=0.01125, over 2644279.31 frames. ], batch size: 56, lr: 5.42e-03, grad_scale: 32.0
2023-11-20 05:48:51,551 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=964553.3333333334, ans=0.125
2023-11-20 05:48:52,434 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.722e+01 8.150e+01 8.876e+01 9.638e+01 1.255e+02, threshold=1.775e+02, percent-clipped=0.0
2023-11-20 05:48:52,896 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=964553.3333333334, ans=0.0
2023-11-20 05:48:55,923 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=964553.3333333334, ans=0.125
2023-11-20 05:49:04,170 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 144700
2023-11-20 05:49:18,788 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=964686.6666666666, ans=10.0
2023-11-20 05:49:21,682 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=964686.6666666666, ans=0.0
2023-11-20 05:49:21,712 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=964686.6666666666, ans=0.125
2023-11-20 05:49:36,441 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=964820.0, ans=0.1
2023-11-20 05:49:37,570 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=964820.0, ans=0.125
2023-11-20 05:49:41,018 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.52 vs. limit=15.0
2023-11-20 05:49:41,214 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.71 vs. limit=6.0
2023-11-20 05:49:49,703 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 450, loss[loss=0.08596, simple_loss=0.1139, pruned_loss=0.02062, audio_tagging_loss=0.008398, over 14865.00 frames. ], tot_loss[loss=0.08318, simple_loss=0.1029, pruned_loss=0.02074, audio_tagging_loss=0.01099, over 2732860.98 frames. ], batch size: 56, lr: 5.42e-03, grad_scale: 32.0
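[Annotation] The scaling.py:213 ScheduledFloat entries trace regularization knobs (skip rates, balancer probabilities, dropout, bypass scale bounds) whose current value ("ans") is a function of batch_count. A plausible sketch of such a schedule as piecewise-linear interpolation over (batch_count, value) breakpoints; the real ScheduledFloat in scaling.py may differ in details:

class ScheduledFloat:
    """Float-valued hyperparameter, piecewise-linear in batch_count."""
    def __init__(self, *points, default=0.0):
        # e.g. ScheduledFloat((0.0, 0.3), (20000.0, 0.1)) decays 0.3 -> 0.1
        self.points = sorted(points)
        self.default = default
        self.batch_count = None  # updated by the training loop

    def __float__(self):
        if self.batch_count is None:
            return float(self.default)
        p = self.points
        if self.batch_count <= p[0][0]:
            return float(p[0][1])
        if self.batch_count >= p[-1][0]:
            return float(p[-1][1])
        for (x0, y0), (x1, y1) in zip(p, p[1:]):
            if x0 <= self.batch_count <= x1:
                t = (self.batch_count - x0) / (x1 - x0)
                return float(y0 + t * (y1 - y0))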
2023-11-20 05:50:02,953 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=964953.3333333334, ans=0.125
2023-11-20 05:50:03,295 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.65 vs. limit=10.0
2023-11-20 05:50:05,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=964953.3333333334, ans=0.125
2023-11-20 05:50:08,928 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 144750
2023-11-20 05:50:34,474 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=965086.6666666666, ans=0.0
2023-11-20 05:50:35,953 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.59 vs. limit=15.0
2023-11-20 05:50:50,086 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.90 vs. limit=22.5
2023-11-20 05:50:54,209 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 500, loss[loss=0.105, simple_loss=0.1331, pruned_loss=0.02768, audio_tagging_loss=0.01078, over 15907.00 frames. ], tot_loss[loss=0.08303, simple_loss=0.1029, pruned_loss=0.02087, audio_tagging_loss=0.01072, over 2807320.05 frames. ], batch size: 60, lr: 5.42e-03, grad_scale: 32.0
2023-11-20 05:50:55,700 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-20 05:50:55,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=965220.0, ans=0.125
2023-11-20 05:50:57,071 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=965220.0, ans=0.0
2023-11-20 05:51:02,238 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.903e+01 8.170e+01 8.806e+01 9.563e+01 1.167e+02, threshold=1.761e+02, percent-clipped=0.0
2023-11-20 05:51:06,249 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=965286.6666666666, ans=0.04949747468305833
2023-11-20 05:51:13,707 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 144800
2023-11-20 05:51:17,002 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.58 vs. limit=22.5
2023-11-20 05:51:20,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=965353.3333333334, ans=0.2
2023-11-20 05:51:20,991 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=965353.3333333334, ans=0.125
2023-11-20 05:51:21,113 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=965353.3333333334, ans=0.0
2023-11-20 05:51:28,885 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=965353.3333333334, ans=0.125
2023-11-20 05:51:36,981 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=965420.0, ans=0.125
2023-11-20 05:51:38,203 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=965420.0, ans=0.125
2023-11-20 05:51:40,819 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.25 vs. limit=15.0
2023-11-20 05:51:44,235 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=965420.0, ans=0.0
2023-11-20 05:51:47,983 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=965486.6666666666, ans=0.125
2023-11-20 05:51:58,602 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=965553.3333333334, ans=0.0
2023-11-20 05:51:59,443 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 550, loss[loss=0.08554, simple_loss=0.106, pruned_loss=0.02259, audio_tagging_loss=0.009948, over 15980.00 frames. ], tot_loss[loss=0.08202, simple_loss=0.1015, pruned_loss=0.02061, audio_tagging_loss=0.01064, over 2857162.14 frames. ], batch size: 62, lr: 5.41e-03, grad_scale: 32.0
2023-11-20 05:52:19,370 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 144850
2023-11-20 05:52:23,744 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=965620.0, ans=0.0
2023-11-20 05:52:25,452 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.70 vs. limit=22.5
2023-11-20 05:52:38,475 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=965753.3333333334, ans=0.0
2023-11-20 05:52:38,684 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.04 vs. limit=15.0
2023-11-20 05:53:01,379 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=965820.0, ans=0.05
2023-11-20 05:53:04,629 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 600, loss[loss=0.0743, simple_loss=0.09056, pruned_loss=0.01701, audio_tagging_loss=0.012, over 15037.00 frames. ], tot_loss[loss=0.08211, simple_loss=0.1018, pruned_loss=0.02069, audio_tagging_loss=0.0105, over 2898539.48 frames. ], batch size: 58, lr: 5.41e-03, grad_scale: 32.0
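[Annotation] The scaling.py:1022 Whitening entries compare a per-module "metric" against a limit; the metric measures how far the (grouped) feature covariance is from a multiple of the identity, i.e. how non-white the activations are, with 1.0 meaning perfectly white. A hedged sketch of one way to compute such a metric, assuming it is the ratio of the mean squared covariance eigenvalue to the squared mean eigenvalue (the real scaling.py may compute it differently):

import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> torch.Tensor:
    """x: (num_frames, num_channels). Returns >= 1; equals 1 for white features."""
    num_frames, num_channels = x.shape
    group = num_channels // num_groups
    x = x.reshape(num_frames, num_groups, group).transpose(0, 1)  # (groups, frames, group)
    x = x - x.mean(dim=1, keepdim=True)
    cov = torch.matmul(x.transpose(1, 2), x) / num_frames  # (groups, group, group)
    mean_eig = cov.diagonal(dim1=1, dim2=2).mean(dim=1)    # E[lambda] = trace/d
    mean_sq_eig = (cov ** 2).sum(dim=(1, 2)) / group       # E[lambda^2] = ||cov||_F^2 / d
    return (mean_sq_eig / (mean_eig ** 2 + 1e-20)).mean()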
2023-11-20 05:53:12,760 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.448e+01 8.106e+01 8.665e+01 9.302e+01 1.312e+02, threshold=1.733e+02, percent-clipped=0.0
2023-11-20 05:53:14,392 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=965886.6666666666, ans=0.025
2023-11-20 05:53:15,626 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=965886.6666666666, ans=0.0
2023-11-20 05:53:18,107 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=965953.3333333334, ans=0.0
2023-11-20 05:53:23,981 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 144900
2023-11-20 05:53:35,959 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=966020.0, ans=0.0
2023-11-20 05:53:41,192 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=966020.0, ans=0.0
2023-11-20 05:53:49,444 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=966086.6666666666, ans=0.0
2023-11-20 05:53:51,329 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=966086.6666666666, ans=0.125
2023-11-20 05:54:04,488 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=966153.3333333334, ans=0.125
2023-11-20 05:54:10,283 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 650, loss[loss=0.08992, simple_loss=0.112, pruned_loss=0.02376, audio_tagging_loss=0.01018, over 15611.00 frames. ], tot_loss[loss=0.08177, simple_loss=0.1015, pruned_loss=0.02058, audio_tagging_loss=0.01045, over 2939526.46 frames. ], batch size: 58, lr: 5.41e-03, grad_scale: 32.0
2023-11-20 05:54:16,191 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.04 vs. limit=22.5
2023-11-20 05:54:29,366 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 144950
2023-11-20 05:54:38,897 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=966353.3333333334, ans=0.5
2023-11-20 05:55:03,663 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=966486.6666666666, ans=0.125
2023-11-20 05:55:14,364 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 700, loss[loss=0.08024, simple_loss=0.09873, pruned_loss=0.02166, audio_tagging_loss=0.009215, over 16329.00 frames. ], tot_loss[loss=0.08209, simple_loss=0.102, pruned_loss=0.02068, audio_tagging_loss=0.0104, over 2966569.11 frames. ], batch size: 60, lr: 5.41e-03, grad_scale: 32.0
2023-11-20 05:55:18,963 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-20 05:55:22,270 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.647e+01 8.027e+01 8.645e+01 9.342e+01 1.133e+02, threshold=1.729e+02, percent-clipped=0.0
2023-11-20 05:55:22,981 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.95 vs. limit=6.0
2023-11-20 05:55:28,255 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=966620.0, ans=0.0
2023-11-20 05:55:34,897 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 145000
2023-11-20 05:55:40,890 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=966686.6666666666, ans=0.125
2023-11-20 05:55:42,037 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=966686.6666666666, ans=0.125
2023-11-20 05:55:43,457 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=966686.6666666666, ans=0.1
2023-11-20 05:56:00,914 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=966753.3333333334, ans=0.125
2023-11-20 05:56:02,019 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=966753.3333333334, ans=0.1
2023-11-20 05:56:09,439 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. limit=6.0
2023-11-20 05:56:21,164 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 750, loss[loss=0.09806, simple_loss=0.1156, pruned_loss=0.03192, audio_tagging_loss=0.008364, over 15373.00 frames. ], tot_loss[loss=0.08258, simple_loss=0.1027, pruned_loss=0.02088, audio_tagging_loss=0.01033, over 2993095.72 frames. ], batch size: 57, lr: 5.41e-03, grad_scale: 32.0
2023-11-20 05:56:35,777 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=966953.3333333334, ans=0.125
2023-11-20 05:56:40,413 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 145050
2023-11-20 05:56:54,164 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=967020.0, ans=0.07
2023-11-20 05:57:05,159 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=967086.6666666666, ans=0.1
2023-11-20 05:57:08,847 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=967086.6666666666, ans=0.1
2023-11-20 05:57:19,523 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.88 vs. limit=15.0
2023-11-20 05:57:23,581 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=967153.3333333334, ans=0.125
2023-11-20 05:57:25,793 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 800, loss[loss=0.08357, simple_loss=0.1035, pruned_loss=0.01933, audio_tagging_loss=0.0125, over 14668.00 frames. ], tot_loss[loss=0.08246, simple_loss=0.1026, pruned_loss=0.02081, audio_tagging_loss=0.01036, over 3010892.50 frames. ], batch size: 57, lr: 5.41e-03, grad_scale: 32.0
2023-11-20 05:57:27,441 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=967220.0, ans=0.125
2023-11-20 05:57:33,182 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.560e+01 8.396e+01 9.088e+01 9.762e+01 1.189e+02, threshold=1.818e+02, percent-clipped=0.0
2023-11-20 05:57:44,468 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 145100
2023-11-20 05:57:58,237 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=967353.3333333334, ans=0.0
2023-11-20 05:58:01,073 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=967353.3333333334, ans=0.1
2023-11-20 05:58:02,432 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=967353.3333333334, ans=0.1
2023-11-20 05:58:11,497 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=967420.0, ans=0.125
2023-11-20 05:58:17,629 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=967486.6666666666, ans=0.0
2023-11-20 05:58:29,586 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 850, loss[loss=0.07952, simple_loss=0.1042, pruned_loss=0.01973, audio_tagging_loss=0.007666, over 15543.00 frames. ], tot_loss[loss=0.08249, simple_loss=0.1027, pruned_loss=0.02072, audio_tagging_loss=0.01041, over 3025791.82 frames. ], batch size: 57, lr: 5.41e-03, grad_scale: 32.0
2023-11-20 05:58:33,862 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.88 vs. limit=15.0
2023-11-20 05:58:34,968 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=967553.3333333334, ans=0.0
2023-11-20 05:58:41,812 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=967620.0, ans=0.0
2023-11-20 05:58:49,659 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 145150
2023-11-20 05:59:11,486 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=967753.3333333334, ans=0.0
2023-11-20 05:59:27,567 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=967820.0, ans=0.125
2023-11-20 05:59:27,799 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=967820.0, ans=0.125
2023-11-20 05:59:34,788 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 900, loss[loss=0.07254, simple_loss=0.08687, pruned_loss=0.01869, audio_tagging_loss=0.01042, over 14698.00 frames. ], tot_loss[loss=0.08284, simple_loss=0.1032, pruned_loss=0.02088, audio_tagging_loss=0.01037, over 3032165.68 frames. ], batch size: 56, lr: 5.41e-03, grad_scale: 32.0
2023-11-20 05:59:42,081 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.79 vs. limit=15.0
2023-11-20 05:59:42,653 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 5.961e+01 8.143e+01 8.772e+01 9.521e+01 1.429e+02, threshold=1.754e+02, percent-clipped=0.0
2023-11-20 05:59:44,088 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=967886.6666666666, ans=0.125
2023-11-20 05:59:54,302 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 145200
2023-11-20 06:00:27,807 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.44 vs. limit=10.0
2023-11-20 06:00:40,228 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 950, loss[loss=0.07677, simple_loss=0.08489, pruned_loss=0.02512, audio_tagging_loss=0.009209, over 14652.00 frames. ], tot_loss[loss=0.08279, simple_loss=0.1032, pruned_loss=0.0209, audio_tagging_loss=0.01028, over 3042149.99 frames. ], batch size: 55, lr: 5.41e-03, grad_scale: 32.0
2023-11-20 06:00:50,204 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=968220.0, ans=0.0
2023-11-20 06:00:58,571 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 145250
2023-11-20 06:01:06,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=968353.3333333334, ans=0.125
2023-11-20 06:01:06,545 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.68 vs. limit=15.0
2023-11-20 06:01:10,618 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=968353.3333333334, ans=0.125
2023-11-20 06:01:32,593 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=968486.6666666666, ans=0.0
2023-11-20 06:01:40,145 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=968486.6666666666, ans=0.0
2023-11-20 06:01:42,470 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=968553.3333333334, ans=0.125
2023-11-20 06:01:43,368 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 1000, loss[loss=0.07082, simple_loss=0.0912, pruned_loss=0.01409, audio_tagging_loss=0.01113, over 15868.00 frames. ], tot_loss[loss=0.08181, simple_loss=0.1022, pruned_loss=0.02052, audio_tagging_loss=0.01021, over 3041972.43 frames. ], batch size: 59, lr: 5.41e-03, grad_scale: 16.0
2023-11-20 06:01:51,992 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.711e+01 7.989e+01 8.669e+01 9.928e+01 1.196e+02, threshold=1.734e+02, percent-clipped=0.0
2023-11-20 06:02:02,725 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 145300
2023-11-20 06:02:10,068 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 06:02:17,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=968686.6666666666, ans=0.1
2023-11-20 06:02:34,468 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=968820.0, ans=0.125
2023-11-20 06:02:35,683 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=968820.0, ans=0.0
2023-11-20 06:02:43,131 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=968820.0, ans=0.0
2023-11-20 06:02:47,613 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=968886.6666666666, ans=0.125
2023-11-20 06:02:48,450 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 1050, loss[loss=0.07691, simple_loss=0.1006, pruned_loss=0.01787, audio_tagging_loss=0.008732, over 15340.00 frames. ], tot_loss[loss=0.08207, simple_loss=0.1026, pruned_loss=0.02077, audio_tagging_loss=0.01002, over 3045513.32 frames. ], batch size: 59, lr: 5.41e-03, grad_scale: 16.0
2023-11-20 06:03:09,041 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 145350
2023-11-20 06:03:11,705 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=968953.3333333334, ans=0.0
2023-11-20 06:03:12,890 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=968953.3333333334, ans=0.125
2023-11-20 06:03:12,955 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=968953.3333333334, ans=0.1
2023-11-20 06:03:42,985 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=969153.3333333334, ans=0.125
2023-11-20 06:03:54,967 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 1100, loss[loss=0.07721, simple_loss=0.09749, pruned_loss=0.01785, audio_tagging_loss=0.01062, over 14624.00 frames. ], tot_loss[loss=0.08134, simple_loss=0.1016, pruned_loss=0.02054, audio_tagging_loss=0.009978, over 3038212.14 frames. ], batch size: 56, lr: 5.40e-03, grad_scale: 16.0
2023-11-20 06:03:57,481 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 06:04:03,631 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.580e+01 8.161e+01 8.676e+01 9.552e+01 1.363e+02, threshold=1.735e+02, percent-clipped=0.0
2023-11-20 06:04:08,369 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.79 vs. limit=6.0
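[Annotation] The WARNING entries at train_asr.py:1506 drop AudioSet cuts whose text is a dummy placeholder: after subsampling, the 100 input frames leave only 23 encoder frames, fewer than the 24 BPE tokens, which a transducer cannot align (it needs at least one encoder frame per output token). A sketch of such a filter, with hypothetical helper names and an assumed subsampling formula that reproduces 100 -> 23:

def keep_cut(cut, sp, log=print) -> bool:
    """Return False for cuts whose token count exceeds the subsampled frame count."""
    num_frames = cut.num_frames                # frames before subsampling
    T = ((num_frames - 7) // 2 + 1) // 2       # assumed 4x subsampling: 100 -> 23
    tokens = sp.encode(cut.supervisions[0].text, out_type=str)
    if T < len(tokens):
        log(f"Exclude cut with ID {cut.id} from training. "
            f"Number of frames (before subsampling): {num_frames}. "
            f"Number of frames (after subsampling): {T}. "
            f"Number of tokens: {len(tokens)}")
        return False
    return True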
2023-11-20 06:04:10,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=969286.6666666666, ans=0.1
2023-11-20 06:04:13,715 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 145400
2023-11-20 06:04:18,074 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=969286.6666666666, ans=0.04949747468305833
2023-11-20 06:04:33,348 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.30 vs. limit=22.5
2023-11-20 06:04:43,740 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=969420.0, ans=0.125
2023-11-20 06:04:47,627 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=969486.6666666666, ans=0.5
2023-11-20 06:04:59,729 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 1150, loss[loss=0.1161, simple_loss=0.1512, pruned_loss=0.03648, audio_tagging_loss=0.003999, over 16313.00 frames. ], tot_loss[loss=0.08116, simple_loss=0.1013, pruned_loss=0.02051, audio_tagging_loss=0.009987, over 3044846.66 frames. ], batch size: 56, lr: 5.40e-03, grad_scale: 16.0
2023-11-20 06:05:02,807 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.70 vs. limit=15.0
2023-11-20 06:05:18,966 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 145450
2023-11-20 06:05:37,359 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.05 vs. limit=15.0
2023-11-20 06:05:49,310 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=969753.3333333334, ans=0.125
2023-11-20 06:05:50,721 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=969820.0, ans=0.04949747468305833
2023-11-20 06:06:02,714 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=969886.6666666666, ans=0.125
2023-11-20 06:06:03,734 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 1200, loss[loss=0.1063, simple_loss=0.1357, pruned_loss=0.02974, audio_tagging_loss=0.008748, over 15212.00 frames. ], tot_loss[loss=0.08069, simple_loss=0.1009, pruned_loss=0.02022, audio_tagging_loss=0.01, over 3046851.34 frames. ], batch size: 53, lr: 5.40e-03, grad_scale: 32.0
2023-11-20 06:06:08,460 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=969886.6666666666, ans=0.1
2023-11-20 06:06:13,707 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.636e+01 8.393e+01 8.955e+01 9.874e+01 3.263e+02, threshold=1.791e+02, percent-clipped=1.0
2023-11-20 06:06:14,344 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.13 vs. limit=15.0
2023-11-20 06:06:24,529 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 145500
2023-11-20 06:06:45,831 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=970086.6666666666, ans=0.125
2023-11-20 06:06:48,290 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=970086.6666666666, ans=0.1
2023-11-20 06:07:00,564 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=970153.3333333334, ans=10.0
2023-11-20 06:07:04,499 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.47 vs. limit=15.0
2023-11-20 06:07:06,602 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=970153.3333333334, ans=0.2
2023-11-20 06:07:09,475 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 1250, loss[loss=0.09345, simple_loss=0.1201, pruned_loss=0.0257, audio_tagging_loss=0.007679, over 15105.00 frames. ], tot_loss[loss=0.08039, simple_loss=0.1005, pruned_loss=0.02019, audio_tagging_loss=0.009944, over 3043984.47 frames. ], batch size: 56, lr: 5.40e-03, grad_scale: 32.0
2023-11-20 06:07:28,460 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 145550
2023-11-20 06:07:43,930 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=970353.3333333334, ans=0.125
2023-11-20 06:07:56,561 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=970420.0, ans=0.95
2023-11-20 06:08:04,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=970486.6666666666, ans=0.125
2023-11-20 06:08:05,958 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.89 vs. limit=15.0
2023-11-20 06:08:13,559 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 1300, loss[loss=0.07262, simple_loss=0.08132, pruned_loss=0.02214, audio_tagging_loss=0.009821, over 14782.00 frames. ], tot_loss[loss=0.08017, simple_loss=0.1001, pruned_loss=0.02013, audio_tagging_loss=0.009965, over 3046051.03 frames. ], batch size: 56, lr: 5.40e-03, grad_scale: 32.0
2023-11-20 06:08:22,206 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.703e+01 8.020e+01 8.850e+01 9.786e+01 1.232e+02, threshold=1.770e+02, percent-clipped=0.0
2023-11-20 06:08:31,159 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=970620.0, ans=0.0
2023-11-20 06:08:32,746 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 145600
2023-11-20 06:08:35,620 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=970620.0, ans=0.125
2023-11-20 06:08:49,326 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=970686.6666666666, ans=0.125
2023-11-20 06:08:57,285 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=970753.3333333334, ans=0.1
2023-11-20 06:09:05,974 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.51 vs. limit=15.0
2023-11-20 06:09:17,777 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 1350, loss[loss=0.0719, simple_loss=0.08633, pruned_loss=0.01768, audio_tagging_loss=0.01106, over 15485.00 frames. ], tot_loss[loss=0.08066, simple_loss=0.1009, pruned_loss=0.02031, audio_tagging_loss=0.009923, over 3047004.52 frames. ], batch size: 58, lr: 5.40e-03, grad_scale: 32.0
2023-11-20 06:09:37,903 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 145650
2023-11-20 06:09:45,120 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=971020.0, ans=0.0
2023-11-20 06:10:03,346 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 06:10:22,941 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 1400, loss[loss=0.06337, simple_loss=0.07028, pruned_loss=0.01394, audio_tagging_loss=0.01429, over 15861.00 frames. ], tot_loss[loss=0.08116, simple_loss=0.1014, pruned_loss=0.02038, audio_tagging_loss=0.01009, over 3054446.08 frames. ], batch size: 60, lr: 5.40e-03, grad_scale: 32.0
2023-11-20 06:10:26,343 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=971220.0, ans=0.125
2023-11-20 06:10:32,221 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.083e+01 8.085e+01 8.864e+01 9.771e+01 1.469e+02, threshold=1.773e+02, percent-clipped=0.0
2023-11-20 06:10:42,878 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 145700
2023-11-20 06:10:46,241 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.40 vs. limit=15.0
2023-11-20 06:10:50,929 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.52 vs. limit=15.0
2023-11-20 06:11:11,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=971420.0, ans=0.5
2023-11-20 06:11:12,819 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=971420.0, ans=0.125
2023-11-20 06:11:27,620 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=971553.3333333334, ans=0.0
2023-11-20 06:11:28,531 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 1450, loss[loss=0.08177, simple_loss=0.1035, pruned_loss=0.02134, audio_tagging_loss=0.008694, over 14437.00 frames. ], tot_loss[loss=0.08189, simple_loss=0.1026, pruned_loss=0.02054, audio_tagging_loss=0.01007, over 3057127.85 frames. ], batch size: 53, lr: 5.40e-03, grad_scale: 32.0
2023-11-20 06:11:29,871 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=971553.3333333334, ans=0.1
2023-11-20 06:11:35,970 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=971553.3333333334, ans=0.125
2023-11-20 06:11:36,389 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.94 vs. limit=15.0
2023-11-20 06:11:46,878 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 145750
2023-11-20 06:12:22,676 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=971820.0, ans=0.0
2023-11-20 06:12:26,885 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.82 vs. limit=15.0
2023-11-20 06:12:31,381 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=971886.6666666666, ans=0.0
2023-11-20 06:12:32,267 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 1500, loss[loss=0.0714, simple_loss=0.08954, pruned_loss=0.015, audio_tagging_loss=0.01163, over 15877.00 frames. ], tot_loss[loss=0.08199, simple_loss=0.1026, pruned_loss=0.02055, audio_tagging_loss=0.01012, over 3050360.27 frames. ], batch size: 59, lr: 5.40e-03, grad_scale: 32.0
2023-11-20 06:12:32,889 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.15 vs. limit=15.0
2023-11-20 06:12:41,378 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.185e+01 8.333e+01 8.756e+01 9.459e+01 1.276e+02, threshold=1.751e+02, percent-clipped=0.0
2023-11-20 06:12:51,850 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 145800
2023-11-20 06:13:06,004 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-20 06:13:24,010 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=972153.3333333334, ans=0.0
2023-11-20 06:13:37,304 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 1550, loss[loss=0.07269, simple_loss=0.09396, pruned_loss=0.01759, audio_tagging_loss=0.008121, over 13653.00 frames. ], tot_loss[loss=0.08153, simple_loss=0.1019, pruned_loss=0.02035, audio_tagging_loss=0.01022, over 3043197.66 frames. ], batch size: 56, lr: 5.40e-03, grad_scale: 32.0
2023-11-20 06:13:39,945 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=972220.0, ans=0.1
2023-11-20 06:13:56,434 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 145850
2023-11-20 06:14:01,232 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.62 vs. limit=15.0
2023-11-20 06:14:03,314 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=972353.3333333334, ans=0.1
2023-11-20 06:14:17,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=972420.0, ans=0.1
2023-11-20 06:14:19,388 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.14 vs. limit=15.0
2023-11-20 06:14:42,112 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 1600, loss[loss=0.08218, simple_loss=0.1085, pruned_loss=0.01935, audio_tagging_loss=0.008605, over 15487.00 frames. ], tot_loss[loss=0.08208, simple_loss=0.1025, pruned_loss=0.02059, audio_tagging_loss=0.01024, over 3039231.95 frames. ], batch size: 56, lr: 5.40e-03, grad_scale: 32.0
2023-11-20 06:14:51,243 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.806e+01 8.279e+01 8.833e+01 9.772e+01 1.298e+02, threshold=1.767e+02, percent-clipped=0.0
2023-11-20 06:15:01,488 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 145900
2023-11-20 06:15:06,682 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=972686.6666666666, ans=0.0
2023-11-20 06:15:21,975 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=972753.3333333334, ans=0.125
2023-11-20 06:15:41,052 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=972820.0, ans=0.125
2023-11-20 06:15:46,800 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 1650, loss[loss=0.07911, simple_loss=0.09099, pruned_loss=0.02146, audio_tagging_loss=0.01216, over 15598.00 frames. ], tot_loss[loss=0.08194, simple_loss=0.1025, pruned_loss=0.02046, audio_tagging_loss=0.01025, over 3041953.29 frames. ], batch size: 60, lr: 5.39e-03, grad_scale: 16.0
2023-11-20 06:15:50,726 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=972886.6666666666, ans=0.07
2023-11-20 06:16:06,546 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 145950
2023-11-20 06:16:25,007 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=973086.6666666666, ans=0.0
2023-11-20 06:16:45,174 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=973153.3333333334, ans=0.125
2023-11-20 06:16:48,344 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=973153.3333333334, ans=0.125
2023-11-20 06:16:51,540 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 1700, loss[loss=0.07217, simple_loss=0.0881, pruned_loss=0.01999, audio_tagging_loss=0.008121, over 15895.00 frames. ], tot_loss[loss=0.08168, simple_loss=0.1023, pruned_loss=0.02039, audio_tagging_loss=0.01014, over 3050005.46 frames. ], batch size: 61, lr: 5.39e-03, grad_scale: 16.0
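[Annotation] The tot_loss[...] fields are running, frame-weighted averages over the epoch so far; the "over N frames" count grows from ~1.6M at batch 150 toward ~3M here. A minimal sketch of that bookkeeping, assuming frame-weighted accumulation (the training script may use its own MetricsTracker-style helper instead):

class RunningLoss:
    """Frame-weighted running average, like the logged tot_loss[..., over N frames.]."""
    def __init__(self):
        self.frames = 0.0
        self.sums = {}

    def update(self, batch_losses: dict, batch_frames: float):
        self.frames += batch_frames
        for k, v in batch_losses.items():
            self.sums[k] = self.sums.get(k, 0.0) + v * batch_frames

    def averages(self) -> dict:
        return {k: s / self.frames for k, s in self.sums.items()}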
], tot_loss[loss=0.08168, simple_loss=0.1023, pruned_loss=0.02039, audio_tagging_loss=0.01014, over 3050005.46 frames. ], batch size: 61, lr: 5.39e-03, grad_scale: 16.0 2023-11-20 06:17:02,514 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.898e+01 8.052e+01 8.811e+01 9.587e+01 1.278e+02, threshold=1.762e+02, percent-clipped=0.0 2023-11-20 06:17:11,371 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 146000 2023-11-20 06:17:40,013 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=973420.0, ans=0.125 2023-11-20 06:17:45,415 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=973486.6666666666, ans=0.04949747468305833 2023-11-20 06:17:56,475 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 1750, loss[loss=0.09384, simple_loss=0.1184, pruned_loss=0.02527, audio_tagging_loss=0.009346, over 14847.00 frames. ], tot_loss[loss=0.08156, simple_loss=0.1023, pruned_loss=0.02036, audio_tagging_loss=0.01003, over 3050483.71 frames. ], batch size: 54, lr: 5.39e-03, grad_scale: 16.0 2023-11-20 06:17:57,954 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 06:18:00,156 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=973553.3333333334, ans=0.125 2023-11-20 06:18:15,681 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 146050 2023-11-20 06:18:42,367 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=973753.3333333334, ans=0.0 2023-11-20 06:18:45,801 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=973753.3333333334, ans=0.015 2023-11-20 06:19:00,628 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 1800, loss[loss=0.08848, simple_loss=0.1146, pruned_loss=0.02281, audio_tagging_loss=0.008361, over 15054.00 frames. ], tot_loss[loss=0.08122, simple_loss=0.1021, pruned_loss=0.02021, audio_tagging_loss=0.009973, over 3046318.47 frames. ], batch size: 56, lr: 5.39e-03, grad_scale: 16.0 2023-11-20 06:19:11,432 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.956e+01 8.340e+01 8.928e+01 9.770e+01 2.074e+02, threshold=1.786e+02, percent-clipped=1.0 2023-11-20 06:19:21,333 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 146100 2023-11-20 06:19:30,727 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=974020.0, ans=0.0 2023-11-20 06:19:39,401 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=974086.6666666666, ans=10.0 2023-11-20 06:19:52,814 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=974153.3333333334, ans=0.1 2023-11-20 06:20:01,894 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 06:20:06,121 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 1850, loss[loss=0.09861, simple_loss=0.1161, pruned_loss=0.03005, audio_tagging_loss=0.01052, over 15044.00 frames. ], tot_loss[loss=0.08114, simple_loss=0.1019, pruned_loss=0.02022, audio_tagging_loss=0.009965, over 3044305.94 frames. 
], batch size: 56, lr: 5.39e-03, grad_scale: 16.0 2023-11-20 06:20:21,178 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=974286.6666666666, ans=0.125 2023-11-20 06:20:23,740 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=974286.6666666666, ans=0.125 2023-11-20 06:20:25,770 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 146150 2023-11-20 06:20:34,883 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=974353.3333333334, ans=0.0 2023-11-20 06:20:42,577 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.21 vs. limit=22.5 2023-11-20 06:20:51,497 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=974420.0, ans=0.125 2023-11-20 06:21:07,365 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.91 vs. limit=10.0 2023-11-20 06:21:10,952 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.36 vs. limit=15.0 2023-11-20 06:21:11,575 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 1900, loss[loss=0.08005, simple_loss=0.0951, pruned_loss=0.02239, audio_tagging_loss=0.01011, over 15124.00 frames. ], tot_loss[loss=0.0805, simple_loss=0.101, pruned_loss=0.02009, audio_tagging_loss=0.009883, over 3044167.94 frames. ], batch size: 57, lr: 5.39e-03, grad_scale: 16.0 2023-11-20 06:21:19,804 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.53 vs. limit=12.0 2023-11-20 06:21:21,308 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.109e+01 8.191e+01 8.714e+01 9.670e+01 1.123e+02, threshold=1.743e+02, percent-clipped=0.0 2023-11-20 06:21:21,763 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=974553.3333333334, ans=0.125 2023-11-20 06:21:30,140 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 146200 2023-11-20 06:21:58,398 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.81 vs. limit=15.0 2023-11-20 06:22:00,342 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=974753.3333333334, ans=0.0 2023-11-20 06:22:05,289 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=974820.0, ans=0.0 2023-11-20 06:22:10,146 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=974820.0, ans=0.0 2023-11-20 06:22:16,098 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 1950, loss[loss=0.0681, simple_loss=0.08513, pruned_loss=0.01434, audio_tagging_loss=0.0112, over 14923.00 frames. ], tot_loss[loss=0.08013, simple_loss=0.1005, pruned_loss=0.02002, audio_tagging_loss=0.009851, over 3044411.49 frames. 
], batch size: 58, lr: 5.39e-03, grad_scale: 16.0 2023-11-20 06:22:22,510 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=974886.6666666666, ans=0.0 2023-11-20 06:22:24,898 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=974886.6666666666, ans=0.125 2023-11-20 06:22:35,940 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 146250 2023-11-20 06:22:51,286 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.63 vs. limit=10.0 2023-11-20 06:22:52,247 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=975020.0, ans=0.125 2023-11-20 06:23:09,365 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=975153.3333333334, ans=0.125 2023-11-20 06:23:12,013 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=975153.3333333334, ans=0.0 2023-11-20 06:23:21,424 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 2000, loss[loss=0.09867, simple_loss=0.1208, pruned_loss=0.02547, audio_tagging_loss=0.01277, over 14007.00 frames. ], tot_loss[loss=0.0796, simple_loss=0.09964, pruned_loss=0.0198, audio_tagging_loss=0.009987, over 3040395.37 frames. ], batch size: 54, lr: 5.39e-03, grad_scale: 32.0 2023-11-20 06:23:22,847 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 06:23:30,950 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=975220.0, ans=0.1 2023-11-20 06:23:31,798 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.029e+01 7.759e+01 8.483e+01 9.476e+01 1.626e+02, threshold=1.697e+02, percent-clipped=0.0 2023-11-20 06:23:34,568 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=975286.6666666666, ans=0.125 2023-11-20 06:23:41,217 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 146300 2023-11-20 06:23:43,912 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=975286.6666666666, ans=0.0 2023-11-20 06:24:20,339 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.46 vs. limit=15.0 2023-11-20 06:24:26,443 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 2050, loss[loss=0.08016, simple_loss=0.09708, pruned_loss=0.01849, audio_tagging_loss=0.01313, over 15451.00 frames. ], tot_loss[loss=0.08085, simple_loss=0.1015, pruned_loss=0.0203, audio_tagging_loss=0.009829, over 3044042.82 frames. 
], batch size: 58, lr: 5.39e-03, grad_scale: 32.0 2023-11-20 06:24:34,203 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=975553.3333333334, ans=0.2 2023-11-20 06:24:37,676 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=975620.0, ans=0.125 2023-11-20 06:24:42,718 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=975620.0, ans=0.125 2023-11-20 06:24:45,180 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 146350 2023-11-20 06:24:52,651 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=975686.6666666666, ans=0.125 2023-11-20 06:24:57,879 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=975686.6666666666, ans=0.125 2023-11-20 06:25:10,064 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=975753.3333333334, ans=0.0 2023-11-20 06:25:30,030 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 2100, loss[loss=0.06146, simple_loss=0.07359, pruned_loss=0.01422, audio_tagging_loss=0.01045, over 14590.00 frames. ], tot_loss[loss=0.08123, simple_loss=0.102, pruned_loss=0.02042, audio_tagging_loss=0.009813, over 3042291.42 frames. ], batch size: 56, lr: 5.39e-03, grad_scale: 32.0 2023-11-20 06:25:39,690 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.798e+01 8.216e+01 8.927e+01 9.679e+01 1.244e+02, threshold=1.785e+02, percent-clipped=0.0 2023-11-20 06:25:48,918 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 146400 2023-11-20 06:26:04,812 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=976020.0, ans=0.0 2023-11-20 06:26:12,276 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=976086.6666666666, ans=0.1 2023-11-20 06:26:23,621 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.96 vs. limit=15.0 2023-11-20 06:26:24,766 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.36 vs. limit=10.0 2023-11-20 06:26:28,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=976153.3333333334, ans=0.125 2023-11-20 06:26:31,875 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=976153.3333333334, ans=0.5 2023-11-20 06:26:33,953 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 2150, loss[loss=0.07489, simple_loss=0.09637, pruned_loss=0.01767, audio_tagging_loss=0.009044, over 14238.00 frames. ], tot_loss[loss=0.08125, simple_loss=0.1018, pruned_loss=0.02042, audio_tagging_loss=0.009911, over 3041804.73 frames. 
], batch size: 54, lr: 5.39e-03, grad_scale: 32.0 2023-11-20 06:26:44,886 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=976220.0, ans=0.125 2023-11-20 06:26:55,017 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 146450 2023-11-20 06:27:12,186 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 06:27:13,769 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=976420.0, ans=0.125 2023-11-20 06:27:18,672 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=976420.0, ans=0.0 2023-11-20 06:27:24,811 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=976486.6666666666, ans=0.0 2023-11-20 06:27:39,838 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 2200, loss[loss=0.07346, simple_loss=0.09149, pruned_loss=0.01663, audio_tagging_loss=0.01108, over 15575.00 frames. ], tot_loss[loss=0.08112, simple_loss=0.1017, pruned_loss=0.02037, audio_tagging_loss=0.009897, over 3050050.90 frames. ], batch size: 58, lr: 5.38e-03, grad_scale: 32.0 2023-11-20 06:27:40,140 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=976553.3333333334, ans=0.2 2023-11-20 06:27:45,730 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=976553.3333333334, ans=0.0 2023-11-20 06:27:46,096 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.70 vs. limit=12.0 2023-11-20 06:27:50,519 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.751e+01 8.222e+01 8.987e+01 9.495e+01 1.215e+02, threshold=1.797e+02, percent-clipped=0.0 2023-11-20 06:27:58,531 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.95 vs. limit=22.5 2023-11-20 06:27:59,350 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 146500 2023-11-20 06:28:32,513 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=976820.0, ans=0.125 2023-11-20 06:28:36,118 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=976820.0, ans=0.0 2023-11-20 06:28:39,782 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=976820.0, ans=0.0 2023-11-20 06:28:44,329 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 2250, loss[loss=0.07038, simple_loss=0.06864, pruned_loss=0.02169, audio_tagging_loss=0.01436, over 14454.00 frames. ], tot_loss[loss=0.08128, simple_loss=0.1018, pruned_loss=0.02041, audio_tagging_loss=0.009964, over 3051041.87 frames. 
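Note on the WARNING above: AudioSet cuts in this setup carry a dummy transcript, and a 1-second cut (100 frames) keeps only 23 frames after the ~4x convolutional subsampling, fewer than its 24 BPE tokens, so the transducer loss would be ill-defined and the cut is dropped. A sketch of that filter; the length formula is the usual icefall convention, assumed here rather than quoted from train_asr.py, but it reproduces the logged 100 -> 23:

```python
def frames_after_subsampling(num_frames: int) -> int:
    # ~4x temporal subsampling in the convolutional frontend (assumed)
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # a transducer needs at least one frame per output token
    return frames_after_subsampling(num_frames) >= num_tokens

print(frames_after_subsampling(100))  # -> 23, as in the warning
print(keep_cut(100, 24))              # -> False: the cut is excluded
```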
], batch size: 55, lr: 5.38e-03, grad_scale: 32.0 2023-11-20 06:28:45,991 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=976886.6666666666, ans=0.2 2023-11-20 06:29:00,618 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=976953.3333333334, ans=0.0 2023-11-20 06:29:03,344 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 146550 2023-11-20 06:29:16,491 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.85 vs. limit=15.0 2023-11-20 06:29:21,470 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=977020.0, ans=0.0 2023-11-20 06:29:31,317 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=977086.6666666666, ans=0.125 2023-11-20 06:29:32,325 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=977086.6666666666, ans=0.1 2023-11-20 06:29:41,371 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.27 vs. limit=22.5 2023-11-20 06:29:48,056 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 2300, loss[loss=0.08977, simple_loss=0.1153, pruned_loss=0.02346, audio_tagging_loss=0.008668, over 15236.00 frames. ], tot_loss[loss=0.08176, simple_loss=0.1024, pruned_loss=0.02062, audio_tagging_loss=0.009944, over 3048553.59 frames. ], batch size: 57, lr: 5.38e-03, grad_scale: 32.0 2023-11-20 06:29:58,636 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.545e+01 8.135e+01 8.852e+01 9.695e+01 1.259e+02, threshold=1.770e+02, percent-clipped=0.0 2023-11-20 06:30:08,677 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 146600 2023-11-20 06:30:10,754 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=977286.6666666666, ans=0.125 2023-11-20 06:30:15,422 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.whiten.whitening_limit, batch_count=977353.3333333334, ans=12.0 2023-11-20 06:30:20,023 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=977353.3333333334, ans=0.125 2023-11-20 06:30:34,785 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.17 vs. limit=22.5 2023-11-20 06:30:40,255 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=977486.6666666666, ans=0.125 2023-11-20 06:30:44,329 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
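Note on the model.py:792 heartbeat: every 50 batches it prints whether the encoder is frozen together with the global batch index. A hypothetical version of the underlying check (parameter names invented for illustration; in this run the encoder is never frozen, matching Freeze_encoder: False throughout):

```python
def encoder_is_frozen(batch_idx: int,
                      freeze_encoder: bool = False,
                      freeze_encoder_steps: int = -1) -> bool:
    # freeze permanently, or only for the first freeze_encoder_steps batches
    return freeze_encoder or (0 <= batch_idx < freeze_encoder_steps)

print(encoder_is_frozen(147350))  # -> False, matching the log
```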
Number of tokens: 24 2023-11-20 06:30:44,567 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=977486.6666666666, ans=0.1 2023-11-20 06:30:48,310 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=977486.6666666666, ans=0.125 2023-11-20 06:30:53,516 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 2350, loss[loss=0.08253, simple_loss=0.107, pruned_loss=0.0205, audio_tagging_loss=0.008552, over 15489.00 frames. ], tot_loss[loss=0.08105, simple_loss=0.1012, pruned_loss=0.02034, audio_tagging_loss=0.01014, over 3045217.84 frames. ], batch size: 57, lr: 5.38e-03, grad_scale: 32.0 2023-11-20 06:31:04,736 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=977553.3333333334, ans=0.0 2023-11-20 06:31:13,196 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 146650 2023-11-20 06:31:14,740 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=977620.0, ans=0.125 2023-11-20 06:31:16,296 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.03 vs. limit=15.0 2023-11-20 06:31:46,782 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=977820.0, ans=0.0 2023-11-20 06:31:52,469 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=977820.0, ans=10.0 2023-11-20 06:31:58,151 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 2400, loss[loss=0.08395, simple_loss=0.1067, pruned_loss=0.02179, audio_tagging_loss=0.008812, over 14857.00 frames. ], tot_loss[loss=0.08059, simple_loss=0.1005, pruned_loss=0.02008, audio_tagging_loss=0.01026, over 3040114.65 frames. ], batch size: 55, lr: 5.38e-03, grad_scale: 32.0 2023-11-20 06:32:09,020 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.339e+01 8.246e+01 8.786e+01 9.716e+01 1.266e+02, threshold=1.757e+02, percent-clipped=0.0 2023-11-20 06:32:16,312 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 146700 2023-11-20 06:32:33,521 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=978020.0, ans=0.0 2023-11-20 06:32:40,736 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=978086.6666666666, ans=0.125 2023-11-20 06:33:01,149 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.16 vs. limit=15.0 2023-11-20 06:33:01,536 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 2450, loss[loss=0.1159, simple_loss=0.1409, pruned_loss=0.03574, audio_tagging_loss=0.00971, over 16061.00 frames. ], tot_loss[loss=0.08069, simple_loss=0.1004, pruned_loss=0.02009, audio_tagging_loss=0.01039, over 3048951.27 frames. 
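Note on the per-batch loss summaries: the logged components combine as loss = simple_loss_scale * simple_loss + pruned_loss + audio_tagging_loss_scale * audio_tagging_loss, with scales of 0.5 and 1.0 respectively; the batch 2400 totals above check out to within rounding (0.5 * 0.1005 + 0.02008 + 0.01026 = 0.08059). Icefall also ramps these scales during warm-up, which is long past at this point in training. A sketch:

```python
def combined_loss(simple_loss, pruned_loss, audio_tagging_loss,
                  simple_loss_scale=0.5, audio_tagging_loss_scale=1.0):
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)

# batch 2400 above: 0.5 * 0.1005 + 0.02008 + 0.01026 = 0.08059
assert abs(combined_loss(0.1005, 0.02008, 0.01026) - 0.08059) < 1e-5
```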
], batch size: 57, lr: 5.38e-03, grad_scale: 32.0 2023-11-20 06:33:21,631 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 146750 2023-11-20 06:33:27,416 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=978353.3333333334, ans=0.125 2023-11-20 06:33:38,515 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=978353.3333333334, ans=0.125 2023-11-20 06:33:55,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=978486.6666666666, ans=0.1 2023-11-20 06:33:57,939 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.26 vs. limit=15.0 2023-11-20 06:34:02,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=978486.6666666666, ans=0.0 2023-11-20 06:34:06,439 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 2500, loss[loss=0.08388, simple_loss=0.1089, pruned_loss=0.01866, audio_tagging_loss=0.01076, over 15690.00 frames. ], tot_loss[loss=0.08008, simple_loss=0.09994, pruned_loss=0.01982, audio_tagging_loss=0.0103, over 3040927.07 frames. ], batch size: 60, lr: 5.38e-03, grad_scale: 32.0 2023-11-20 06:34:18,575 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.294e+01 8.313e+01 9.053e+01 9.691e+01 1.783e+02, threshold=1.811e+02, percent-clipped=1.0 2023-11-20 06:34:26,058 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 146800 2023-11-20 06:34:29,199 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=978620.0, ans=0.1 2023-11-20 06:35:04,469 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=978820.0, ans=0.0 2023-11-20 06:35:11,803 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 2550, loss[loss=0.09781, simple_loss=0.1282, pruned_loss=0.02497, audio_tagging_loss=0.008718, over 15686.00 frames. ], tot_loss[loss=0.08034, simple_loss=0.1003, pruned_loss=0.01997, audio_tagging_loss=0.01023, over 3041501.43 frames. ], batch size: 56, lr: 5.38e-03, grad_scale: 32.0 2023-11-20 06:35:12,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=978886.6666666666, ans=0.07 2023-11-20 06:35:19,477 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=978886.6666666666, ans=0.0 2023-11-20 06:35:29,932 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 146850 2023-11-20 06:35:35,520 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=979020.0, ans=0.1 2023-11-20 06:35:37,003 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.05 vs. 
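Note on the Whitening records (e.g. metric=8.26 vs. limit=15.0 above): each compares a whiteness statistic of a layer's activations against a scheduled limit, and a corrective penalty applies only when the metric exceeds the limit. One plausible statistic, assumed here in the spirit of scaling.py rather than quoted from it, is mean(diag(C @ C)) / mean(diag(C))**2 for the feature covariance C, which equals 1.0 for perfectly white features and grows as the covariance becomes ill-conditioned:

```python
import torch

def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels) activations for one whitening group
    x = x - x.mean(dim=0)
    c = (x.T @ x) / x.shape[0]            # feature covariance, (C, C)
    num = torch.diag(c @ c).mean()        # mean squared eigenvalue
    den = torch.diag(c).mean() ** 2       # squared mean eigenvalue
    return (num / den).item()             # 1.0 iff perfectly white

print(whitening_metric(torch.randn(1000, 384)))  # ~1.4 for sampled noise
```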
limit=15.0 2023-11-20 06:35:49,611 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=979086.6666666666, ans=0.2 2023-11-20 06:35:52,754 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=979086.6666666666, ans=0.1 2023-11-20 06:36:00,483 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=979086.6666666666, ans=0.2 2023-11-20 06:36:01,085 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.60 vs. limit=15.0 2023-11-20 06:36:15,071 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 2600, loss[loss=0.07155, simple_loss=0.09235, pruned_loss=0.0155, audio_tagging_loss=0.009875, over 15762.00 frames. ], tot_loss[loss=0.07981, simple_loss=0.09946, pruned_loss=0.01992, audio_tagging_loss=0.01016, over 3043514.41 frames. ], batch size: 58, lr: 5.38e-03, grad_scale: 32.0 2023-11-20 06:36:24,599 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=979220.0, ans=0.2 2023-11-20 06:36:26,628 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.093e+01 8.215e+01 8.794e+01 9.573e+01 1.201e+02, threshold=1.759e+02, percent-clipped=0.0 2023-11-20 06:36:34,893 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 146900 2023-11-20 06:36:35,559 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.00 vs. limit=15.0 2023-11-20 06:37:03,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=979420.0, ans=0.125 2023-11-20 06:37:12,375 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.15 vs. limit=22.5 2023-11-20 06:37:20,166 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 2650, loss[loss=0.08141, simple_loss=0.1112, pruned_loss=0.01718, audio_tagging_loss=0.008654, over 15549.00 frames. ], tot_loss[loss=0.07988, simple_loss=0.09975, pruned_loss=0.01993, audio_tagging_loss=0.01008, over 3044740.72 frames. ], batch size: 58, lr: 5.38e-03, grad_scale: 16.0 2023-11-20 06:37:25,283 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=979553.3333333334, ans=0.125 2023-11-20 06:37:39,798 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 146950 2023-11-20 06:37:41,340 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=979620.0, ans=0.0 2023-11-20 06:37:45,031 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=979686.6666666666, ans=0.0 2023-11-20 06:38:15,795 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=979820.0, ans=0.0 2023-11-20 06:38:23,612 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=979886.6666666666, ans=0.1 2023-11-20 06:38:24,559 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 2700, loss[loss=0.05862, simple_loss=0.06974, pruned_loss=0.01388, audio_tagging_loss=0.009869, over 15300.00 frames. 
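Note on the grad_scale field, which flips between 16.0 and 32.0 across these batch summaries: it is the dynamic fp16 loss scale, halved on overflow and doubled after a run of clean steps. A minimal AMP loop in the same spirit, using standard PyTorch APIs rather than the train_asr.py wrapper:

```python
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=16.0)

def train_step(model, optimizer, batch, loss_fn):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = loss_fn(model, batch)
    scaler.scale(loss).backward()
    scaler.step(optimizer)   # skipped if inf/nan gradients are found
    scaler.update()          # halves or doubles the scale as needed
    return loss.detach(), scaler.get_scale()  # -> 16.0, 32.0, ...
```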
], tot_loss[loss=0.07909, simple_loss=0.09887, pruned_loss=0.01967, audio_tagging_loss=0.009984, over 3044214.18 frames. ], batch size: 59, lr: 5.38e-03, grad_scale: 16.0 2023-11-20 06:38:30,456 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=979886.6666666666, ans=0.0 2023-11-20 06:38:37,513 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.654e+01 8.126e+01 8.706e+01 9.526e+01 1.459e+02, threshold=1.741e+02, percent-clipped=0.0 2023-11-20 06:38:43,066 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.00 vs. limit=15.0 2023-11-20 06:38:43,839 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 147000 2023-11-20 06:38:48,146 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=979953.3333333334, ans=0.125 2023-11-20 06:38:49,337 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=980020.0, ans=0.09899494936611666 2023-11-20 06:39:05,084 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=980086.6666666666, ans=0.125 2023-11-20 06:39:20,166 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=980153.3333333334, ans=0.125 2023-11-20 06:39:22,166 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=980153.3333333334, ans=0.09899494936611666 2023-11-20 06:39:29,315 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 2750, loss[loss=0.08451, simple_loss=0.1043, pruned_loss=0.01993, audio_tagging_loss=0.01241, over 14969.00 frames. ], tot_loss[loss=0.07971, simple_loss=0.09962, pruned_loss=0.01988, audio_tagging_loss=0.01002, over 3044321.40 frames. ], batch size: 56, lr: 5.37e-03, grad_scale: 16.0 2023-11-20 06:39:29,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=980220.0, ans=0.2 2023-11-20 06:39:37,147 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=980220.0, ans=0.0 2023-11-20 06:39:49,308 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 147050 2023-11-20 06:39:54,316 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=980353.3333333334, ans=0.125 2023-11-20 06:40:05,730 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=980353.3333333334, ans=0.0 2023-11-20 06:40:09,735 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=980420.0, ans=0.0 2023-11-20 06:40:23,482 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-20 06:40:23,676 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=980486.6666666666, ans=0.0 2023-11-20 06:40:24,835 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=980486.6666666666, ans=0.2 2023-11-20 06:40:33,954 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 2800, loss[loss=0.09202, simple_loss=0.1234, pruned_loss=0.02117, audio_tagging_loss=0.00917, over 14643.00 frames. ], tot_loss[loss=0.07957, simple_loss=0.09943, pruned_loss=0.01985, audio_tagging_loss=0.01001, over 3045360.61 frames. ], batch size: 53, lr: 5.37e-03, grad_scale: 16.0 2023-11-20 06:40:48,690 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.373e+01 8.295e+01 8.926e+01 1.019e+02 1.823e+02, threshold=1.785e+02, percent-clipped=1.0 2023-11-20 06:40:53,884 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 147100 2023-11-20 06:40:57,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=980620.0, ans=0.125 2023-11-20 06:41:14,924 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.93 vs. limit=22.5 2023-11-20 06:41:19,784 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=980753.3333333334, ans=0.125 2023-11-20 06:41:19,815 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 06:41:39,235 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 2850, loss[loss=0.06631, simple_loss=0.07421, pruned_loss=0.01794, audio_tagging_loss=0.01127, over 15823.00 frames. ], tot_loss[loss=0.07998, simple_loss=0.1001, pruned_loss=0.01997, audio_tagging_loss=0.009951, over 3055690.96 frames. 
], batch size: 60, lr: 5.37e-03, grad_scale: 16.0 2023-11-20 06:41:39,424 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=980886.6666666666, ans=0.1 2023-11-20 06:41:56,119 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=980953.3333333334, ans=0.0 2023-11-20 06:41:58,406 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 147150 2023-11-20 06:42:22,220 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=981086.6666666666, ans=0.125 2023-11-20 06:42:23,358 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=981086.6666666666, ans=0.125 2023-11-20 06:42:32,547 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=981153.3333333334, ans=0.125 2023-11-20 06:42:33,586 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=981153.3333333334, ans=0.125 2023-11-20 06:42:38,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=981153.3333333334, ans=0.125 2023-11-20 06:42:42,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=981220.0, ans=0.0 2023-11-20 06:42:43,713 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 2900, loss[loss=0.07347, simple_loss=0.08988, pruned_loss=0.01667, audio_tagging_loss=0.01186, over 15500.00 frames. ], tot_loss[loss=0.08014, simple_loss=0.1005, pruned_loss=0.02004, audio_tagging_loss=0.009862, over 3054950.77 frames. ], batch size: 62, lr: 5.37e-03, grad_scale: 16.0 2023-11-20 06:42:48,824 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=981220.0, ans=0.125 2023-11-20 06:42:57,849 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.790e+01 8.308e+01 8.941e+01 9.842e+01 1.369e+02, threshold=1.788e+02, percent-clipped=0.0 2023-11-20 06:43:02,750 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 147200 2023-11-20 06:43:11,238 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.97 vs. limit=15.0 2023-11-20 06:43:27,969 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.00 vs. limit=15.0 2023-11-20 06:43:42,739 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=981486.6666666666, ans=0.015 2023-11-20 06:43:48,100 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 2950, loss[loss=0.09865, simple_loss=0.1237, pruned_loss=0.0271, audio_tagging_loss=0.009692, over 15495.00 frames. ], tot_loss[loss=0.08064, simple_loss=0.1012, pruned_loss=0.02014, audio_tagging_loss=0.009916, over 3054383.28 frames. ], batch size: 55, lr: 5.37e-03, grad_scale: 16.0 2023-11-20 06:43:59,087 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.69 vs. 
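Note: the records that follow include a full validation pass (train_asr.py:1285/1294) and a peak-memory report at batch 3000. A generic sketch of such a loop, with compute_loss standing in for whatever per-batch helper the trainer actually uses:

```python
import torch

def compute_validation_loss(model, valid_loader, compute_loss, device):
    model.eval()
    tot_loss, tot_frames = 0.0, 0
    with torch.no_grad():
        for batch in valid_loader:
            loss, num_frames = compute_loss(model, batch)  # assumed helper
            tot_loss += loss.item() * num_frames
            tot_frames += num_frames
    model.train()
    max_mb = torch.cuda.max_memory_allocated(device) // 2**20
    print(f"validation: loss={tot_loss / tot_frames:.5f}, "
          f"max memory so far {max_mb}MB")
```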
limit=15.0 2023-11-20 06:44:07,694 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 147250 2023-11-20 06:44:10,397 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=981620.0, ans=10.0 2023-11-20 06:44:30,342 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=981753.3333333334, ans=0.125 2023-11-20 06:44:41,245 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=981820.0, ans=0.125 2023-11-20 06:44:44,773 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.09 vs. limit=8.0 2023-11-20 06:44:46,885 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.17 vs. limit=15.0 2023-11-20 06:44:53,070 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 3000, loss[loss=0.09388, simple_loss=0.1163, pruned_loss=0.02691, audio_tagging_loss=0.008803, over 15259.00 frames. ], tot_loss[loss=0.08113, simple_loss=0.1016, pruned_loss=0.02041, audio_tagging_loss=0.009927, over 3055140.97 frames. ], batch size: 57, lr: 5.37e-03, grad_scale: 16.0 2023-11-20 06:44:53,071 INFO [train_asr.py:1285] (3/4) Computing validation loss 2023-11-20 06:45:23,120 INFO [zipformer.py:1873] (3/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([1.0082, 3.2373, 2.8228, 2.9256, 3.5582, 3.6398, 3.0867, 3.5977], device='cuda:3') 2023-11-20 06:45:31,937 INFO [train_asr.py:1294] (3/4) Epoch 13, validation: loss=0.06242, simple_loss=0.05394, pruned_loss=0.005804, audio_tagging_loss=0.02964, over 4681554.00 frames. 2023-11-20 06:45:31,938 INFO [train_asr.py:1295] (3/4) Maximum memory allocated so far is 25886MB 2023-11-20 06:45:37,848 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=981886.6666666666, ans=0.0 2023-11-20 06:45:46,500 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.49 vs. limit=15.0 2023-11-20 06:45:46,820 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.395e+01 8.201e+01 8.897e+01 9.903e+01 1.229e+02, threshold=1.779e+02, percent-clipped=0.0 2023-11-20 06:45:52,567 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 147300 2023-11-20 06:46:02,971 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=982020.0, ans=0.1 2023-11-20 06:46:21,897 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=4.110e-01 2023-11-20 06:46:30,010 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=982153.3333333334, ans=0.125 2023-11-20 06:46:37,636 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 3050, loss[loss=0.08541, simple_loss=0.1122, pruned_loss=0.01882, audio_tagging_loss=0.01049, over 15280.00 frames. ], tot_loss[loss=0.08207, simple_loss=0.1028, pruned_loss=0.0206, audio_tagging_loss=0.01009, over 3051207.37 frames. 
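Note on the attn_weights_entropy diagnostic (zipformer.py:1873) printed during validation above: it summarizes how peaked each attention head is, one value per head (eight values for that layer). A plausible computation, assumed rather than quoted, is the mean entropy of each head's attention distribution over source positions:

```python
import torch

def attn_weights_entropy(attn_weights: torch.Tensor) -> torch.Tensor:
    # attn_weights: (num_heads, batch, tgt_len, src_len), rows sum to 1
    p = attn_weights.clamp(min=1e-20)
    entropy = -(p * p.log()).sum(dim=-1)  # entropy per query position
    return entropy.mean(dim=(1, 2))       # one value per head

w = torch.softmax(torch.randn(8, 2, 16, 16), dim=-1)
print(attn_weights_entropy(w))  # 8 entropies, like the tensor above
```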
], batch size: 56, lr: 5.37e-03, grad_scale: 16.0 2023-11-20 06:46:42,006 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=982220.0, ans=0.1 2023-11-20 06:46:44,668 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=982220.0, ans=0.125 2023-11-20 06:46:57,050 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 147350 2023-11-20 06:46:58,320 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=982286.6666666666, ans=0.0 2023-11-20 06:46:58,429 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=982286.6666666666, ans=0.0 2023-11-20 06:47:04,897 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.09 vs. limit=15.0 2023-11-20 06:47:10,083 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.30 vs. limit=15.0 2023-11-20 06:47:13,008 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 06:47:38,980 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.70 vs. limit=15.0 2023-11-20 06:47:42,631 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 3100, loss[loss=0.07629, simple_loss=0.09321, pruned_loss=0.01586, audio_tagging_loss=0.01383, over 15675.00 frames. ], tot_loss[loss=0.08231, simple_loss=0.1033, pruned_loss=0.02064, audio_tagging_loss=0.01001, over 3054814.43 frames. ], batch size: 59, lr: 5.37e-03, grad_scale: 16.0 2023-11-20 06:47:55,954 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.221e+01 8.032e+01 8.672e+01 9.635e+01 1.172e+02, threshold=1.734e+02, percent-clipped=0.0 2023-11-20 06:48:00,933 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 147400 2023-11-20 06:48:04,941 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=982620.0, ans=0.1 2023-11-20 06:48:31,471 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=982753.3333333334, ans=0.125 2023-11-20 06:48:35,145 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=982820.0, ans=0.125 2023-11-20 06:48:41,364 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=982820.0, ans=0.07 2023-11-20 06:48:47,323 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 3150, loss[loss=0.0949, simple_loss=0.1315, pruned_loss=0.02122, audio_tagging_loss=0.007938, over 15810.00 frames. ], tot_loss[loss=0.08187, simple_loss=0.1028, pruned_loss=0.02035, audio_tagging_loss=0.01012, over 3059084.34 frames. 
], batch size: 56, lr: 5.37e-03, grad_scale: 16.0 2023-11-20 06:48:47,521 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=982886.6666666666, ans=0.125 2023-11-20 06:48:53,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=982886.6666666666, ans=0.2 2023-11-20 06:49:06,546 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 147450 2023-11-20 06:49:09,168 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.67 vs. limit=15.0 2023-11-20 06:49:32,537 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=983086.6666666666, ans=0.1 2023-11-20 06:49:52,031 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 3200, loss[loss=0.09015, simple_loss=0.1156, pruned_loss=0.02244, audio_tagging_loss=0.009918, over 14905.00 frames. ], tot_loss[loss=0.08194, simple_loss=0.103, pruned_loss=0.02028, audio_tagging_loss=0.01016, over 3058264.65 frames. ], batch size: 57, lr: 5.37e-03, grad_scale: 32.0 2023-11-20 06:50:06,128 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.614e+01 8.160e+01 8.999e+01 9.638e+01 2.555e+02, threshold=1.800e+02, percent-clipped=1.0 2023-11-20 06:50:11,816 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 147500 2023-11-20 06:50:15,531 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=983286.6666666666, ans=0.0 2023-11-20 06:50:32,656 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=983420.0, ans=0.0 2023-11-20 06:50:56,508 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 3250, loss[loss=0.09659, simple_loss=0.1213, pruned_loss=0.02457, audio_tagging_loss=0.01137, over 14799.00 frames. ], tot_loss[loss=0.08187, simple_loss=0.1026, pruned_loss=0.02016, audio_tagging_loss=0.0104, over 3062582.15 frames. ], batch size: 54, lr: 5.37e-03, grad_scale: 32.0 2023-11-20 06:51:10,763 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=983620.0, ans=0.2 2023-11-20 06:51:15,520 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 147550 2023-11-20 06:51:17,039 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=983620.0, ans=0.125 2023-11-20 06:51:18,022 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=983620.0, ans=0.125 2023-11-20 06:51:30,833 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=983686.6666666666, ans=0.125 2023-11-20 06:52:00,913 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 3300, loss[loss=0.08473, simple_loss=0.1121, pruned_loss=0.01822, audio_tagging_loss=0.01048, over 16102.00 frames. ], tot_loss[loss=0.08271, simple_loss=0.1036, pruned_loss=0.02064, audio_tagging_loss=0.01025, over 3062520.73 frames. 
], batch size: 58, lr: 5.36e-03, grad_scale: 32.0 2023-11-20 06:52:11,326 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=983886.6666666666, ans=0.1 2023-11-20 06:52:13,671 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=983953.3333333334, ans=0.1 2023-11-20 06:52:13,784 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=983953.3333333334, ans=0.125 2023-11-20 06:52:14,561 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.228e+01 8.367e+01 9.193e+01 1.015e+02 1.401e+02, threshold=1.839e+02, percent-clipped=0.0 2023-11-20 06:52:20,249 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 147600 2023-11-20 06:52:25,066 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=983953.3333333334, ans=0.0 2023-11-20 06:53:05,457 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 3350, loss[loss=0.06171, simple_loss=0.07919, pruned_loss=0.01359, audio_tagging_loss=0.008529, over 14544.00 frames. ], tot_loss[loss=0.08283, simple_loss=0.1038, pruned_loss=0.02074, audio_tagging_loss=0.01018, over 3059882.86 frames. ], batch size: 55, lr: 5.36e-03, grad_scale: 32.0 2023-11-20 06:53:26,287 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 147650 2023-11-20 06:53:26,511 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=984286.6666666666, ans=0.0 2023-11-20 06:54:11,607 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 3400, loss[loss=0.08265, simple_loss=0.1052, pruned_loss=0.02129, audio_tagging_loss=0.008779, over 14393.00 frames. ], tot_loss[loss=0.08314, simple_loss=0.1044, pruned_loss=0.02087, audio_tagging_loss=0.01006, over 3054316.57 frames. ], batch size: 53, lr: 5.36e-03, grad_scale: 32.0 2023-11-20 06:54:25,748 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.861e+01 8.407e+01 9.126e+01 1.002e+02 1.896e+02, threshold=1.825e+02, percent-clipped=1.0 2023-11-20 06:54:30,993 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 147700 2023-11-20 06:54:40,900 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=984686.6666666666, ans=0.02 2023-11-20 06:54:40,908 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=984686.6666666666, ans=0.125 2023-11-20 06:54:40,954 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=984686.6666666666, ans=0.125 2023-11-20 06:54:44,064 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.72 vs. limit=22.5 2023-11-20 06:54:46,874 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.76 vs. 
limit=15.0 2023-11-20 06:54:49,036 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=984753.3333333334, ans=0.1 2023-11-20 06:54:58,869 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=984753.3333333334, ans=0.1 2023-11-20 06:55:02,578 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=984820.0, ans=0.125 2023-11-20 06:55:16,232 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 3450, loss[loss=0.06583, simple_loss=0.08061, pruned_loss=0.01792, audio_tagging_loss=0.007602, over 15540.00 frames. ], tot_loss[loss=0.08229, simple_loss=0.1034, pruned_loss=0.02064, audio_tagging_loss=0.009942, over 3050388.86 frames. ], batch size: 59, lr: 5.36e-03, grad_scale: 32.0 2023-11-20 06:55:20,064 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=984886.6666666666, ans=0.125 2023-11-20 06:55:35,439 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 147750 2023-11-20 06:55:36,955 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=984953.3333333334, ans=0.125 2023-11-20 06:55:51,461 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=985020.0, ans=0.125 2023-11-20 06:55:55,034 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.37 vs. limit=15.0 2023-11-20 06:56:11,783 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=985153.3333333334, ans=0.125 2023-11-20 06:56:19,936 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 3500, loss[loss=0.0777, simple_loss=0.09137, pruned_loss=0.02149, audio_tagging_loss=0.01051, over 14360.00 frames. ], tot_loss[loss=0.08211, simple_loss=0.1031, pruned_loss=0.02068, audio_tagging_loss=0.009903, over 3053468.16 frames. ], batch size: 54, lr: 5.36e-03, grad_scale: 32.0 2023-11-20 06:56:28,249 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=985220.0, ans=0.125 2023-11-20 06:56:31,054 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=985220.0, ans=0.0 2023-11-20 06:56:32,845 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=985286.6666666666, ans=15.0 2023-11-20 06:56:34,556 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.948e+01 8.197e+01 8.889e+01 1.015e+02 2.808e+02, threshold=1.778e+02, percent-clipped=1.0 2023-11-20 06:56:40,040 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 147800 2023-11-20 06:56:52,857 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-20 06:57:06,321 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=985420.0, ans=0.125 2023-11-20 06:57:24,524 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 3550, loss[loss=0.106, simple_loss=0.1288, pruned_loss=0.03305, audio_tagging_loss=0.00853, over 14540.00 frames. ], tot_loss[loss=0.08167, simple_loss=0.1022, pruned_loss=0.02064, audio_tagging_loss=0.009936, over 3045143.46 frames. ], batch size: 56, lr: 5.36e-03, grad_scale: 16.0 2023-11-20 06:57:36,936 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=985620.0, ans=0.0 2023-11-20 06:57:39,452 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=985620.0, ans=0.125 2023-11-20 06:57:44,169 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 147850 2023-11-20 06:58:15,754 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.95 vs. limit=15.0 2023-11-20 06:58:29,821 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 3600, loss[loss=0.08412, simple_loss=0.1044, pruned_loss=0.0207, audio_tagging_loss=0.01122, over 15703.00 frames. ], tot_loss[loss=0.08053, simple_loss=0.1008, pruned_loss=0.02011, audio_tagging_loss=0.01003, over 3043565.96 frames. ], batch size: 59, lr: 5.36e-03, grad_scale: 32.0 2023-11-20 06:58:44,592 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.072e+01 8.078e+01 8.812e+01 9.931e+01 2.962e+02, threshold=1.762e+02, percent-clipped=1.0 2023-11-20 06:58:48,345 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 147900 2023-11-20 06:59:00,792 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=986020.0, ans=0.125 2023-11-20 06:59:02,094 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=986020.0, ans=0.0 2023-11-20 06:59:29,724 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=986153.3333333334, ans=0.125 2023-11-20 06:59:33,132 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 3650, loss[loss=0.08792, simple_loss=0.1088, pruned_loss=0.0212, audio_tagging_loss=0.01233, over 15067.00 frames. ], tot_loss[loss=0.08032, simple_loss=0.1004, pruned_loss=0.02009, audio_tagging_loss=0.01002, over 3043689.42 frames. ], batch size: 56, lr: 5.36e-03, grad_scale: 32.0 2023-11-20 06:59:41,746 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.98 vs. 
limit=22.5 2023-11-20 06:59:49,602 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 06:59:52,928 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 147950 2023-11-20 07:00:03,968 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=986353.3333333334, ans=0.0 2023-11-20 07:00:07,697 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=986353.3333333334, ans=0.0 2023-11-20 07:00:24,794 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=986486.6666666666, ans=0.2 2023-11-20 07:00:34,585 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=986486.6666666666, ans=0.0 2023-11-20 07:00:35,160 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.37 vs. limit=15.0 2023-11-20 07:00:38,023 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 3700, loss[loss=0.07824, simple_loss=0.0871, pruned_loss=0.02253, audio_tagging_loss=0.01216, over 14414.00 frames. ], tot_loss[loss=0.08021, simple_loss=0.1001, pruned_loss=0.02014, audio_tagging_loss=0.01001, over 3046501.82 frames. ], batch size: 56, lr: 5.36e-03, grad_scale: 32.0 2023-11-20 07:00:46,589 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.68 vs. limit=6.0 2023-11-20 07:00:53,829 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.585e+01 7.922e+01 8.709e+01 9.261e+01 1.390e+02, threshold=1.742e+02, percent-clipped=0.0 2023-11-20 07:00:57,664 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 148000 2023-11-20 07:01:27,214 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=986753.3333333334, ans=0.1 2023-11-20 07:01:47,065 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 3750, loss[loss=0.1105, simple_loss=0.152, pruned_loss=0.02704, audio_tagging_loss=0.007471, over 17160.00 frames. ], tot_loss[loss=0.08086, simple_loss=0.1012, pruned_loss=0.02036, audio_tagging_loss=0.009893, over 3044871.10 frames. ], batch size: 62, lr: 5.36e-03, grad_scale: 32.0 2023-11-20 07:01:47,339 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 07:01:53,469 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=986886.6666666666, ans=0.125 2023-11-20 07:01:59,642 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=986953.3333333334, ans=0.0 2023-11-20 07:02:05,701 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 148050 2023-11-20 07:02:23,694 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.78 vs. limit=22.5 2023-11-20 07:02:25,723 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=987086.6666666666, ans=0.125 2023-11-20 07:02:30,759 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. 
Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 07:02:30,899 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=987086.6666666666, ans=0.0 2023-11-20 07:02:33,962 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.02 vs. limit=15.0 2023-11-20 07:02:35,527 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=987086.6666666666, ans=0.125 2023-11-20 07:02:37,919 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=987153.3333333334, ans=0.125 2023-11-20 07:02:42,897 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=987153.3333333334, ans=0.2 2023-11-20 07:02:51,339 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 3800, loss[loss=0.06401, simple_loss=0.08226, pruned_loss=0.01329, audio_tagging_loss=0.009593, over 15368.00 frames. ], tot_loss[loss=0.08116, simple_loss=0.1016, pruned_loss=0.02042, audio_tagging_loss=0.009931, over 3042167.96 frames. ], batch size: 57, lr: 5.36e-03, grad_scale: 32.0 2023-11-20 07:03:03,410 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.98 vs. limit=6.0 2023-11-20 07:03:07,202 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.518e+01 8.071e+01 8.832e+01 9.688e+01 1.208e+02, threshold=1.766e+02, percent-clipped=0.0 2023-11-20 07:03:10,947 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 148100 2023-11-20 07:03:21,407 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 07:03:25,682 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=987353.3333333334, ans=0.125 2023-11-20 07:03:41,926 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.84 vs. limit=10.0 2023-11-20 07:03:55,657 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 3850, loss[loss=0.06906, simple_loss=0.08636, pruned_loss=0.01683, audio_tagging_loss=0.009044, over 13771.00 frames. ], tot_loss[loss=0.08136, simple_loss=0.102, pruned_loss=0.0204, audio_tagging_loss=0.009973, over 3046785.50 frames. ], batch size: 56, lr: 5.35e-03, grad_scale: 32.0 2023-11-20 07:04:08,712 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=987620.0, ans=0.0 2023-11-20 07:04:09,847 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=987620.0, ans=0.1 2023-11-20 07:04:12,679 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.79 vs. 
limit=15.0 2023-11-20 07:04:15,191 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 148150 2023-11-20 07:04:16,715 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=987620.0, ans=0.125 2023-11-20 07:04:17,787 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=987620.0, ans=0.0 2023-11-20 07:04:28,838 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=987686.6666666666, ans=0.0 2023-11-20 07:04:29,242 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.45 vs. limit=22.5 2023-11-20 07:04:33,691 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=987753.3333333334, ans=0.1 2023-11-20 07:04:40,850 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=987753.3333333334, ans=0.0 2023-11-20 07:05:00,259 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 3900, loss[loss=0.0955, simple_loss=0.126, pruned_loss=0.02531, audio_tagging_loss=0.007197, over 15040.00 frames. ], tot_loss[loss=0.08085, simple_loss=0.1011, pruned_loss=0.02021, audio_tagging_loss=0.01008, over 3049271.95 frames. ], batch size: 56, lr: 5.35e-03, grad_scale: 32.0 2023-11-20 07:05:15,597 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.880e+01 8.310e+01 9.082e+01 9.917e+01 1.355e+02, threshold=1.816e+02, percent-clipped=0.0 2023-11-20 07:05:19,472 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 148200 2023-11-20 07:05:20,869 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=987953.3333333334, ans=0.0 2023-11-20 07:05:31,590 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 07:05:51,431 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=988153.3333333334, ans=0.125 2023-11-20 07:05:51,610 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=988153.3333333334, ans=0.0 2023-11-20 07:06:05,414 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 3950, loss[loss=0.0694, simple_loss=0.08718, pruned_loss=0.01464, audio_tagging_loss=0.01118, over 15504.00 frames. ], tot_loss[loss=0.0812, simple_loss=0.1017, pruned_loss=0.02029, audio_tagging_loss=0.01006, over 3048286.02 frames. 
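Note on the learning rate printed with each batch summary, which decays smoothly from 5.39e-03 earlier in this stretch to 5.35e-03 here: the trajectory is consistent with icefall's Eden schedule. The formula below is assumed rather than quoted, with base_lr=0.045, lr_batches=7500, and lr_epochs=3.5 taken as this run's settings; it lands near the logged values:

```python
def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    return (base_lr
            * ((batch / lr_batches) ** 2 + 1) ** -0.25
            * ((epoch / lr_epochs) ** 2 + 1) ** -0.25)

print(eden_lr(0.045, batch=148_000, epoch=12))  # ~5.36e-03, near the log
```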
], batch size: 58, lr: 5.35e-03, grad_scale: 32.0 2023-11-20 07:06:25,464 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 148250 2023-11-20 07:06:30,818 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=988353.3333333334, ans=0.0 2023-11-20 07:06:45,147 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=988420.0, ans=0.2 2023-11-20 07:06:50,891 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=988420.0, ans=0.125 2023-11-20 07:07:00,564 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=988486.6666666666, ans=0.125 2023-11-20 07:07:00,577 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=988486.6666666666, ans=0.125 2023-11-20 07:07:09,848 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=988553.3333333334, ans=0.125 2023-11-20 07:07:10,796 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 4000, loss[loss=0.07109, simple_loss=0.08477, pruned_loss=0.01762, audio_tagging_loss=0.01109, over 13828.00 frames. ], tot_loss[loss=0.08151, simple_loss=0.1018, pruned_loss=0.02043, audio_tagging_loss=0.01017, over 3045939.60 frames. ], batch size: 54, lr: 5.35e-03, grad_scale: 32.0 2023-11-20 07:07:16,613 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=988553.3333333334, ans=0.0 2023-11-20 07:07:20,615 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.44 vs. limit=15.0 2023-11-20 07:07:26,774 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.067e+01 8.419e+01 9.137e+01 9.996e+01 1.272e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-20 07:07:30,572 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 148300 2023-11-20 07:07:43,993 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=988686.6666666666, ans=0.0 2023-11-20 07:07:46,367 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=988686.6666666666, ans=0.125 2023-11-20 07:07:58,478 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.56 vs. limit=15.0 2023-11-20 07:08:00,353 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=988753.3333333334, ans=0.125 2023-11-20 07:08:16,236 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 4050, loss[loss=0.08534, simple_loss=0.09953, pruned_loss=0.0231, audio_tagging_loss=0.01248, over 14778.00 frames. ], tot_loss[loss=0.08226, simple_loss=0.1028, pruned_loss=0.02061, audio_tagging_loss=0.01024, over 3047638.88 frames. ], batch size: 57, lr: 5.35e-03, grad_scale: 32.0 2023-11-20 07:08:18,723 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 07:08:22,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=988886.6666666666, ans=0.1 2023-11-20 07:08:34,937 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 148350 2023-11-20 07:08:37,467 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=988953.3333333334, ans=0.1 2023-11-20 07:08:42,887 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 07:08:43,053 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=989020.0, ans=0.0 2023-11-20 07:09:13,120 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.56 vs. limit=10.0 2023-11-20 07:09:20,351 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 4100, loss[loss=0.08869, simple_loss=0.1182, pruned_loss=0.02112, audio_tagging_loss=0.008483, over 16702.00 frames. ], tot_loss[loss=0.0819, simple_loss=0.1023, pruned_loss=0.02054, audio_tagging_loss=0.0102, over 3053609.64 frames. ], batch size: 59, lr: 5.35e-03, grad_scale: 16.0 2023-11-20 07:09:36,931 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=989286.6666666666, ans=0.125 2023-11-20 07:09:37,911 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.747e+01 7.925e+01 8.541e+01 9.436e+01 1.128e+02, threshold=1.708e+02, percent-clipped=0.0 2023-11-20 07:09:39,233 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 148400 2023-11-20 07:10:24,274 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 4150, loss[loss=0.05462, simple_loss=0.06512, pruned_loss=0.01208, audio_tagging_loss=0.009979, over 13987.00 frames. ], tot_loss[loss=0.08229, simple_loss=0.103, pruned_loss=0.02077, audio_tagging_loss=0.01002, over 3041213.72 frames. ], batch size: 54, lr: 5.35e-03, grad_scale: 8.0 2023-11-20 07:10:29,611 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=989553.3333333334, ans=0.125 2023-11-20 07:10:44,139 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 148450 2023-11-20 07:10:46,683 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=989620.0, ans=0.1 2023-11-20 07:11:09,840 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-20 07:11:18,880 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=989820.0, ans=0.125 2023-11-20 07:11:28,773 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 4200, loss[loss=0.07929, simple_loss=0.0892, pruned_loss=0.02217, audio_tagging_loss=0.01252, over 14809.00 frames. ], tot_loss[loss=0.08215, simple_loss=0.1031, pruned_loss=0.02067, audio_tagging_loss=0.00994, over 3049015.47 frames. ], batch size: 56, lr: 5.35e-03, grad_scale: 8.0 2023-11-20 07:11:46,389 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.989e+01 8.148e+01 8.675e+01 9.910e+01 1.292e+02, threshold=1.735e+02, percent-clipped=0.0 2023-11-20 07:11:47,746 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 148500 2023-11-20 07:12:07,748 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=990086.6666666666, ans=0.0 2023-11-20 07:12:13,797 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.70 vs. limit=6.0 2023-11-20 07:12:15,700 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=990086.6666666666, ans=0.2 2023-11-20 07:12:32,952 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 4250, loss[loss=0.08387, simple_loss=0.1085, pruned_loss=0.02217, audio_tagging_loss=0.007468, over 15227.00 frames. ], tot_loss[loss=0.08124, simple_loss=0.1023, pruned_loss=0.02016, audio_tagging_loss=0.009949, over 3049948.55 frames. ], batch size: 60, lr: 5.35e-03, grad_scale: 8.0 2023-11-20 07:12:50,394 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=990286.6666666666, ans=0.0 2023-11-20 07:12:52,104 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.49 vs. limit=15.0 2023-11-20 07:12:52,614 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 148550 2023-11-20 07:13:24,134 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.06 vs. limit=15.0 2023-11-20 07:13:38,288 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 4300, loss[loss=0.0954, simple_loss=0.1283, pruned_loss=0.02385, audio_tagging_loss=0.007406, over 15036.00 frames. ], tot_loss[loss=0.08174, simple_loss=0.1029, pruned_loss=0.02035, audio_tagging_loss=0.009955, over 3046204.98 frames. ], batch size: 56, lr: 5.35e-03, grad_scale: 8.0 2023-11-20 07:13:40,497 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.29 vs. limit=15.0 2023-11-20 07:13:54,197 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.76 vs. limit=15.0 2023-11-20 07:13:56,606 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.158e+01 8.001e+01 8.705e+01 9.608e+01 1.261e+02, threshold=1.741e+02, percent-clipped=0.0 2023-11-20 07:13:57,916 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 148600 2023-11-20 07:14:03,855 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.91 vs. 
limit=6.0 2023-11-20 07:14:16,916 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=990753.3333333334, ans=0.125 2023-11-20 07:14:23,145 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=990753.3333333334, ans=0.0 2023-11-20 07:14:43,236 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 4350, loss[loss=0.07012, simple_loss=0.08611, pruned_loss=0.01702, audio_tagging_loss=0.01005, over 14033.00 frames. ], tot_loss[loss=0.08204, simple_loss=0.103, pruned_loss=0.02054, audio_tagging_loss=0.009983, over 3047357.67 frames. ], batch size: 53, lr: 5.35e-03, grad_scale: 8.0 2023-11-20 07:15:02,453 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 148650 2023-11-20 07:15:12,444 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=991020.0, ans=0.125 2023-11-20 07:15:28,320 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=991086.6666666666, ans=6.0 2023-11-20 07:15:43,426 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=991153.3333333334, ans=0.1 2023-11-20 07:15:48,004 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 4400, loss[loss=0.06134, simple_loss=0.06369, pruned_loss=0.01617, audio_tagging_loss=0.01333, over 16072.00 frames. ], tot_loss[loss=0.08165, simple_loss=0.1024, pruned_loss=0.02048, audio_tagging_loss=0.009948, over 3044010.70 frames. ], batch size: 61, lr: 5.34e-03, grad_scale: 16.0 2023-11-20 07:15:59,867 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.16 vs. limit=15.0 2023-11-20 07:16:05,879 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.660e+01 8.299e+01 8.915e+01 9.764e+01 1.212e+02, threshold=1.783e+02, percent-clipped=0.0 2023-11-20 07:16:07,205 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 148700 2023-11-20 07:16:16,229 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=991353.3333333334, ans=0.0 2023-11-20 07:16:22,402 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=991353.3333333334, ans=0.025 2023-11-20 07:16:29,092 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=991420.0, ans=0.125 2023-11-20 07:16:30,332 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=991420.0, ans=0.125 2023-11-20 07:16:34,210 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.48 vs. limit=15.0 2023-11-20 07:16:52,165 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 4450, loss[loss=0.08977, simple_loss=0.1125, pruned_loss=0.02481, audio_tagging_loss=0.008729, over 16116.00 frames. ], tot_loss[loss=0.08169, simple_loss=0.1024, pruned_loss=0.02053, audio_tagging_loss=0.009949, over 3045988.24 frames. 
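[Annotation] The many ScheduledFloat lines report hyperparameters (skip rates, balancer probs, dropout) whose value is a function of batch_count. A plausible reconstruction is piecewise-linear interpolation between (batch_count, value) breakpoints, clamped at both ends; the class name and breakpoints below are illustrative, not the scaling.py implementation:

```python
class ScheduledFloatSketch:
    """A float that is scheduled against the global batch count."""

    def __init__(self, *points):
        self.points = sorted(points)  # e.g. (0.0, 0.2), (4000.0, 0.05)
        self.batch_count = 0.0

    def value(self) -> float:
        pts = self.points
        if self.batch_count <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if self.batch_count <= x1:
                t = (self.batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)
        return pts[-1][1]  # clamp past the last breakpoint

skip = ScheduledFloatSketch((0.0, 0.2), (4000.0, 0.05), (16000.0, 0.0))
skip.batch_count = 987620.0  # far past the schedule, as in the lines above
print(skip.value())          # -> 0.0, the clamped endpoint value
```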
], batch size: 60, lr: 5.34e-03, grad_scale: 16.0 2023-11-20 07:17:08,204 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.38 vs. limit=10.0 2023-11-20 07:17:12,440 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 148750 2023-11-20 07:17:22,978 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=991686.6666666666, ans=0.0 2023-11-20 07:17:31,984 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.47 vs. limit=15.0 2023-11-20 07:17:58,012 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 4500, loss[loss=0.09109, simple_loss=0.1204, pruned_loss=0.02325, audio_tagging_loss=0.007647, over 15927.00 frames. ], tot_loss[loss=0.08161, simple_loss=0.1023, pruned_loss=0.0206, audio_tagging_loss=0.00988, over 3042997.95 frames. ], batch size: 60, lr: 5.34e-03, grad_scale: 16.0 2023-11-20 07:17:58,335 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=991886.6666666666, ans=0.125 2023-11-20 07:18:01,882 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=991886.6666666666, ans=0.125 2023-11-20 07:18:13,845 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.72 vs. limit=6.0 2023-11-20 07:18:15,711 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.852e+01 8.390e+01 8.916e+01 9.990e+01 1.363e+02, threshold=1.783e+02, percent-clipped=0.0 2023-11-20 07:18:17,045 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 148800 2023-11-20 07:18:20,146 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=991953.3333333334, ans=0.125 2023-11-20 07:18:29,240 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.38 vs. limit=15.0 2023-11-20 07:18:37,892 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=992086.6666666666, ans=0.1 2023-11-20 07:18:40,203 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=992086.6666666666, ans=0.035 2023-11-20 07:19:02,829 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 4550, loss[loss=0.07706, simple_loss=0.09552, pruned_loss=0.02108, audio_tagging_loss=0.008223, over 15834.00 frames. ], tot_loss[loss=0.08141, simple_loss=0.1021, pruned_loss=0.02048, audio_tagging_loss=0.009889, over 3039220.96 frames. 
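[Annotation] The per-batch loss fields above are numerically consistent with a weighted sum loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss; for the batch 4550 sample, 0.5 * 0.09552 + 0.02108 + 0.008223 = 0.07706. The weights here are inferred from the logged numbers, not taken from the training code:

```python
def combine_losses(simple_loss: float, pruned_loss: float,
                   audio_tagging_loss: float,
                   simple_scale: float = 0.5,
                   tagging_scale: float = 1.0) -> float:
    """Recombine the logged loss components; scales inferred from the log."""
    return simple_scale * simple_loss + pruned_loss + tagging_scale * audio_tagging_loss

# Reproduces the batch 4550 record above to within rounding:
assert abs(combine_losses(0.09552, 0.02108, 0.008223) - 0.07706) < 1e-5
```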
], batch size: 58, lr: 5.34e-03, grad_scale: 16.0 2023-11-20 07:19:21,555 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 148850 2023-11-20 07:19:23,591 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=992286.6666666666, ans=0.1 2023-11-20 07:19:37,864 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=992353.3333333334, ans=0.0 2023-11-20 07:19:49,694 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=992420.0, ans=0.125 2023-11-20 07:19:51,868 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 07:20:03,724 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.68 vs. limit=15.0 2023-11-20 07:20:06,667 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 4600, loss[loss=0.07122, simple_loss=0.0884, pruned_loss=0.01673, audio_tagging_loss=0.01029, over 15999.00 frames. ], tot_loss[loss=0.08135, simple_loss=0.1019, pruned_loss=0.02039, audio_tagging_loss=0.01002, over 3045854.69 frames. ], batch size: 60, lr: 5.34e-03, grad_scale: 16.0 2023-11-20 07:20:18,796 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=992620.0, ans=0.125 2023-11-20 07:20:25,862 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.427e+01 8.263e+01 8.928e+01 9.927e+01 1.394e+02, threshold=1.786e+02, percent-clipped=0.0 2023-11-20 07:20:27,171 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 148900 2023-11-20 07:20:35,211 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=992686.6666666666, ans=0.0 2023-11-20 07:21:08,687 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=992820.0, ans=10.0 2023-11-20 07:21:11,578 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 4650, loss[loss=0.08467, simple_loss=0.1102, pruned_loss=0.02011, audio_tagging_loss=0.00945, over 15588.00 frames. ], tot_loss[loss=0.08078, simple_loss=0.101, pruned_loss=0.02014, audio_tagging_loss=0.01012, over 3051994.25 frames. ], batch size: 57, lr: 5.34e-03, grad_scale: 16.0 2023-11-20 07:21:24,124 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=992953.3333333334, ans=0.1 2023-11-20 07:21:31,207 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 148950 2023-11-20 07:22:13,487 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff3.min_abs, batch_count=993153.3333333334, ans=0.2 2023-11-20 07:22:16,910 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 4700, loss[loss=0.07508, simple_loss=0.09352, pruned_loss=0.01828, audio_tagging_loss=0.01004, over 13854.00 frames. 
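[Annotation] The WARNING records exclude one-second dummy AudioSet cuts because, after subsampling, the utterance has fewer frames (23) than BPE tokens (24), which a transducer loss cannot align. The subsampling formula below reproduces the logged 100 -> 23 mapping but is inferred from those numbers rather than quoted from the code:

```python
def frames_after_subsampling(t: int) -> int:
    # Inferred from the log: 100 frames before -> 23 frames after.
    return ((t - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # A transducer needs at least as many frames as output tokens.
    return frames_after_subsampling(num_frames) >= num_tokens

assert frames_after_subsampling(100) == 23
assert keep_cut(100, 24) is False  # the excluded dummy placeholder cuts
```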
], tot_loss[loss=0.08107, simple_loss=0.1016, pruned_loss=0.0201, audio_tagging_loss=0.01017, over 3042887.45 frames. ], batch size: 53, lr: 5.34e-03, grad_scale: 16.0 2023-11-20 07:22:18,902 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.27 vs. limit=15.0 2023-11-20 07:22:32,585 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.17 vs. limit=6.0 2023-11-20 07:22:34,029 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.986e+01 8.015e+01 8.729e+01 9.415e+01 1.258e+02, threshold=1.746e+02, percent-clipped=0.0 2023-11-20 07:22:35,411 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 149000 2023-11-20 07:22:38,465 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=993286.6666666666, ans=0.125 2023-11-20 07:22:46,861 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.95 vs. limit=15.0 2023-11-20 07:22:50,918 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=993353.3333333334, ans=0.125 2023-11-20 07:22:54,613 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=993420.0, ans=0.125 2023-11-20 07:23:00,941 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=993420.0, ans=0.0 2023-11-20 07:23:07,035 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=993420.0, ans=0.125 2023-11-20 07:23:21,481 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 4750, loss[loss=0.09784, simple_loss=0.1228, pruned_loss=0.02928, audio_tagging_loss=0.00714, over 15731.00 frames. ], tot_loss[loss=0.08067, simple_loss=0.1007, pruned_loss=0.01995, audio_tagging_loss=0.01036, over 3048850.06 frames. ], batch size: 57, lr: 5.34e-03, grad_scale: 16.0 2023-11-20 07:23:38,982 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=993620.0, ans=0.125 2023-11-20 07:23:40,002 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=993620.0, ans=0.0 2023-11-20 07:23:41,194 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 149050 2023-11-20 07:24:12,022 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.19 vs. limit=22.5 2023-11-20 07:24:14,222 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=993820.0, ans=0.125 2023-11-20 07:24:25,462 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 4800, loss[loss=0.07486, simple_loss=0.08752, pruned_loss=0.0175, audio_tagging_loss=0.0136, over 15442.00 frames. ], tot_loss[loss=0.08054, simple_loss=0.1003, pruned_loss=0.01996, audio_tagging_loss=0.01043, over 3050928.55 frames. 
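[Annotation] Across these batches grad_scale steps 32.0 -> 16.0 -> 8.0 and later climbs back, the signature of dynamic fp16 loss scaling: the scale is halved when a step overflows and grown again after a run of clean steps. A generic torch.cuda.amp step showing the behaviour; the growth_interval value and loss_fn signature are placeholders:

```python
import torch

scaler = torch.cuda.amp.GradScaler(growth_interval=2000)  # value assumed

def fp16_step(model, optimizer, loss_fn, batch):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = loss_fn(model, batch)
    scaler.scale(loss).backward()
    scaler.step(optimizer)  # silently skipped if grads contain inf/nan
    scaler.update()         # halves the scale on overflow, grows it otherwise
    return scaler.get_scale()
```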
], batch size: 57, lr: 5.34e-03, grad_scale: 32.0 2023-11-20 07:24:35,596 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=993886.6666666666, ans=0.1 2023-11-20 07:24:44,667 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.552e+01 8.333e+01 8.832e+01 9.618e+01 1.335e+02, threshold=1.766e+02, percent-clipped=0.0 2023-11-20 07:24:46,059 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 149100 2023-11-20 07:25:04,827 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=994086.6666666666, ans=0.0 2023-11-20 07:25:19,697 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=994153.3333333334, ans=0.0 2023-11-20 07:25:31,563 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 4850, loss[loss=0.07769, simple_loss=0.09553, pruned_loss=0.02032, audio_tagging_loss=0.009604, over 14825.00 frames. ], tot_loss[loss=0.08078, simple_loss=0.1007, pruned_loss=0.01994, audio_tagging_loss=0.01047, over 3049209.40 frames. ], batch size: 58, lr: 5.34e-03, grad_scale: 32.0 2023-11-20 07:25:32,271 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.46 vs. limit=15.0 2023-11-20 07:25:36,622 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=994220.0, ans=0.125 2023-11-20 07:25:42,793 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=994286.6666666666, ans=0.125 2023-11-20 07:25:49,751 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 149150 2023-11-20 07:26:06,954 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=994353.3333333334, ans=0.125 2023-11-20 07:26:21,528 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=994486.6666666666, ans=0.2 2023-11-20 07:26:21,671 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=994486.6666666666, ans=0.125 2023-11-20 07:26:34,938 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 4900, loss[loss=0.08215, simple_loss=0.11, pruned_loss=0.01994, audio_tagging_loss=0.007213, over 15759.00 frames. ], tot_loss[loss=0.08062, simple_loss=0.1006, pruned_loss=0.0199, audio_tagging_loss=0.01044, over 3042810.43 frames. ], batch size: 57, lr: 5.34e-03, grad_scale: 32.0 2023-11-20 07:26:45,407 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=994553.3333333334, ans=10.0 2023-11-20 07:26:52,652 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.754e+01 8.440e+01 9.252e+01 1.012e+02 1.571e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-20 07:26:54,705 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 149200 2023-11-20 07:26:56,144 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=994620.0, ans=0.125 2023-11-20 07:27:07,160 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.80 vs. 
limit=22.5 2023-11-20 07:27:12,248 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=994686.6666666666, ans=0.125 2023-11-20 07:27:24,638 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=994753.3333333334, ans=0.125 2023-11-20 07:27:28,441 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=994820.0, ans=0.0 2023-11-20 07:27:35,201 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=994820.0, ans=0.2 2023-11-20 07:27:37,485 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=994820.0, ans=0.2 2023-11-20 07:27:39,756 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 4950, loss[loss=0.06848, simple_loss=0.08641, pruned_loss=0.01493, audio_tagging_loss=0.01035, over 15069.00 frames. ], tot_loss[loss=0.08062, simple_loss=0.101, pruned_loss=0.01991, audio_tagging_loss=0.01018, over 3049509.37 frames. ], batch size: 56, lr: 5.33e-03, grad_scale: 32.0 2023-11-20 07:27:48,031 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=994886.6666666666, ans=0.1 2023-11-20 07:27:59,586 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 149250 2023-11-20 07:28:10,469 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=995020.0, ans=0.1 2023-11-20 07:28:12,009 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.72 vs. limit=6.0 2023-11-20 07:28:33,053 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=995153.3333333334, ans=0.5 2023-11-20 07:28:36,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=995153.3333333334, ans=0.07 2023-11-20 07:28:41,085 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.38 vs. limit=15.0 2023-11-20 07:28:43,143 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=995153.3333333334, ans=0.125 2023-11-20 07:28:45,194 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 5000, loss[loss=0.07597, simple_loss=0.09282, pruned_loss=0.02016, audio_tagging_loss=0.009399, over 16118.00 frames. ], tot_loss[loss=0.08044, simple_loss=0.1013, pruned_loss=0.01984, audio_tagging_loss=0.009937, over 3050175.02 frames. 
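[Annotation] The Whitening lines compare a measured anisotropy metric against a scheduled whitening_limit, and the module only intervenes when the metric exceeds the limit. One plausible reconstruction of such a metric is shown below: it equals 1.0 for a perfectly white (identity-proportional) covariance and grows as channels become correlated. This is a sketch of the idea, not the scaling.py formula:

```python
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    """x: (num_frames, num_channels) activations for one group."""
    x = x - x.mean(dim=0)
    cov = (x.t() @ x) / x.shape[0]
    mean_diag = cov.diagonal().mean()
    # 1.0 when cov is a multiple of the identity; larger when anisotropic.
    return (cov ** 2).sum() / (mean_diag ** 2 * x.shape[1])

white = torch.randn(50000, 64)
print(whitening_metric(white))  # approx. 1.0; correlated inputs push it up
```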
], batch size: 61, lr: 5.33e-03, grad_scale: 16.0 2023-11-20 07:28:53,718 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=995220.0, ans=0.125 2023-11-20 07:28:54,623 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=995220.0, ans=0.0 2023-11-20 07:28:54,765 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=995220.0, ans=0.125 2023-11-20 07:29:04,103 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.403e+01 7.937e+01 8.592e+01 9.230e+01 1.200e+02, threshold=1.718e+02, percent-clipped=0.0 2023-11-20 07:29:04,275 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 149300 2023-11-20 07:29:12,522 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=995353.3333333334, ans=0.0 2023-11-20 07:29:16,273 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=995353.3333333334, ans=0.125 2023-11-20 07:29:16,777 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.64 vs. limit=22.5 2023-11-20 07:29:17,665 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=995353.3333333334, ans=0.125 2023-11-20 07:29:33,198 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=995420.0, ans=0.2 2023-11-20 07:29:38,565 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.53 vs. limit=15.0 2023-11-20 07:29:49,357 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 5050, loss[loss=0.09371, simple_loss=0.1198, pruned_loss=0.025, audio_tagging_loss=0.0088, over 16095.00 frames. ], tot_loss[loss=0.08063, simple_loss=0.1017, pruned_loss=0.01995, audio_tagging_loss=0.009853, over 3049701.22 frames. ], batch size: 60, lr: 5.33e-03, grad_scale: 16.0 2023-11-20 07:29:52,086 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=995553.3333333334, ans=0.0 2023-11-20 07:30:05,302 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.56 vs. limit=22.5 2023-11-20 07:30:05,373 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.03 vs. limit=15.0 2023-11-20 07:30:06,327 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=995620.0, ans=0.125 2023-11-20 07:30:08,513 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 149350 2023-11-20 07:30:34,333 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.88 vs. limit=12.0 2023-11-20 07:30:44,373 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.74 vs. 
limit=6.0 2023-11-20 07:30:50,046 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=995820.0, ans=0.125 2023-11-20 07:30:52,172 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=995820.0, ans=0.1 2023-11-20 07:30:54,246 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 5100, loss[loss=0.08171, simple_loss=0.1064, pruned_loss=0.02018, audio_tagging_loss=0.008324, over 14661.00 frames. ], tot_loss[loss=0.08011, simple_loss=0.101, pruned_loss=0.0198, audio_tagging_loss=0.009837, over 3051276.60 frames. ], batch size: 55, lr: 5.33e-03, grad_scale: 16.0 2023-11-20 07:31:00,818 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=995886.6666666666, ans=0.125 2023-11-20 07:31:13,988 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.044e+01 8.152e+01 8.992e+01 9.912e+01 1.301e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-20 07:31:14,147 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 149400 2023-11-20 07:31:14,332 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=995953.3333333334, ans=0.125 2023-11-20 07:31:14,360 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=995953.3333333334, ans=0.125 2023-11-20 07:31:42,452 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.77 vs. limit=22.5 2023-11-20 07:31:43,309 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=996086.6666666666, ans=0.125 2023-11-20 07:31:59,935 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 5150, loss[loss=0.1028, simple_loss=0.1416, pruned_loss=0.0246, audio_tagging_loss=0.007388, over 16603.00 frames. ], tot_loss[loss=0.08017, simple_loss=0.1013, pruned_loss=0.01974, audio_tagging_loss=0.009787, over 3050673.36 frames. ], batch size: 61, lr: 5.33e-03, grad_scale: 16.0 2023-11-20 07:32:01,411 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=996220.0, ans=0.125 2023-11-20 07:32:18,925 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 149450 2023-11-20 07:32:42,575 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=996420.0, ans=0.2 2023-11-20 07:32:42,664 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=996420.0, ans=0.125 2023-11-20 07:33:03,256 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 5200, loss[loss=0.07364, simple_loss=0.08933, pruned_loss=0.01934, audio_tagging_loss=0.009639, over 15105.00 frames. ], tot_loss[loss=0.07984, simple_loss=0.1007, pruned_loss=0.01966, audio_tagging_loss=0.00983, over 3046657.40 frames. ], batch size: 57, lr: 5.33e-03, grad_scale: 32.0 2023-11-20 07:33:06,759 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.90 vs. 
limit=15.0 2023-11-20 07:33:10,347 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=996553.3333333334, ans=0.0 2023-11-20 07:33:11,499 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=996553.3333333334, ans=0.0 2023-11-20 07:33:15,343 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.47 vs. limit=15.0 2023-11-20 07:33:19,006 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.78 vs. limit=15.0 2023-11-20 07:33:21,688 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=996620.0, ans=0.015 2023-11-20 07:33:22,722 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.044e+01 8.127e+01 8.748e+01 9.361e+01 1.233e+02, threshold=1.750e+02, percent-clipped=0.0 2023-11-20 07:33:22,888 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 149500 2023-11-20 07:33:39,320 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.70 vs. limit=15.0 2023-11-20 07:33:50,053 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=996753.3333333334, ans=0.07 2023-11-20 07:34:07,794 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 5250, loss[loss=0.0858, simple_loss=0.1016, pruned_loss=0.02418, audio_tagging_loss=0.01084, over 16324.00 frames. ], tot_loss[loss=0.08055, simple_loss=0.1015, pruned_loss=0.02006, audio_tagging_loss=0.009757, over 3044539.66 frames. ], batch size: 58, lr: 5.33e-03, grad_scale: 16.0 2023-11-20 07:34:10,987 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=996886.6666666666, ans=0.0 2023-11-20 07:34:12,200 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=996886.6666666666, ans=0.0 2023-11-20 07:34:17,189 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=996886.6666666666, ans=0.125 2023-11-20 07:34:21,665 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=996953.3333333334, ans=0.0 2023-11-20 07:34:28,152 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 149550 2023-11-20 07:34:37,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=997020.0, ans=0.2 2023-11-20 07:34:54,029 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.25 vs. limit=15.0 2023-11-20 07:35:12,050 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=997220.0, ans=0.0 2023-11-20 07:35:13,116 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 5300, loss[loss=0.08489, simple_loss=0.1032, pruned_loss=0.0205, audio_tagging_loss=0.0128, over 15717.00 frames. ], tot_loss[loss=0.08034, simple_loss=0.1011, pruned_loss=0.01995, audio_tagging_loss=0.009826, over 3044042.46 frames. 
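[Annotation] The learning rate drifts very slowly here (5.35e-03 down to 5.33e-03 over a few hundred batches), as expected from an Eden-style inverse power-law schedule in both batch and epoch. The hyperparameter values below are placeholders used only to show the shape of such a schedule, and the output is close to, though not exactly, the logged value:

```python
def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    """Eden-style decay: smooth inverse fourth-root in batch and epoch."""
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

print(eden_lr(0.045, batch=148500, epoch=13))  # about 5.2e-03
```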
], batch size: 59, lr: 5.33e-03, grad_scale: 16.0 2023-11-20 07:35:23,814 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=997220.0, ans=0.1 2023-11-20 07:35:32,196 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 149600 2023-11-20 07:35:32,470 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=997286.6666666666, ans=0.1 2023-11-20 07:35:33,243 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.168e+01 7.882e+01 8.704e+01 9.363e+01 1.429e+02, threshold=1.741e+02, percent-clipped=0.0 2023-11-20 07:35:36,343 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 07:35:47,116 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.612e-01 2023-11-20 07:35:51,435 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=997420.0, ans=0.1 2023-11-20 07:36:03,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=997420.0, ans=0.125 2023-11-20 07:36:08,555 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=997486.6666666666, ans=0.0 2023-11-20 07:36:18,225 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 5350, loss[loss=0.07114, simple_loss=0.09312, pruned_loss=0.01517, audio_tagging_loss=0.009406, over 14431.00 frames. ], tot_loss[loss=0.081, simple_loss=0.1021, pruned_loss=0.02018, audio_tagging_loss=0.009789, over 3036540.17 frames. ], batch size: 55, lr: 5.33e-03, grad_scale: 16.0 2023-11-20 07:36:37,670 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 149650 2023-11-20 07:36:42,984 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=997686.6666666666, ans=0.125 2023-11-20 07:36:59,173 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.94 vs. limit=10.0 2023-11-20 07:37:01,137 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=997753.3333333334, ans=0.2 2023-11-20 07:37:21,720 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 5400, loss[loss=0.07698, simple_loss=0.1032, pruned_loss=0.01628, audio_tagging_loss=0.009085, over 15455.00 frames. ], tot_loss[loss=0.08085, simple_loss=0.1021, pruned_loss=0.01995, audio_tagging_loss=0.009851, over 3041683.32 frames. 
], batch size: 58, lr: 5.33e-03, grad_scale: 16.0 2023-11-20 07:37:21,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=997886.6666666666, ans=0.035 2023-11-20 07:37:31,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=997886.6666666666, ans=0.1 2023-11-20 07:37:39,546 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=997953.3333333334, ans=0.0 2023-11-20 07:37:41,821 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 149700 2023-11-20 07:37:42,884 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.237e+01 8.118e+01 8.667e+01 9.522e+01 1.475e+02, threshold=1.733e+02, percent-clipped=0.0 2023-11-20 07:37:59,655 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 07:38:14,129 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.48 vs. limit=22.5 2023-11-20 07:38:26,845 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 5450, loss[loss=0.07387, simple_loss=0.0834, pruned_loss=0.01567, audio_tagging_loss=0.0165, over 14487.00 frames. ], tot_loss[loss=0.08157, simple_loss=0.1029, pruned_loss=0.02017, audio_tagging_loss=0.009959, over 3047685.35 frames. ], batch size: 57, lr: 5.33e-03, grad_scale: 8.0 2023-11-20 07:38:40,665 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=998286.6666666666, ans=10.0 2023-11-20 07:38:45,941 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 149750 2023-11-20 07:38:53,672 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.04 vs. limit=22.5 2023-11-20 07:39:13,979 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.96 vs. limit=6.0 2023-11-20 07:39:19,525 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=998486.6666666666, ans=0.125 2023-11-20 07:39:30,990 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 5500, loss[loss=0.06566, simple_loss=0.07337, pruned_loss=0.01692, audio_tagging_loss=0.01205, over 14185.00 frames. ], tot_loss[loss=0.08261, simple_loss=0.1043, pruned_loss=0.02054, audio_tagging_loss=0.009932, over 3044725.36 frames. 
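[Annotation] Each tot_loss[...] record is a frame-weighted aggregate of recent batches; the "over N frames" totals hover near 3.05e6 frames, suggesting a bounded window. A sketch of frame-weighted running averaging; the windowing policy is an assumption:

```python
from collections import deque

class RunningLoss:
    """Frame-weighted loss average over a bounded window of recent batches."""

    def __init__(self, max_frames: float = 3.05e6):
        self.window = deque()
        self.max_frames = max_frames

    def update(self, loss: float, frames: float):
        self.window.append((loss * frames, frames))
        while len(self.window) > 1 and \
                sum(f for _, f in self.window) > self.max_frames:
            self.window.popleft()
        total = sum(f for _, f in self.window)
        avg = sum(lf for lf, _ in self.window) / total
        return avg, total  # cf. "tot_loss[loss=..., over N frames]"
```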
], batch size: 54, lr: 5.32e-03, grad_scale: 8.0 2023-11-20 07:39:31,224 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=998553.3333333334, ans=0.0 2023-11-20 07:39:33,667 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=998553.3333333334, ans=0.125 2023-11-20 07:39:42,445 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=998620.0, ans=0.1 2023-11-20 07:39:47,219 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=998620.0, ans=0.125 2023-11-20 07:39:49,356 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 149800 2023-11-20 07:39:49,834 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.23 vs. limit=12.0 2023-11-20 07:39:52,070 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.523e+01 8.049e+01 8.675e+01 9.506e+01 1.252e+02, threshold=1.735e+02, percent-clipped=0.0 2023-11-20 07:39:53,678 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=998620.0, ans=0.07 2023-11-20 07:39:59,151 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=998686.6666666666, ans=0.125 2023-11-20 07:40:00,579 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.36 vs. limit=15.0 2023-11-20 07:40:14,134 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=998753.3333333334, ans=0.0 2023-11-20 07:40:16,338 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=998753.3333333334, ans=0.1 2023-11-20 07:40:21,438 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=998820.0, ans=0.125 2023-11-20 07:40:33,886 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=998886.6666666666, ans=6.0 2023-11-20 07:40:34,393 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 5550, loss[loss=0.07047, simple_loss=0.08782, pruned_loss=0.0173, audio_tagging_loss=0.009264, over 14859.00 frames. ], tot_loss[loss=0.0823, simple_loss=0.1036, pruned_loss=0.02045, audio_tagging_loss=0.01004, over 3051249.21 frames. 
], batch size: 57, lr: 5.32e-03, grad_scale: 8.0 2023-11-20 07:40:54,562 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 149850 2023-11-20 07:40:57,309 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=998953.3333333334, ans=0.0 2023-11-20 07:41:06,461 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=999020.0, ans=0.0 2023-11-20 07:41:10,256 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=999020.0, ans=0.125 2023-11-20 07:41:24,034 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=999086.6666666666, ans=0.125 2023-11-20 07:41:26,738 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.22 vs. limit=22.5 2023-11-20 07:41:40,107 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 5600, loss[loss=0.09262, simple_loss=0.1209, pruned_loss=0.02347, audio_tagging_loss=0.008695, over 15082.00 frames. ], tot_loss[loss=0.0823, simple_loss=0.1037, pruned_loss=0.02036, audio_tagging_loss=0.0101, over 3049722.19 frames. ], batch size: 56, lr: 5.32e-03, grad_scale: 16.0 2023-11-20 07:41:55,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=999286.6666666666, ans=0.2 2023-11-20 07:41:59,460 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 149900 2023-11-20 07:42:01,822 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.809e+01 8.057e+01 8.712e+01 9.430e+01 1.236e+02, threshold=1.742e+02, percent-clipped=0.0 2023-11-20 07:42:16,743 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=999420.0, ans=0.125 2023-11-20 07:42:23,442 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=999420.0, ans=0.125 2023-11-20 07:42:24,350 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 07:42:37,999 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=999486.6666666666, ans=0.125 2023-11-20 07:42:44,603 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 5650, loss[loss=0.08389, simple_loss=0.106, pruned_loss=0.02071, audio_tagging_loss=0.01019, over 14222.00 frames. ], tot_loss[loss=0.082, simple_loss=0.1034, pruned_loss=0.02015, audio_tagging_loss=0.01016, over 3047829.79 frames. 
], batch size: 56, lr: 5.32e-03, grad_scale: 16.0 2023-11-20 07:42:58,485 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=999620.0, ans=0.125 2023-11-20 07:43:03,309 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 149950 2023-11-20 07:43:07,082 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=999620.0, ans=0.2 2023-11-20 07:43:16,962 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=999686.6666666666, ans=0.125 2023-11-20 07:43:22,233 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=999753.3333333334, ans=0.125 2023-11-20 07:43:40,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=999820.0, ans=0.0 2023-11-20 07:43:48,665 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 5700, loss[loss=0.09525, simple_loss=0.1174, pruned_loss=0.02722, audio_tagging_loss=0.009321, over 16027.00 frames. ], tot_loss[loss=0.0821, simple_loss=0.1034, pruned_loss=0.0203, audio_tagging_loss=0.01009, over 3050128.73 frames. ], batch size: 60, lr: 5.32e-03, grad_scale: 16.0 2023-11-20 07:44:01,837 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=999953.3333333334, ans=0.0 2023-11-20 07:44:08,570 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 150000 2023-11-20 07:44:11,213 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.782e+01 7.942e+01 8.498e+01 9.233e+01 1.252e+02, threshold=1.700e+02, percent-clipped=0.0 2023-11-20 07:44:45,687 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1000153.3333333334, ans=0.125 2023-11-20 07:44:53,183 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 5750, loss[loss=0.08919, simple_loss=0.1156, pruned_loss=0.02073, audio_tagging_loss=0.01065, over 14327.00 frames. ], tot_loss[loss=0.08124, simple_loss=0.1023, pruned_loss=0.02007, audio_tagging_loss=0.01004, over 3047516.73 frames. ], batch size: 54, lr: 5.32e-03, grad_scale: 16.0 2023-11-20 07:45:09,135 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.93 vs. limit=6.0 2023-11-20 07:45:13,412 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 150050 2023-11-20 07:45:14,930 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1000286.6666666666, ans=0.2 2023-11-20 07:45:16,022 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1000286.6666666666, ans=0.1 2023-11-20 07:45:17,297 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1000286.6666666666, ans=0.125 2023-11-20 07:45:30,547 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1000420.0, ans=0.2 2023-11-20 07:45:58,340 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 5800, loss[loss=0.08063, simple_loss=0.1083, pruned_loss=0.01767, audio_tagging_loss=0.008831, over 15182.00 frames. 
], tot_loss[loss=0.08037, simple_loss=0.1012, pruned_loss=0.01979, audio_tagging_loss=0.00999, over 3050064.54 frames. ], batch size: 57, lr: 5.32e-03, grad_scale: 16.0 2023-11-20 07:46:02,916 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.98 vs. limit=8.0 2023-11-20 07:46:04,853 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1000553.3333333334, ans=0.125 2023-11-20 07:46:07,114 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1000553.3333333334, ans=0.2 2023-11-20 07:46:08,683 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.45 vs. limit=22.5 2023-11-20 07:46:11,181 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.11 vs. limit=22.5 2023-11-20 07:46:16,864 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 150100 2023-11-20 07:46:18,611 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.61 vs. limit=15.0 2023-11-20 07:46:19,218 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.622e+01 8.139e+01 8.776e+01 9.398e+01 1.793e+02, threshold=1.755e+02, percent-clipped=1.0 2023-11-20 07:46:30,173 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1000686.6666666666, ans=0.125 2023-11-20 07:46:35,558 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1000753.3333333334, ans=0.1 2023-11-20 07:46:39,143 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1000753.3333333334, ans=0.1 2023-11-20 07:46:52,565 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1000820.0, ans=0.0 2023-11-20 07:47:01,818 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 5850, loss[loss=0.07175, simple_loss=0.09472, pruned_loss=0.01411, audio_tagging_loss=0.01028, over 15688.00 frames. ], tot_loss[loss=0.08057, simple_loss=0.1014, pruned_loss=0.0199, audio_tagging_loss=0.009979, over 3045865.22 frames. ], batch size: 59, lr: 5.32e-03, grad_scale: 16.0 2023-11-20 07:47:13,095 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1000953.3333333334, ans=0.125 2023-11-20 07:47:19,728 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1000953.3333333334, ans=0.2 2023-11-20 07:47:20,807 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 150150 2023-11-20 07:47:25,877 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.85 vs. 
limit=15.0 2023-11-20 07:47:31,908 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1001020.0, ans=0.1 2023-11-20 07:47:58,290 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1001153.3333333334, ans=0.1 2023-11-20 07:48:05,898 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 5900, loss[loss=0.09528, simple_loss=0.1232, pruned_loss=0.0245, audio_tagging_loss=0.009171, over 16328.00 frames. ], tot_loss[loss=0.08102, simple_loss=0.102, pruned_loss=0.02008, audio_tagging_loss=0.009921, over 3039781.43 frames. ], batch size: 59, lr: 5.32e-03, grad_scale: 16.0 2023-11-20 07:48:23,808 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1001286.6666666666, ans=0.0 2023-11-20 07:48:25,957 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 150200 2023-11-20 07:48:28,738 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.681e+01 8.240e+01 8.857e+01 9.692e+01 1.369e+02, threshold=1.771e+02, percent-clipped=0.0 2023-11-20 07:48:30,671 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.31 vs. limit=15.0 2023-11-20 07:48:31,465 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1001353.3333333334, ans=0.1 2023-11-20 07:48:40,021 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1001353.3333333334, ans=0.0 2023-11-20 07:49:05,471 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1001486.6666666666, ans=0.1 2023-11-20 07:49:09,329 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1001553.3333333334, ans=0.2 2023-11-20 07:49:10,183 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 5950, loss[loss=0.07864, simple_loss=0.1014, pruned_loss=0.01778, audio_tagging_loss=0.01015, over 14703.00 frames. ], tot_loss[loss=0.08013, simple_loss=0.1009, pruned_loss=0.01977, audio_tagging_loss=0.00989, over 3040889.06 frames. 
], batch size: 56, lr: 5.32e-03, grad_scale: 16.0 2023-11-20 07:49:13,061 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1001553.3333333334, ans=0.1 2023-11-20 07:49:22,750 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1001620.0, ans=0.125 2023-11-20 07:49:26,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1001620.0, ans=0.0 2023-11-20 07:49:26,353 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1001620.0, ans=0.1 2023-11-20 07:49:29,868 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 150250 2023-11-20 07:49:29,976 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1001620.0, ans=0.035 2023-11-20 07:49:30,163 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1001620.0, ans=0.07 2023-11-20 07:49:31,506 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.54 vs. limit=12.0 2023-11-20 07:49:33,645 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1001620.0, ans=0.125 2023-11-20 07:49:42,345 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1001686.6666666666, ans=10.0 2023-11-20 07:49:44,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff2.min_abs, batch_count=1001686.6666666666, ans=0.1 2023-11-20 07:49:48,042 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.66 vs. limit=15.0 2023-11-20 07:49:55,782 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1001753.3333333334, ans=0.5 2023-11-20 07:49:55,840 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1001753.3333333334, ans=0.125 2023-11-20 07:50:14,464 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 6000, loss[loss=0.08908, simple_loss=0.1156, pruned_loss=0.02024, audio_tagging_loss=0.01107, over 15694.00 frames. ], tot_loss[loss=0.08063, simple_loss=0.1014, pruned_loss=0.01995, audio_tagging_loss=0.009951, over 3036858.10 frames. ], batch size: 57, lr: 5.32e-03, grad_scale: 32.0 2023-11-20 07:50:14,465 INFO [train_asr.py:1285] (3/4) Computing validation loss 2023-11-20 07:50:55,332 INFO [train_asr.py:1294] (3/4) Epoch 13, validation: loss=0.06203, simple_loss=0.05394, pruned_loss=0.00581, audio_tagging_loss=0.02925, over 4681554.00 frames. 2023-11-20 07:50:55,333 INFO [train_asr.py:1295] (3/4) Maximum memory allocated so far is 25886MB 2023-11-20 07:50:56,863 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1001886.6666666666, ans=0.0 2023-11-20 07:50:58,374 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.95 vs. 
limit=15.0 2023-11-20 07:51:00,457 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1001886.6666666666, ans=0.0 2023-11-20 07:51:15,065 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 150300 2023-11-20 07:51:17,462 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.658e+01 7.953e+01 8.643e+01 9.262e+01 1.038e+02, threshold=1.729e+02, percent-clipped=0.0 2023-11-20 07:51:17,620 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1001953.3333333334, ans=0.125 2023-11-20 07:51:22,942 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1002020.0, ans=0.125 2023-11-20 07:51:22,995 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1002020.0, ans=0.2 2023-11-20 07:51:24,010 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1002020.0, ans=0.125 2023-11-20 07:51:33,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1002086.6666666666, ans=0.0 2023-11-20 07:51:37,316 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1002086.6666666666, ans=0.125 2023-11-20 07:51:41,306 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 07:51:41,653 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1002086.6666666666, ans=0.1 2023-11-20 07:51:44,551 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1002086.6666666666, ans=0.1 2023-11-20 07:51:52,205 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1002153.3333333334, ans=0.0 2023-11-20 07:51:59,903 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 6050, loss[loss=0.06359, simple_loss=0.08798, pruned_loss=0.01125, audio_tagging_loss=0.008353, over 16147.00 frames. ], tot_loss[loss=0.08019, simple_loss=0.1012, pruned_loss=0.01973, audio_tagging_loss=0.009879, over 3042800.22 frames. ], batch size: 60, lr: 5.32e-03, grad_scale: 16.0 2023-11-20 07:52:13,133 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1002286.6666666666, ans=0.2 2023-11-20 07:52:15,733 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.15 vs. 
limit=10.0 2023-11-20 07:52:17,921 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1002286.6666666666, ans=0.07 2023-11-20 07:52:18,912 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 150350 2023-11-20 07:52:43,523 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1002420.0, ans=0.0 2023-11-20 07:52:59,311 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1002486.6666666666, ans=0.125 2023-11-20 07:53:03,906 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 6100, loss[loss=0.07797, simple_loss=0.09799, pruned_loss=0.0188, audio_tagging_loss=0.01017, over 15455.00 frames. ], tot_loss[loss=0.08144, simple_loss=0.1028, pruned_loss=0.02024, audio_tagging_loss=0.009777, over 3049093.91 frames. ], batch size: 58, lr: 5.31e-03, grad_scale: 16.0 2023-11-20 07:53:11,566 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1002553.3333333334, ans=0.0 2023-11-20 07:53:23,663 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 150400 2023-11-20 07:53:27,496 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.228e+01 8.482e+01 9.288e+01 1.055e+02 1.647e+02, threshold=1.858e+02, percent-clipped=0.0 2023-11-20 07:53:33,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1002686.6666666666, ans=0.2 2023-11-20 07:53:49,653 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1002753.3333333334, ans=0.125 2023-11-20 07:53:58,316 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1002820.0, ans=0.1 2023-11-20 07:54:03,094 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1002820.0, ans=0.125 2023-11-20 07:54:08,350 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 6150, loss[loss=0.1091, simple_loss=0.1315, pruned_loss=0.03128, audio_tagging_loss=0.0121, over 15401.00 frames. ], tot_loss[loss=0.08127, simple_loss=0.1023, pruned_loss=0.02023, audio_tagging_loss=0.009872, over 3042325.44 frames. ], batch size: 56, lr: 5.31e-03, grad_scale: 16.0 2023-11-20 07:54:28,043 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 150450 2023-11-20 07:54:31,955 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1002953.3333333334, ans=0.125 2023-11-20 07:54:40,202 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1003020.0, ans=0.125 2023-11-20 07:54:55,264 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.30 vs. limit=22.5 2023-11-20 07:55:03,302 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.506e-03 2023-11-20 07:55:12,840 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 6200, loss[loss=0.09338, simple_loss=0.1269, pruned_loss=0.02202, audio_tagging_loss=0.007888, over 15056.00 frames. ], tot_loss[loss=0.08059, simple_loss=0.1013, pruned_loss=0.01992, audio_tagging_loss=0.01, over 3036867.06 frames. 
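Note on the optim.py "Clipping_scale=2.0, grad-norm quartiles ... threshold=..." records: in every record in this section the threshold equals clipping_scale times the median quartile (e.g. 2.0 * 9.288e+01 = 1.858e+02 just above), which suggests clipping against a scaled running median of recent gradient norms. A minimal sketch of that idea, assuming a fixed window of recent norms; this is an illustration, not the actual optim.py implementation:

import torch
from collections import deque

class MedianGradClipper:
    """Clip gradients against clipping_scale * median of recently observed global grad norms."""
    def __init__(self, clipping_scale: float = 2.0, window: int = 200):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)  # recent global gradient norms

    def clip_(self, params) -> tuple[torch.Tensor, float]:
        params = [p for p in params if p.grad is not None]
        # Global 2-norm over all parameter gradients.
        norm = torch.norm(torch.stack([p.grad.norm() for p in params]))
        self.norms.append(norm.item())
        hist = torch.tensor(list(self.norms))
        # min / 25% / median / 75% / max, the five values printed in the log records.
        quartiles = torch.quantile(hist, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * quartiles[2].item()
        if norm.item() > threshold:
            for p in params:
                p.grad.mul_(threshold / norm.item())
        return quartiles, threshold

Usage would be a call like clipper.clip_(model.parameters()) between backward() and the optimizer step; percent-clipped=0.0 in these records then just means no recent batch exceeded the threshold.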
], batch size: 55, lr: 5.31e-03, grad_scale: 16.0 2023-11-20 07:55:32,207 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 150500 2023-11-20 07:55:35,796 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.566e+01 8.052e+01 8.616e+01 9.415e+01 1.229e+02, threshold=1.723e+02, percent-clipped=0.0 2023-11-20 07:55:41,616 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.31 vs. limit=6.0 2023-11-20 07:56:09,785 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1003486.6666666666, ans=0.125 2023-11-20 07:56:11,231 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff3.min_abs, batch_count=1003486.6666666666, ans=0.2 2023-11-20 07:56:17,651 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 6250, loss[loss=0.08104, simple_loss=0.09427, pruned_loss=0.02162, audio_tagging_loss=0.01229, over 15287.00 frames. ], tot_loss[loss=0.08064, simple_loss=0.1011, pruned_loss=0.02, audio_tagging_loss=0.0101, over 3032000.05 frames. ], batch size: 56, lr: 5.31e-03, grad_scale: 16.0 2023-11-20 07:56:31,838 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.96 vs. limit=22.5 2023-11-20 07:56:37,329 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 150550 2023-11-20 07:56:46,247 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1003686.6666666666, ans=0.1 2023-11-20 07:56:52,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1003686.6666666666, ans=0.0 2023-11-20 07:57:21,469 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 6300, loss[loss=0.09106, simple_loss=0.1132, pruned_loss=0.02239, audio_tagging_loss=0.01208, over 15525.00 frames. ], tot_loss[loss=0.08138, simple_loss=0.1018, pruned_loss=0.02023, audio_tagging_loss=0.01025, over 3031341.86 frames. ], batch size: 56, lr: 5.31e-03, grad_scale: 16.0 2023-11-20 07:57:25,097 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.08 vs. limit=15.0 2023-11-20 07:57:41,448 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 150600 2023-11-20 07:57:41,729 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1003953.3333333334, ans=0.0 2023-11-20 07:57:45,551 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.750e+01 8.528e+01 9.126e+01 1.017e+02 1.272e+02, threshold=1.825e+02, percent-clipped=0.0 2023-11-20 07:57:57,641 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1004020.0, ans=0.0 2023-11-20 07:58:16,061 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1004153.3333333334, ans=0.125 2023-11-20 07:58:27,050 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 6350, loss[loss=0.08239, simple_loss=0.09669, pruned_loss=0.02147, audio_tagging_loss=0.01259, over 15087.00 frames. ], tot_loss[loss=0.08125, simple_loss=0.1015, pruned_loss=0.02016, audio_tagging_loss=0.01035, over 3029494.26 frames. 
], batch size: 58, lr: 5.31e-03, grad_scale: 16.0 2023-11-20 07:58:27,360 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1004220.0, ans=0.125 2023-11-20 07:58:27,384 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1004220.0, ans=0.1 2023-11-20 07:58:46,410 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 150650 2023-11-20 07:58:46,550 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1004286.6666666666, ans=0.07 2023-11-20 07:58:57,173 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1004353.3333333334, ans=0.0 2023-11-20 07:59:03,935 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1004353.3333333334, ans=0.2 2023-11-20 07:59:20,715 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1004486.6666666666, ans=0.125 2023-11-20 07:59:32,161 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 6400, loss[loss=0.04979, simple_loss=0.05878, pruned_loss=0.008172, audio_tagging_loss=0.01223, over 16487.00 frames. ], tot_loss[loss=0.08118, simple_loss=0.1012, pruned_loss=0.02024, audio_tagging_loss=0.01036, over 3027246.41 frames. ], batch size: 64, lr: 5.31e-03, grad_scale: 32.0 2023-11-20 07:59:37,329 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1004553.3333333334, ans=0.0 2023-11-20 07:59:37,405 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1004553.3333333334, ans=0.125 2023-11-20 07:59:51,889 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 150700 2023-11-20 07:59:56,023 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.650e+01 8.100e+01 8.750e+01 9.554e+01 1.310e+02, threshold=1.750e+02, percent-clipped=0.0 2023-11-20 07:59:58,069 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.69 vs. limit=15.0 2023-11-20 07:59:59,137 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.38 vs. limit=22.5 2023-11-20 08:00:00,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1004686.6666666666, ans=0.125 2023-11-20 08:00:03,663 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1004686.6666666666, ans=0.125 2023-11-20 08:00:06,631 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.51 vs. 
limit=22.5 2023-11-20 08:00:07,610 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1004686.6666666666, ans=0.2 2023-11-20 08:00:36,050 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1004886.6666666666, ans=0.0 2023-11-20 08:00:37,048 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 6450, loss[loss=0.09669, simple_loss=0.1337, pruned_loss=0.02441, audio_tagging_loss=0.005424, over 14779.00 frames. ], tot_loss[loss=0.08087, simple_loss=0.1007, pruned_loss=0.02012, audio_tagging_loss=0.01042, over 3022753.59 frames. ], batch size: 54, lr: 5.31e-03, grad_scale: 32.0 2023-11-20 08:00:53,820 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.70 vs. limit=22.5 2023-11-20 08:00:54,642 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1004953.3333333334, ans=0.2 2023-11-20 08:00:55,861 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1004953.3333333334, ans=0.125 2023-11-20 08:00:56,810 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 150750 2023-11-20 08:01:06,173 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1005020.0, ans=0.1 2023-11-20 08:01:06,605 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.64 vs. limit=22.5 2023-11-20 08:01:07,945 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.40 vs. limit=15.0 2023-11-20 08:01:14,330 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=10.40 vs. limit=10.0 2023-11-20 08:01:14,893 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1005086.6666666666, ans=0.125 2023-11-20 08:01:42,371 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 6500, loss[loss=0.07381, simple_loss=0.09052, pruned_loss=0.01945, audio_tagging_loss=0.009104, over 14528.00 frames. ], tot_loss[loss=0.08074, simple_loss=0.1007, pruned_loss=0.02003, audio_tagging_loss=0.01035, over 3020745.56 frames. ], batch size: 55, lr: 5.31e-03, grad_scale: 32.0 2023-11-20 08:02:01,561 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 150800 2023-11-20 08:02:05,446 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.846e+01 8.112e+01 8.845e+01 9.450e+01 1.236e+02, threshold=1.769e+02, percent-clipped=0.0 2023-11-20 08:02:25,132 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1005420.0, ans=0.125 2023-11-20 08:02:47,612 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 6550, loss[loss=0.1054, simple_loss=0.1363, pruned_loss=0.02671, audio_tagging_loss=0.01049, over 15449.00 frames. ], tot_loss[loss=0.0809, simple_loss=0.1012, pruned_loss=0.02009, audio_tagging_loss=0.01021, over 3030254.21 frames. 
], batch size: 55, lr: 5.31e-03, grad_scale: 32.0 2023-11-20 08:02:49,216 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1005553.3333333334, ans=0.1 2023-11-20 08:03:06,835 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 150850 2023-11-20 08:03:11,456 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1005620.0, ans=0.1 2023-11-20 08:03:13,305 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1005686.6666666666, ans=0.2 2023-11-20 08:03:21,704 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.588e-03 2023-11-20 08:03:23,111 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1005686.6666666666, ans=0.0 2023-11-20 08:03:27,835 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1005753.3333333334, ans=0.125 2023-11-20 08:03:31,513 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.03 vs. limit=15.0 2023-11-20 08:03:34,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1005753.3333333334, ans=0.2 2023-11-20 08:03:47,115 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.26 vs. limit=15.0 2023-11-20 08:03:51,428 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 6600, loss[loss=0.07791, simple_loss=0.09745, pruned_loss=0.02127, audio_tagging_loss=0.007909, over 14239.00 frames. ], tot_loss[loss=0.08055, simple_loss=0.101, pruned_loss=0.01993, audio_tagging_loss=0.01013, over 3034114.53 frames. ], batch size: 56, lr: 5.31e-03, grad_scale: 16.0 2023-11-20 08:03:51,594 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1005886.6666666666, ans=0.0 2023-11-20 08:04:12,222 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 150900 2023-11-20 08:04:12,400 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1005953.3333333334, ans=0.125 2023-11-20 08:04:16,933 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.805e+01 8.131e+01 8.865e+01 9.580e+01 1.182e+02, threshold=1.773e+02, percent-clipped=0.0 2023-11-20 08:04:23,121 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.26 vs. limit=22.5 2023-11-20 08:04:35,285 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1006086.6666666666, ans=0.125 2023-11-20 08:04:52,017 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.05 vs. limit=15.0 2023-11-20 08:04:57,528 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 6650, loss[loss=0.06583, simple_loss=0.08394, pruned_loss=0.01241, audio_tagging_loss=0.01145, over 16527.00 frames. ], tot_loss[loss=0.08109, simple_loss=0.1017, pruned_loss=0.02022, audio_tagging_loss=0.01005, over 3039180.64 frames. 
], batch size: 62, lr: 5.30e-03, grad_scale: 16.0 2023-11-20 08:05:05,517 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.11 vs. limit=22.5 2023-11-20 08:05:16,676 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 150950 2023-11-20 08:05:38,866 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.83 vs. limit=15.0 2023-11-20 08:05:41,643 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1006420.0, ans=10.0 2023-11-20 08:05:50,059 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1006486.6666666666, ans=0.125 2023-11-20 08:05:51,278 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1006486.6666666666, ans=0.125 2023-11-20 08:05:54,875 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1006486.6666666666, ans=0.125 2023-11-20 08:06:01,394 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 6700, loss[loss=0.07822, simple_loss=0.1039, pruned_loss=0.01831, audio_tagging_loss=0.007973, over 15146.00 frames. ], tot_loss[loss=0.08092, simple_loss=0.1016, pruned_loss=0.02015, audio_tagging_loss=0.009954, over 3041885.60 frames. ], batch size: 55, lr: 5.30e-03, grad_scale: 16.0 2023-11-20 08:06:10,367 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1006553.3333333334, ans=0.1 2023-11-20 08:06:10,909 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.14 vs. limit=22.5 2023-11-20 08:06:14,206 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1006620.0, ans=0.95 2023-11-20 08:06:15,168 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1006620.0, ans=0.1 2023-11-20 08:06:16,621 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1006620.0, ans=0.0 2023-11-20 08:06:17,940 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.42 vs. 
limit=6.0 2023-11-20 08:06:19,934 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 151000 2023-11-20 08:06:21,742 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=1006620.0, ans=15.0 2023-11-20 08:06:23,026 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1006620.0, ans=0.125 2023-11-20 08:06:25,049 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.540e+01 7.744e+01 8.706e+01 9.471e+01 1.164e+02, threshold=1.741e+02, percent-clipped=0.0 2023-11-20 08:06:45,394 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1006753.3333333334, ans=0.125 2023-11-20 08:07:02,080 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1006820.0, ans=0.2 2023-11-20 08:07:05,431 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 6750, loss[loss=0.07147, simple_loss=0.08729, pruned_loss=0.01539, audio_tagging_loss=0.01244, over 15364.00 frames. ], tot_loss[loss=0.08086, simple_loss=0.1016, pruned_loss=0.02012, audio_tagging_loss=0.009937, over 3035593.92 frames. ], batch size: 57, lr: 5.30e-03, grad_scale: 16.0 2023-11-20 08:07:25,126 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 151050 2023-11-20 08:07:26,498 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1006953.3333333334, ans=0.07 2023-11-20 08:08:09,646 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.15 vs. limit=22.5 2023-11-20 08:08:10,114 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 6800, loss[loss=0.07779, simple_loss=0.09694, pruned_loss=0.02084, audio_tagging_loss=0.008477, over 15032.00 frames. ], tot_loss[loss=0.08094, simple_loss=0.1016, pruned_loss=0.02022, audio_tagging_loss=0.009914, over 3042380.10 frames. ], batch size: 55, lr: 5.30e-03, grad_scale: 32.0 2023-11-20 08:08:22,763 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1007286.6666666666, ans=0.125 2023-11-20 08:08:26,109 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.83 vs. limit=22.5 2023-11-20 08:08:29,251 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 151100 2023-11-20 08:08:31,764 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1007286.6666666666, ans=0.0 2023-11-20 08:08:33,972 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.708e+01 8.041e+01 8.769e+01 9.846e+01 1.398e+02, threshold=1.754e+02, percent-clipped=0.0 2023-11-20 08:08:52,899 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.42 vs. 
limit=15.0 2023-11-20 08:08:55,870 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1007420.0, ans=0.1 2023-11-20 08:09:02,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1007486.6666666666, ans=0.125 2023-11-20 08:09:13,908 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 6850, loss[loss=0.07116, simple_loss=0.08643, pruned_loss=0.01422, audio_tagging_loss=0.01373, over 15961.00 frames. ], tot_loss[loss=0.08047, simple_loss=0.101, pruned_loss=0.02, audio_tagging_loss=0.009986, over 3040858.35 frames. ], batch size: 60, lr: 5.30e-03, grad_scale: 16.0 2023-11-20 08:09:15,501 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1007553.3333333334, ans=0.125 2023-11-20 08:09:32,370 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 151150 2023-11-20 08:09:51,804 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1007753.3333333334, ans=0.0 2023-11-20 08:10:05,997 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1007820.0, ans=0.125 2023-11-20 08:10:10,808 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1007820.0, ans=0.04949747468305833 2023-11-20 08:10:17,711 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 6900, loss[loss=0.1034, simple_loss=0.1314, pruned_loss=0.02894, audio_tagging_loss=0.008753, over 14094.00 frames. ], tot_loss[loss=0.08013, simple_loss=0.1005, pruned_loss=0.01988, audio_tagging_loss=0.009991, over 3040407.26 frames. ], batch size: 52, lr: 5.30e-03, grad_scale: 16.0 2023-11-20 08:10:27,916 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1007886.6666666666, ans=0.2 2023-11-20 08:10:37,652 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 151200 2023-11-20 08:10:44,626 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.186e+01 8.370e+01 9.169e+01 1.032e+02 1.396e+02, threshold=1.834e+02, percent-clipped=0.0 2023-11-20 08:11:08,944 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 08:11:12,755 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1008153.3333333334, ans=0.125 2023-11-20 08:11:14,083 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1008153.3333333334, ans=0.1 2023-11-20 08:11:22,959 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 6950, loss[loss=0.08006, simple_loss=0.09583, pruned_loss=0.02095, audio_tagging_loss=0.01119, over 14663.00 frames. ], tot_loss[loss=0.08041, simple_loss=0.1008, pruned_loss=0.02001, audio_tagging_loss=0.009975, over 3033991.70 frames. 
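Note on the "Exclude cut" WARNING above: 100 input frames become 23 frames after the roughly 4x subsampling, which is fewer than the 24 BPE tokens, so the transducer cannot align the utterance and the cut is dropped. A hedged sketch of such a filter; the frame arithmetic below is one common form of a two-stride-2 conv frontend (it reproduces 100 -> 23) and is an assumption, not a quote of train_asr.py:

def frames_after_subsampling(num_frames: int) -> int:
    # Two stride-2 convolutions: 100 input frames -> 23 output frames, as in the WARNING.
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # Drop the cut when there are fewer subsampled frames than tokens (here 23 < 24 -> excluded).
    return frames_after_subsampling(num_frames) >= num_tokens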
], batch size: 56, lr: 5.30e-03, grad_scale: 16.0 2023-11-20 08:11:42,618 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1008286.6666666666, ans=0.1 2023-11-20 08:11:43,509 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 151250 2023-11-20 08:11:48,378 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1008353.3333333334, ans=0.0 2023-11-20 08:11:49,687 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1008353.3333333334, ans=0.125 2023-11-20 08:11:56,995 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1008353.3333333334, ans=0.2 2023-11-20 08:12:01,915 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1008420.0, ans=0.1 2023-11-20 08:12:06,941 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.98 vs. limit=15.0 2023-11-20 08:12:27,856 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 7000, loss[loss=0.09136, simple_loss=0.1132, pruned_loss=0.02568, audio_tagging_loss=0.009083, over 16255.00 frames. ], tot_loss[loss=0.08008, simple_loss=0.1005, pruned_loss=0.01984, audio_tagging_loss=0.009977, over 3040422.24 frames. ], batch size: 63, lr: 5.30e-03, grad_scale: 16.0 2023-11-20 08:12:34,062 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.72 vs. limit=15.0 2023-11-20 08:12:46,739 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 151300 2023-11-20 08:12:52,723 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.717e+01 8.209e+01 8.895e+01 9.636e+01 1.347e+02, threshold=1.779e+02, percent-clipped=0.0 2023-11-20 08:13:32,216 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 7050, loss[loss=0.07161, simple_loss=0.08672, pruned_loss=0.01594, audio_tagging_loss=0.01231, over 15631.00 frames. ], tot_loss[loss=0.07945, simple_loss=0.0996, pruned_loss=0.01963, audio_tagging_loss=0.01002, over 3033547.77 frames. ], batch size: 60, lr: 5.30e-03, grad_scale: 16.0 2023-11-20 08:13:51,860 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 151350 2023-11-20 08:13:54,537 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1008953.3333333334, ans=0.125 2023-11-20 08:14:02,038 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1009020.0, ans=0.125 2023-11-20 08:14:03,740 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1009020.0, ans=0.0 2023-11-20 08:14:06,772 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.62 vs. limit=15.0 2023-11-20 08:14:11,334 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.32 vs. limit=15.0 2023-11-20 08:14:36,078 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 7100, loss[loss=0.08146, simple_loss=0.09883, pruned_loss=0.02104, audio_tagging_loss=0.011, over 14532.00 frames. 
], tot_loss[loss=0.0789, simple_loss=0.09873, pruned_loss=0.01943, audio_tagging_loss=0.01011, over 3040844.78 frames. ], batch size: 54, lr: 5.30e-03, grad_scale: 16.0 2023-11-20 08:14:55,462 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1009286.6666666666, ans=0.0 2023-11-20 08:14:56,636 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 151400 2023-11-20 08:15:00,322 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1009286.6666666666, ans=0.125 2023-11-20 08:15:03,705 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.030e+01 8.148e+01 8.789e+01 9.448e+01 1.213e+02, threshold=1.758e+02, percent-clipped=0.0 2023-11-20 08:15:17,492 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1009420.0, ans=0.125 2023-11-20 08:15:23,655 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1009420.0, ans=0.0 2023-11-20 08:15:26,327 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.82 vs. limit=15.0 2023-11-20 08:15:33,545 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1009486.6666666666, ans=0.0 2023-11-20 08:15:38,573 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1009486.6666666666, ans=0.05 2023-11-20 08:15:41,844 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 7150, loss[loss=0.05824, simple_loss=0.05974, pruned_loss=0.01142, audio_tagging_loss=0.01695, over 15375.00 frames. ], tot_loss[loss=0.07953, simple_loss=0.09953, pruned_loss=0.01963, audio_tagging_loss=0.01014, over 3037338.32 frames. ], batch size: 60, lr: 5.30e-03, grad_scale: 16.0 2023-11-20 08:15:47,215 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1009553.3333333334, ans=0.125 2023-11-20 08:15:54,751 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1009620.0, ans=0.0 2023-11-20 08:15:57,016 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1009620.0, ans=0.125 2023-11-20 08:15:58,765 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.18 vs. limit=6.0 2023-11-20 08:16:01,453 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.00 vs. 
limit=15.0 2023-11-20 08:16:01,879 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 151450 2023-11-20 08:16:05,950 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1009620.0, ans=0.0 2023-11-20 08:16:32,940 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1009820.0, ans=0.125 2023-11-20 08:16:45,186 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1009820.0, ans=0.1 2023-11-20 08:16:47,321 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 7200, loss[loss=0.09659, simple_loss=0.1292, pruned_loss=0.02418, audio_tagging_loss=0.007797, over 15242.00 frames. ], tot_loss[loss=0.07994, simple_loss=0.1001, pruned_loss=0.01966, audio_tagging_loss=0.01025, over 3038586.31 frames. ], batch size: 54, lr: 5.30e-03, grad_scale: 32.0 2023-11-20 08:17:06,108 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 151500 2023-11-20 08:17:08,250 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1009953.3333333334, ans=0.125 2023-11-20 08:17:13,441 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.859e+01 8.360e+01 9.161e+01 1.018e+02 1.279e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-20 08:17:22,561 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1010020.0, ans=0.125 2023-11-20 08:17:29,224 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1010086.6666666666, ans=0.09899494936611666 2023-11-20 08:17:35,323 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1010086.6666666666, ans=0.1 2023-11-20 08:17:42,791 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1010153.3333333334, ans=0.2 2023-11-20 08:17:50,978 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 7250, loss[loss=0.06767, simple_loss=0.08401, pruned_loss=0.01342, audio_tagging_loss=0.01225, over 13578.00 frames. ], tot_loss[loss=0.08011, simple_loss=0.1002, pruned_loss=0.01961, audio_tagging_loss=0.0104, over 3034074.37 frames. ], batch size: 54, lr: 5.29e-03, grad_scale: 32.0 2023-11-20 08:18:10,778 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 151550 2023-11-20 08:18:28,070 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1010353.3333333334, ans=0.1 2023-11-20 08:18:55,936 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 7300, loss[loss=0.1023, simple_loss=0.1409, pruned_loss=0.0248, audio_tagging_loss=0.007031, over 15548.00 frames. ], tot_loss[loss=0.08053, simple_loss=0.1009, pruned_loss=0.01989, audio_tagging_loss=0.01022, over 3033918.31 frames. 
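Note on the scaling.py ScheduledFloat records throughout this section: each reports the current value (ans=...) of a hyperparameter such as a dropout probability or skip rate, scheduled against batch_count. A minimal sketch of one way such a schedule can work, with piecewise-linear interpolation between (batch_count, value) breakpoints; the breakpoints in the example are illustrative, not taken from this run:

import bisect

class ScheduledFloatSketch:
    def __init__(self, *points: tuple[float, float]):
        # points: (batch_count, value) pairs, sorted by batch_count.
        self.xs = [x for x, _ in points]
        self.ys = [y for _, y in points]

    def value(self, batch_count: float) -> float:
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect.bisect_right(self.xs, batch_count)
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

For example, dropout_p = ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.1)) would sit at its final value of 0.1 for the batch_count values seen here (around 1.0e6), consistent with the ans=0.1 readings in these records.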
], batch size: 55, lr: 5.29e-03, grad_scale: 32.0 2023-11-20 08:18:58,858 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1010553.3333333334, ans=0.125 2023-11-20 08:19:16,005 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 151600 2023-11-20 08:19:22,270 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.093e+01 8.295e+01 8.871e+01 9.520e+01 1.560e+02, threshold=1.774e+02, percent-clipped=0.0 2023-11-20 08:19:38,586 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1010753.3333333334, ans=0.1 2023-11-20 08:19:55,963 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1010820.0, ans=0.125 2023-11-20 08:19:56,035 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1010820.0, ans=0.1 2023-11-20 08:19:58,548 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1010820.0, ans=0.125 2023-11-20 08:20:01,269 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 7350, loss[loss=0.0608, simple_loss=0.08142, pruned_loss=0.008655, audio_tagging_loss=0.01143, over 15527.00 frames. ], tot_loss[loss=0.08101, simple_loss=0.1017, pruned_loss=0.02016, audio_tagging_loss=0.01001, over 3034381.53 frames. ], batch size: 56, lr: 5.29e-03, grad_scale: 32.0 2023-11-20 08:20:02,742 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1010886.6666666666, ans=0.0 2023-11-20 08:20:10,318 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1010886.6666666666, ans=0.125 2023-11-20 08:20:18,089 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.27 vs. limit=22.5 2023-11-20 08:20:19,876 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 151650 2023-11-20 08:20:26,106 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1011020.0, ans=0.5 2023-11-20 08:20:30,106 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.94 vs. limit=15.0 2023-11-20 08:21:05,198 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 7400, loss[loss=0.06173, simple_loss=0.0725, pruned_loss=0.01415, audio_tagging_loss=0.01133, over 14699.00 frames. ], tot_loss[loss=0.08118, simple_loss=0.1019, pruned_loss=0.02033, audio_tagging_loss=0.009899, over 3034751.44 frames. ], batch size: 55, lr: 5.29e-03, grad_scale: 32.0 2023-11-20 08:21:05,944 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.41 vs. limit=10.0 2023-11-20 08:21:10,866 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.47 vs. 
limit=15.0 2023-11-20 08:21:15,395 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1011220.0, ans=0.0 2023-11-20 08:21:25,016 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 151700 2023-11-20 08:21:25,172 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1011286.6666666666, ans=0.125 2023-11-20 08:21:30,867 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.761e+01 8.067e+01 8.750e+01 9.547e+01 1.451e+02, threshold=1.750e+02, percent-clipped=0.0 2023-11-20 08:21:37,794 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1011353.3333333334, ans=0.0 2023-11-20 08:21:47,484 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.59 vs. limit=15.0 2023-11-20 08:21:49,461 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1011420.0, ans=0.95 2023-11-20 08:21:57,918 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1011486.6666666666, ans=10.0 2023-11-20 08:22:08,995 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1011553.3333333334, ans=0.125 2023-11-20 08:22:09,939 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 7450, loss[loss=0.07248, simple_loss=0.08379, pruned_loss=0.01693, audio_tagging_loss=0.01366, over 14542.00 frames. ], tot_loss[loss=0.08094, simple_loss=0.1016, pruned_loss=0.02028, audio_tagging_loss=0.009867, over 3037724.76 frames. ], batch size: 54, lr: 5.29e-03, grad_scale: 32.0 2023-11-20 08:22:12,834 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1011553.3333333334, ans=0.125 2023-11-20 08:22:29,157 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 151750 2023-11-20 08:22:42,546 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.46 vs. limit=6.0 2023-11-20 08:22:54,611 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1011753.3333333334, ans=0.0 2023-11-20 08:23:14,419 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 7500, loss[loss=0.1142, simple_loss=0.1428, pruned_loss=0.03503, audio_tagging_loss=0.007818, over 16080.00 frames. ], tot_loss[loss=0.08064, simple_loss=0.1012, pruned_loss=0.02016, audio_tagging_loss=0.009867, over 3041433.51 frames. 
], batch size: 57, lr: 5.29e-03, grad_scale: 32.0 2023-11-20 08:23:18,378 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1011886.6666666666, ans=0.125 2023-11-20 08:23:23,995 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1011886.6666666666, ans=0.2 2023-11-20 08:23:33,566 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 151800 2023-11-20 08:23:40,383 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.995e+01 8.501e+01 9.262e+01 1.002e+02 1.307e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-20 08:23:53,070 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.58 vs. limit=15.0 2023-11-20 08:24:19,040 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 7550, loss[loss=0.08661, simple_loss=0.1127, pruned_loss=0.02266, audio_tagging_loss=0.007589, over 15469.00 frames. ], tot_loss[loss=0.08033, simple_loss=0.1008, pruned_loss=0.02003, audio_tagging_loss=0.009914, over 3046621.72 frames. ], batch size: 58, lr: 5.29e-03, grad_scale: 32.0 2023-11-20 08:24:37,027 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.83 vs. limit=22.5 2023-11-20 08:24:38,755 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 151850 2023-11-20 08:24:41,499 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1012286.6666666666, ans=0.0 2023-11-20 08:24:42,713 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1012286.6666666666, ans=0.07 2023-11-20 08:24:55,408 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1012353.3333333334, ans=0.025 2023-11-20 08:25:04,624 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1012420.0, ans=0.125 2023-11-20 08:25:18,736 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1012486.6666666666, ans=0.1 2023-11-20 08:25:23,874 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 7600, loss[loss=0.05436, simple_loss=0.06509, pruned_loss=0.01042, audio_tagging_loss=0.01139, over 14634.00 frames. ], tot_loss[loss=0.07966, simple_loss=0.09983, pruned_loss=0.01981, audio_tagging_loss=0.009938, over 3045294.99 frames. 
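Note on the "Whitening: name=..., metric=M vs. limit=L" records above: they compare a whiteness statistic of a layer's activations against a limit, with the whitening penalty presumably engaging only when the metric exceeds the limit. A sketch of one plausible statistic, equal to 1.0 for perfectly white (equal-eigenvalue) activations and growing with eigenvalue spread; this illustrates the idea and is not scaling.py's exact computation:

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    # x: (num_frames, num_channels); channels split into groups, as num_groups in the log suggests.
    n, c = x.shape
    x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)  # (groups, frames, chans/group)
    x = x - x.mean(dim=1, keepdim=True)
    cov = x.transpose(1, 2) @ x / n                # per-group covariance matrices
    eigs = torch.linalg.eigvalsh(cov)              # per-group eigenvalues
    # >= 1.0, with equality iff all eigenvalues are equal (i.e. the activations are white).
    return float((eigs ** 2).mean() / (eigs.mean() ** 2))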
], batch size: 58, lr: 5.29e-03, grad_scale: 32.0 2023-11-20 08:25:30,310 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1012553.3333333334, ans=0.125 2023-11-20 08:25:31,435 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1012553.3333333334, ans=0.2 2023-11-20 08:25:42,986 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 151900 2023-11-20 08:25:48,985 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.336e+01 8.168e+01 8.744e+01 9.399e+01 1.403e+02, threshold=1.749e+02, percent-clipped=0.0 2023-11-20 08:25:51,888 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1012686.6666666666, ans=0.125 2023-11-20 08:26:02,689 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1012753.3333333334, ans=0.125 2023-11-20 08:26:11,384 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1012753.3333333334, ans=0.1 2023-11-20 08:26:13,546 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1012753.3333333334, ans=0.0 2023-11-20 08:26:24,658 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1012820.0, ans=0.125 2023-11-20 08:26:28,798 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 7650, loss[loss=0.08061, simple_loss=0.1096, pruned_loss=0.01813, audio_tagging_loss=0.007661, over 15437.00 frames. ], tot_loss[loss=0.07875, simple_loss=0.09876, pruned_loss=0.01945, audio_tagging_loss=0.00992, over 3042638.35 frames. ], batch size: 57, lr: 5.29e-03, grad_scale: 32.0 2023-11-20 08:26:48,147 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 151950 2023-11-20 08:26:51,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1012953.3333333334, ans=0.125 2023-11-20 08:27:24,046 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1013153.3333333334, ans=0.05 2023-11-20 08:27:29,019 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.03 vs. limit=15.0 2023-11-20 08:27:33,344 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 7700, loss[loss=0.07302, simple_loss=0.08646, pruned_loss=0.01722, audio_tagging_loss=0.01257, over 16194.00 frames. ], tot_loss[loss=0.0787, simple_loss=0.09866, pruned_loss=0.01934, audio_tagging_loss=0.01003, over 3042414.05 frames. 
], batch size: 62, lr: 5.29e-03, grad_scale: 32.0 2023-11-20 08:27:53,672 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 152000 2023-11-20 08:27:53,956 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1013286.6666666666, ans=0.0 2023-11-20 08:28:03,800 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.653e+01 7.881e+01 8.536e+01 9.329e+01 1.282e+02, threshold=1.707e+02, percent-clipped=0.0 2023-11-20 08:28:04,238 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1013353.3333333334, ans=0.0 2023-11-20 08:28:12,019 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1013353.3333333334, ans=0.2 2023-11-20 08:28:22,983 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1013420.0, ans=0.1 2023-11-20 08:28:42,940 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 7750, loss[loss=0.08406, simple_loss=0.09672, pruned_loss=0.02328, audio_tagging_loss=0.01242, over 15681.00 frames. ], tot_loss[loss=0.0791, simple_loss=0.09914, pruned_loss=0.0195, audio_tagging_loss=0.01004, over 3044928.77 frames. ], batch size: 59, lr: 5.29e-03, grad_scale: 32.0 2023-11-20 08:28:43,166 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1013553.3333333334, ans=0.125 2023-11-20 08:28:43,353 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1013553.3333333334, ans=0.2 2023-11-20 08:28:44,455 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1013553.3333333334, ans=0.125 2023-11-20 08:28:47,362 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.16 vs. limit=10.0 2023-11-20 08:28:50,885 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.95 vs. limit=15.0 2023-11-20 08:28:52,740 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1013553.3333333334, ans=0.1 2023-11-20 08:28:56,403 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1013620.0, ans=0.2 2023-11-20 08:28:59,984 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.91 vs. 
limit=15.0 2023-11-20 08:29:01,693 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 152050 2023-11-20 08:29:21,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1013753.3333333334, ans=0.125 2023-11-20 08:29:23,137 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1013753.3333333334, ans=0.2 2023-11-20 08:29:24,565 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1013753.3333333334, ans=0.0 2023-11-20 08:29:35,499 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.261e-01 2023-11-20 08:29:39,430 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.30 vs. limit=22.5 2023-11-20 08:29:40,565 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 08:29:46,233 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 7800, loss[loss=0.08398, simple_loss=0.1083, pruned_loss=0.02059, audio_tagging_loss=0.009256, over 15568.00 frames. ], tot_loss[loss=0.08047, simple_loss=0.1011, pruned_loss=0.01993, audio_tagging_loss=0.009985, over 3049410.87 frames. ], batch size: 57, lr: 5.28e-03, grad_scale: 32.0 2023-11-20 08:29:50,857 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1013886.6666666666, ans=0.1 2023-11-20 08:30:03,284 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=15.05 vs. limit=15.0 2023-11-20 08:30:05,401 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 152100 2023-11-20 08:30:12,165 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.723e+01 8.195e+01 8.711e+01 9.199e+01 1.205e+02, threshold=1.742e+02, percent-clipped=0.0 2023-11-20 08:30:24,088 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.63 vs. limit=15.0 2023-11-20 08:30:51,133 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 7850, loss[loss=0.07454, simple_loss=0.08671, pruned_loss=0.01908, audio_tagging_loss=0.01211, over 15086.00 frames. ], tot_loss[loss=0.08074, simple_loss=0.1012, pruned_loss=0.02005, audio_tagging_loss=0.0101, over 3051076.66 frames. 
], batch size: 58, lr: 5.28e-03, grad_scale: 32.0 2023-11-20 08:31:04,453 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1014286.6666666666, ans=0.125 2023-11-20 08:31:11,662 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 152150 2023-11-20 08:31:19,194 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1014353.3333333334, ans=0.125 2023-11-20 08:31:19,246 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1014353.3333333334, ans=0.125 2023-11-20 08:31:24,067 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1014353.3333333334, ans=0.0 2023-11-20 08:31:37,816 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.64 vs. limit=12.0 2023-11-20 08:31:51,257 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1014486.6666666666, ans=0.0 2023-11-20 08:31:51,336 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1014486.6666666666, ans=0.2 2023-11-20 08:31:56,455 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 7900, loss[loss=0.06893, simple_loss=0.08264, pruned_loss=0.01609, audio_tagging_loss=0.01152, over 15131.00 frames. ], tot_loss[loss=0.08057, simple_loss=0.1006, pruned_loss=0.01999, audio_tagging_loss=0.01025, over 3049108.17 frames. ], batch size: 58, lr: 5.28e-03, grad_scale: 32.0 2023-11-20 08:32:10,073 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.97 vs. limit=15.0 2023-11-20 08:32:11,755 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1014620.0, ans=0.0 2023-11-20 08:32:15,264 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 152200 2023-11-20 08:32:22,233 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.716e+01 8.112e+01 8.963e+01 9.750e+01 1.229e+02, threshold=1.793e+02, percent-clipped=0.0 2023-11-20 08:32:34,791 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1014753.3333333334, ans=0.125 2023-11-20 08:32:53,976 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1014820.0, ans=0.1 2023-11-20 08:33:00,853 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 7950, loss[loss=0.04608, simple_loss=0.05061, pruned_loss=0.007081, audio_tagging_loss=0.01369, over 14999.00 frames. ], tot_loss[loss=0.08034, simple_loss=0.1002, pruned_loss=0.01985, audio_tagging_loss=0.01038, over 3046947.31 frames. ], batch size: 60, lr: 5.28e-03, grad_scale: 32.0 2023-11-20 08:33:17,557 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-20 08:33:20,012 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 152250 2023-11-20 08:33:26,235 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1015020.0, ans=0.125 2023-11-20 08:33:34,865 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1015020.0, ans=0.125 2023-11-20 08:33:37,305 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1015020.0, ans=0.07 2023-11-20 08:33:47,645 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1015086.6666666666, ans=0.0 2023-11-20 08:33:48,686 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1015086.6666666666, ans=0.125 2023-11-20 08:33:51,127 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1015153.3333333334, ans=0.125 2023-11-20 08:34:04,854 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 8000, loss[loss=0.1119, simple_loss=0.1494, pruned_loss=0.02953, audio_tagging_loss=0.007735, over 15303.00 frames. ], tot_loss[loss=0.08004, simple_loss=0.09948, pruned_loss=0.01981, audio_tagging_loss=0.01049, over 3039846.74 frames. ], batch size: 55, lr: 5.28e-03, grad_scale: 32.0 2023-11-20 08:34:15,074 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1015220.0, ans=0.1 2023-11-20 08:34:24,109 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 152300 2023-11-20 08:34:28,939 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.18 vs. limit=15.0 2023-11-20 08:34:30,846 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.971e+01 8.267e+01 8.804e+01 9.546e+01 1.251e+02, threshold=1.761e+02, percent-clipped=0.0 2023-11-20 08:34:44,505 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1015420.0, ans=0.1 2023-11-20 08:35:02,778 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1015486.6666666666, ans=0.025 2023-11-20 08:35:08,908 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 8050, loss[loss=0.06763, simple_loss=0.07874, pruned_loss=0.01637, audio_tagging_loss=0.01189, over 16380.00 frames. ], tot_loss[loss=0.08062, simple_loss=0.1003, pruned_loss=0.02003, audio_tagging_loss=0.01044, over 3037499.39 frames. ], batch size: 64, lr: 5.28e-03, grad_scale: 32.0 2023-11-20 08:35:29,140 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 152350 2023-11-20 08:35:30,419 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1015620.0, ans=0.0 2023-11-20 08:35:53,230 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1015753.3333333334, ans=0.0 2023-11-20 08:35:53,594 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.02 vs. 
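
[Editor's note on the exclusion warnings] The train_asr.py:1506 warning above drops an AudioSet cut because its 100 input frames shrink to 23 frames after subsampling, fewer than its 24 BPE tokens, so a transducer alignment is impossible. A sketch of that filter, assuming the convolutional front end subsamples as T' = ((T - 7) // 2 + 1) // 2, which reproduces the 100 -> 23 figure in the warning:

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # Frame count after the (assumed) conv subsampling front end:
        # 100 input frames -> 23 output frames, matching the warning above.
        frames_after = ((num_frames - 7) // 2 + 1) // 2
        # A transducer alignment needs at least one frame per token.
        return frames_after >= num_tokens

    print(keep_cut(100, 24))  # -> False: the cut is excluded, as logged
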
limit=22.5 2023-11-20 08:35:59,877 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1015820.0, ans=0.1 2023-11-20 08:36:01,154 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1015820.0, ans=0.125 2023-11-20 08:36:14,489 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 8100, loss[loss=0.08816, simple_loss=0.1118, pruned_loss=0.02309, audio_tagging_loss=0.009158, over 15166.00 frames. ], tot_loss[loss=0.08069, simple_loss=0.1006, pruned_loss=0.02002, audio_tagging_loss=0.01036, over 3038708.85 frames. ], batch size: 56, lr: 5.28e-03, grad_scale: 32.0 2023-11-20 08:36:17,595 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.57 vs. limit=22.5 2023-11-20 08:36:33,604 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 152400 2023-11-20 08:36:34,164 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.09 vs. limit=12.0 2023-11-20 08:36:39,867 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.824e+01 8.032e+01 8.730e+01 9.665e+01 1.175e+02, threshold=1.746e+02, percent-clipped=0.0 2023-11-20 08:36:51,186 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.97 vs. limit=22.5 2023-11-20 08:37:05,785 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1016153.3333333334, ans=0.125 2023-11-20 08:37:16,328 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1016153.3333333334, ans=0.2 2023-11-20 08:37:18,482 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 8150, loss[loss=0.07766, simple_loss=0.1028, pruned_loss=0.01793, audio_tagging_loss=0.008307, over 15626.00 frames. ], tot_loss[loss=0.08079, simple_loss=0.101, pruned_loss=0.0201, audio_tagging_loss=0.0102, over 3040277.11 frames. 
], batch size: 57, lr: 5.28e-03, grad_scale: 16.0 2023-11-20 08:37:22,512 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1016220.0, ans=0.1 2023-11-20 08:37:30,879 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1016286.6666666666, ans=0.125 2023-11-20 08:37:36,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1016286.6666666666, ans=0.05 2023-11-20 08:37:37,880 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 152450 2023-11-20 08:37:39,159 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1016286.6666666666, ans=0.125 2023-11-20 08:37:39,162 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1016286.6666666666, ans=0.0 2023-11-20 08:37:43,486 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1016353.3333333334, ans=0.0 2023-11-20 08:37:45,231 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1016353.3333333334, ans=0.0 2023-11-20 08:38:00,629 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1016420.0, ans=0.125 2023-11-20 08:38:05,963 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=1016420.0, ans=15.0 2023-11-20 08:38:15,203 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1016486.6666666666, ans=0.1 2023-11-20 08:38:17,629 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1016486.6666666666, ans=0.125 2023-11-20 08:38:22,215 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 8200, loss[loss=0.07592, simple_loss=0.0931, pruned_loss=0.01821, audio_tagging_loss=0.01116, over 15542.00 frames. ], tot_loss[loss=0.08056, simple_loss=0.101, pruned_loss=0.02002, audio_tagging_loss=0.01006, over 3038479.34 frames. ], batch size: 58, lr: 5.28e-03, grad_scale: 16.0 2023-11-20 08:38:23,449 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-20 08:38:36,535 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1016620.0, ans=0.125 2023-11-20 08:38:42,208 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 152500 2023-11-20 08:38:43,653 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1016620.0, ans=0.05 2023-11-20 08:38:49,930 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.647e+01 7.999e+01 8.746e+01 9.539e+01 1.196e+02, threshold=1.749e+02, percent-clipped=0.0 2023-11-20 08:38:51,447 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1016686.6666666666, ans=0.1 2023-11-20 08:38:54,499 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.78 vs. limit=22.5 2023-11-20 08:39:04,807 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1016753.3333333334, ans=0.125 2023-11-20 08:39:06,487 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.72 vs. limit=15.0 2023-11-20 08:39:07,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1016753.3333333334, ans=0.125 2023-11-20 08:39:11,739 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.64 vs. limit=8.0 2023-11-20 08:39:13,412 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1016820.0, ans=0.2 2023-11-20 08:39:26,177 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1016886.6666666666, ans=0.2 2023-11-20 08:39:27,143 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 8250, loss[loss=0.07806, simple_loss=0.09501, pruned_loss=0.01833, audio_tagging_loss=0.01223, over 14072.00 frames. ], tot_loss[loss=0.08075, simple_loss=0.1014, pruned_loss=0.0201, audio_tagging_loss=0.009939, over 3036203.78 frames. ], batch size: 54, lr: 5.28e-03, grad_scale: 16.0 2023-11-20 08:39:28,583 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1016886.6666666666, ans=0.07 2023-11-20 08:39:28,707 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1016886.6666666666, ans=0.0 2023-11-20 08:39:46,092 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 152550 2023-11-20 08:39:46,853 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.85 vs. 
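
[Editor's note on the gradient-clipping records] The optim.py:476 records print five quantiles (min, 25%, median, 75%, max) of recent gradient norms plus a clipping threshold; in each record the threshold equals Clipping_scale times the median, e.g. 2.0 * 8.746e+01 = 1.749e+02 just above. A sketch of that bookkeeping over a sliding window; the window length is an assumption:

    from collections import deque
    import torch

    recent_norms = deque(maxlen=200)  # window size assumed

    def update_threshold(grad_norm: float, clipping_scale: float = 2.0) -> float:
        recent_norms.append(grad_norm)
        norms = torch.tensor(list(recent_norms))
        # The five quantiles printed in the optim.py:476 records:
        q = torch.quantile(norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        # threshold = clipping_scale * running median; percent-clipped is then
        # the fraction of recent steps whose norm exceeded this threshold.
        return clipping_scale * q[2].item()
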
limit=22.5 2023-11-20 08:40:03,458 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1017086.6666666666, ans=0.0 2023-11-20 08:40:23,746 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1017153.3333333334, ans=0.125 2023-11-20 08:40:29,197 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1017153.3333333334, ans=0.1 2023-11-20 08:40:31,466 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 8300, loss[loss=0.06959, simple_loss=0.08724, pruned_loss=0.01549, audio_tagging_loss=0.01048, over 16026.00 frames. ], tot_loss[loss=0.0801, simple_loss=0.1001, pruned_loss=0.02001, audio_tagging_loss=0.01003, over 3036859.42 frames. ], batch size: 59, lr: 5.28e-03, grad_scale: 16.0 2023-11-20 08:40:31,665 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1017220.0, ans=0.0 2023-11-20 08:40:49,661 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 152600 2023-11-20 08:40:58,002 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.507e+01 8.110e+01 8.922e+01 9.710e+01 1.456e+02, threshold=1.784e+02, percent-clipped=0.0 2023-11-20 08:41:02,679 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1017353.3333333334, ans=0.2 2023-11-20 08:41:06,945 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1017353.3333333334, ans=0.1 2023-11-20 08:41:11,810 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1017420.0, ans=0.1 2023-11-20 08:41:21,444 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.43 vs. limit=15.0 2023-11-20 08:41:35,106 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 8350, loss[loss=0.08777, simple_loss=0.1212, pruned_loss=0.02133, audio_tagging_loss=0.005848, over 16376.00 frames. ], tot_loss[loss=0.08045, simple_loss=0.1011, pruned_loss=0.02, audio_tagging_loss=0.009914, over 3052195.03 frames. ], batch size: 58, lr: 5.28e-03, grad_scale: 16.0 2023-11-20 08:41:37,209 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.77 vs. limit=15.0 2023-11-20 08:41:54,065 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.73 vs. limit=22.5 2023-11-20 08:41:54,607 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 152650 2023-11-20 08:41:56,977 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.63 vs. 
limit=22.5 2023-11-20 08:41:57,822 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1017620.0, ans=0.125 2023-11-20 08:42:22,852 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1017753.3333333334, ans=0.1 2023-11-20 08:42:32,118 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1017820.0, ans=0.2 2023-11-20 08:42:35,894 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 08:42:38,153 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.55 vs. limit=10.0 2023-11-20 08:42:39,806 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 8400, loss[loss=0.07981, simple_loss=0.09966, pruned_loss=0.0172, audio_tagging_loss=0.01278, over 15079.00 frames. ], tot_loss[loss=0.08032, simple_loss=0.1008, pruned_loss=0.01992, audio_tagging_loss=0.01001, over 3047555.43 frames. ], batch size: 56, lr: 5.27e-03, grad_scale: 32.0 2023-11-20 08:42:55,942 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1017953.3333333334, ans=0.2 2023-11-20 08:42:59,241 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 152700 2023-11-20 08:43:06,588 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.274e+01 8.016e+01 8.880e+01 9.653e+01 1.299e+02, threshold=1.776e+02, percent-clipped=0.0 2023-11-20 08:43:09,788 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.29 vs. limit=15.0 2023-11-20 08:43:13,713 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.15 vs. limit=15.0 2023-11-20 08:43:16,714 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=1018086.6666666666, ans=0.05 2023-11-20 08:43:44,752 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 8450, loss[loss=0.06624, simple_loss=0.08034, pruned_loss=0.0158, audio_tagging_loss=0.01027, over 16110.00 frames. ], tot_loss[loss=0.08031, simple_loss=0.1006, pruned_loss=0.01995, audio_tagging_loss=0.01005, over 3054215.89 frames. ], batch size: 61, lr: 5.27e-03, grad_scale: 32.0 2023-11-20 08:44:03,247 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 152750 2023-11-20 08:44:03,339 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1018286.6666666666, ans=0.125 2023-11-20 08:44:27,668 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1018420.0, ans=0.125 2023-11-20 08:44:33,778 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1018420.0, ans=0.0 2023-11-20 08:44:43,759 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.47 vs. 
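
[Editor's note on the ScheduledFloat records] The scaling.py:213 records show regularization constants (dropout probabilities, skip rates, balancer probs) whose current value ans is looked up from batch_count. A toy piecewise-linear schedule in that spirit; the breakpoints below are invented for illustration, not the recipe's actual settings:

    def scheduled_float(batch_count: float,
                        schedule: list[tuple[float, float]]) -> float:
        # schedule: sorted (batch_count, value) breakpoints; clamp at both ends.
        if batch_count <= schedule[0][0]:
            return schedule[0][1]
        for (x0, y0), (x1, y1) in zip(schedule, schedule[1:]):
            if batch_count <= x1:
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
        return schedule[-1][1]

    # E.g. a dropout that decays from 0.3 to 0.1 over the first 20k batches
    # (values assumed); at batch_count ~1.01e6 it has long since bottomed out:
    print(scheduled_float(1_014_353.3, [(0.0, 0.3), (20_000.0, 0.1)]))  # -> 0.1
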
limit=22.5 2023-11-20 08:44:47,646 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=1018553.3333333334, ans=15.0 2023-11-20 08:44:48,046 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 8500, loss[loss=0.104, simple_loss=0.1338, pruned_loss=0.02939, audio_tagging_loss=0.007666, over 15725.00 frames. ], tot_loss[loss=0.08047, simple_loss=0.1009, pruned_loss=0.01997, audio_tagging_loss=0.01005, over 3053081.29 frames. ], batch size: 56, lr: 5.27e-03, grad_scale: 32.0 2023-11-20 08:44:53,349 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1018553.3333333334, ans=0.1 2023-11-20 08:45:07,850 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 152800 2023-11-20 08:45:08,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1018620.0, ans=0.2 2023-11-20 08:45:11,947 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1018620.0, ans=0.035 2023-11-20 08:45:15,429 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.959e+01 8.066e+01 8.928e+01 9.740e+01 1.439e+02, threshold=1.786e+02, percent-clipped=0.0 2023-11-20 08:45:47,815 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1018820.0, ans=0.125 2023-11-20 08:45:53,043 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 8550, loss[loss=0.09243, simple_loss=0.119, pruned_loss=0.02396, audio_tagging_loss=0.008977, over 14654.00 frames. ], tot_loss[loss=0.08027, simple_loss=0.1006, pruned_loss=0.01982, audio_tagging_loss=0.01015, over 3055199.83 frames. ], batch size: 54, lr: 5.27e-03, grad_scale: 32.0 2023-11-20 08:45:55,815 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1018886.6666666666, ans=0.0 2023-11-20 08:45:59,445 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1018886.6666666666, ans=0.07 2023-11-20 08:46:06,087 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.70 vs. limit=10.0 2023-11-20 08:46:12,923 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 152850 2023-11-20 08:46:24,341 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1019020.0, ans=0.2 2023-11-20 08:46:26,092 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.42 vs. limit=10.0 2023-11-20 08:46:57,774 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 8600, loss[loss=0.07907, simple_loss=0.09983, pruned_loss=0.01994, audio_tagging_loss=0.009219, over 15276.00 frames. ], tot_loss[loss=0.08029, simple_loss=0.1007, pruned_loss=0.01985, audio_tagging_loss=0.01009, over 3049099.30 frames. 
], batch size: 58, lr: 5.27e-03, grad_scale: 32.0 2023-11-20 08:47:06,089 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1019220.0, ans=0.025 2023-11-20 08:47:16,751 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 152900 2023-11-20 08:47:18,303 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1019286.6666666666, ans=0.0 2023-11-20 08:47:24,683 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.691e+01 7.921e+01 8.657e+01 9.489e+01 1.169e+02, threshold=1.731e+02, percent-clipped=0.0 2023-11-20 08:47:26,792 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.72 vs. limit=15.0 2023-11-20 08:47:37,892 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1019420.0, ans=0.2 2023-11-20 08:47:48,344 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1019486.6666666666, ans=10.0 2023-11-20 08:48:02,345 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 8650, loss[loss=0.0989, simple_loss=0.1242, pruned_loss=0.02636, audio_tagging_loss=0.01042, over 15032.00 frames. ], tot_loss[loss=0.0807, simple_loss=0.1012, pruned_loss=0.02002, audio_tagging_loss=0.01006, over 3047887.95 frames. ], batch size: 56, lr: 5.27e-03, grad_scale: 32.0 2023-11-20 08:48:14,404 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 08:48:14,427 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1019620.0, ans=0.125 2023-11-20 08:48:18,002 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1019620.0, ans=0.0 2023-11-20 08:48:22,137 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 152950 2023-11-20 08:48:25,834 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1019620.0, ans=0.125 2023-11-20 08:49:01,610 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1019820.0, ans=0.125 2023-11-20 08:49:06,035 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 8700, loss[loss=0.09547, simple_loss=0.1101, pruned_loss=0.03076, audio_tagging_loss=0.009674, over 14406.00 frames. ], tot_loss[loss=0.08073, simple_loss=0.1011, pruned_loss=0.02006, audio_tagging_loss=0.0101, over 3053767.70 frames. ], batch size: 54, lr: 5.27e-03, grad_scale: 16.0 2023-11-20 08:49:17,638 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.36 vs. 
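
[Editor's note on the learning rate] Every tot_loss record carries the current learning rate, which drifts from 5.28e-03 down to 5.25e-03 over this stretch. With the hyperparameters printed at startup (base_lr=0.045, lr_batches=7500, lr_epochs=3.5), the Eden schedule used in icefall reproduces these values at roughly 152k training batches and 12 completed epochs; the exact fractional epoch fed to the scheduler is an assumption:

    def eden_lr(base_lr: float, batch: int, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        # Eden: decay jointly in the number of batches and of epochs seen.
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    # ~152.65k batches into training, 12 completed epochs:
    print(f"{eden_lr(0.045, 152650, 12.0):.2e}")
    # -> 5.27e-03, matching the 5.25e-03..5.28e-03 range logged in this section
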
limit=15.0 2023-11-20 08:49:25,914 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 153000 2023-11-20 08:49:35,425 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.857e+01 8.289e+01 9.010e+01 9.707e+01 1.388e+02, threshold=1.802e+02, percent-clipped=0.0 2023-11-20 08:49:41,983 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1020020.0, ans=0.125 2023-11-20 08:50:02,181 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.11 vs. limit=22.5 2023-11-20 08:50:04,097 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1020153.3333333334, ans=0.125 2023-11-20 08:50:09,229 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1020153.3333333334, ans=0.2 2023-11-20 08:50:10,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1020220.0, ans=0.015 2023-11-20 08:50:11,230 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 8750, loss[loss=0.07341, simple_loss=0.08976, pruned_loss=0.01887, audio_tagging_loss=0.009653, over 15182.00 frames. ], tot_loss[loss=0.08051, simple_loss=0.1007, pruned_loss=0.01996, audio_tagging_loss=0.01017, over 3053231.51 frames. ], batch size: 56, lr: 5.27e-03, grad_scale: 16.0 2023-11-20 08:50:19,342 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1020220.0, ans=0.125 2023-11-20 08:50:28,941 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.99 vs. limit=10.0 2023-11-20 08:50:30,884 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 153050 2023-11-20 08:50:38,509 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1020353.3333333334, ans=0.0 2023-11-20 08:51:09,900 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1020486.6666666666, ans=0.125 2023-11-20 08:51:10,978 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1020486.6666666666, ans=0.125 2023-11-20 08:51:11,520 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.24 vs. limit=15.0 2023-11-20 08:51:16,280 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 8800, loss[loss=0.08552, simple_loss=0.1127, pruned_loss=0.02035, audio_tagging_loss=0.008804, over 15246.00 frames. ], tot_loss[loss=0.08124, simple_loss=0.1017, pruned_loss=0.02023, audio_tagging_loss=0.01018, over 3054092.75 frames. ], batch size: 54, lr: 5.27e-03, grad_scale: 32.0 2023-11-20 08:51:27,940 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.65 vs. 
limit=6.0 2023-11-20 08:51:35,304 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 153100 2023-11-20 08:51:37,304 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1020620.0, ans=0.125 2023-11-20 08:51:44,286 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.487e+01 8.394e+01 9.116e+01 1.011e+02 1.329e+02, threshold=1.823e+02, percent-clipped=0.0 2023-11-20 08:51:45,806 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1020686.6666666666, ans=0.07 2023-11-20 08:52:18,570 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=1020820.0, ans=0.05 2023-11-20 08:52:20,883 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 8850, loss[loss=0.07135, simple_loss=0.08844, pruned_loss=0.01792, audio_tagging_loss=0.009207, over 14802.00 frames. ], tot_loss[loss=0.08166, simple_loss=0.1023, pruned_loss=0.02032, audio_tagging_loss=0.01019, over 3055702.65 frames. ], batch size: 55, lr: 5.27e-03, grad_scale: 32.0 2023-11-20 08:52:33,725 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 08:52:40,696 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 153150 2023-11-20 08:52:42,077 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 08:53:18,874 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1021153.3333333334, ans=0.2 2023-11-20 08:53:26,060 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 8900, loss[loss=0.108, simple_loss=0.1338, pruned_loss=0.03169, audio_tagging_loss=0.009456, over 16376.00 frames. ], tot_loss[loss=0.08102, simple_loss=0.1017, pruned_loss=0.02002, audio_tagging_loss=0.01013, over 3057842.84 frames. ], batch size: 60, lr: 5.27e-03, grad_scale: 16.0 2023-11-20 08:53:28,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1021220.0, ans=0.125 2023-11-20 08:53:31,948 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1021220.0, ans=0.1 2023-11-20 08:53:45,878 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 153200 2023-11-20 08:53:49,059 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1021286.6666666666, ans=0.125 2023-11-20 08:53:51,499 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 08:53:55,906 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.066e+01 8.130e+01 8.663e+01 9.532e+01 1.311e+02, threshold=1.733e+02, percent-clipped=0.0 2023-11-20 08:54:29,126 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.58 vs. 
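
[Editor's note on the Whitening records] The scaling.py:1022 records compare a per-module whiteness metric of the activation covariance against a fixed limit (5.0, 15.0, 22.5, ...); only when the metric exceeds the limit does the Whiten module start pushing the activations toward a whiter covariance. The exact statistic is internal to icefall's scaling.py, so the proxy below is an assumption: it measures how concentrated the covariance spectrum is, returning 1.0 for a perfectly white (isotropic) covariance and growing as a few directions dominate:

    import torch

    def whiteness_proxy(x: torch.Tensor) -> float:
        # x: (num_frames, num_channels); larger return value = less white.
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.T @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        return (eigs.max() / eigs.mean().clamp(min=1e-20)).item()

    x = torch.randn(4000, 256)  # near-white random features
    print(whiteness_proxy(x))   # -> ~1.6, far under a limit like 22.5
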
limit=15.0 2023-11-20 08:54:31,106 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 8950, loss[loss=0.07149, simple_loss=0.09437, pruned_loss=0.01732, audio_tagging_loss=0.006978, over 15525.00 frames. ], tot_loss[loss=0.08001, simple_loss=0.1005, pruned_loss=0.01972, audio_tagging_loss=0.01004, over 3062325.63 frames. ], batch size: 59, lr: 5.26e-03, grad_scale: 16.0 2023-11-20 08:54:39,401 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1021553.3333333334, ans=0.125 2023-11-20 08:54:50,583 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 153250 2023-11-20 08:54:59,160 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.34 vs. limit=15.0 2023-11-20 08:55:22,759 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1021820.0, ans=0.0 2023-11-20 08:55:36,033 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 9000, loss[loss=0.06216, simple_loss=0.0755, pruned_loss=0.01365, audio_tagging_loss=0.01076, over 14332.00 frames. ], tot_loss[loss=0.08031, simple_loss=0.1007, pruned_loss=0.0199, audio_tagging_loss=0.01008, over 3056488.61 frames. ], batch size: 55, lr: 5.26e-03, grad_scale: 8.0 2023-11-20 08:55:36,033 INFO [train_asr.py:1285] (3/4) Computing validation loss 2023-11-20 08:56:11,881 INFO [zipformer.py:1873] (3/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.1083, 4.1155, 4.3216, 4.3758], device='cuda:3') 2023-11-20 08:56:18,730 INFO [train_asr.py:1294] (3/4) Epoch 13, validation: loss=0.06245, simple_loss=0.0538, pruned_loss=0.005768, audio_tagging_loss=0.02978, over 4681554.00 frames. 2023-11-20 08:56:18,731 INFO [train_asr.py:1295] (3/4) Maximum memory allocated so far is 25886MB 2023-11-20 08:56:26,605 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.54 vs. 
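
[Editor's note on the validation block] The validation pass above pools everything into a single loss "over 4681554.00 frames", just as the running tot_loss is reported "over ~3.05e6 frames"; this is consistent with a frame-weighted average of per-batch losses. A minimal sketch of that pooling, with hypothetical numbers:

    def frame_weighted_loss(batch_losses: list[float],
                            batch_frames: list[float]) -> float:
        # Long batches contribute proportionally more, matching the
        # "over N frames" bookkeeping in the log records.
        total = sum(l * f for l, f in zip(batch_losses, batch_frames))
        return total / sum(batch_frames)

    print(frame_weighted_loss([0.080, 0.060], [15000, 16000]))  # -> ~0.0697
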
limit=15.0 2023-11-20 08:56:36,994 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 153300 2023-11-20 08:56:48,914 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.462e+01 8.276e+01 8.856e+01 9.604e+01 3.298e+02, threshold=1.771e+02, percent-clipped=1.0 2023-11-20 08:56:53,000 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1022020.0, ans=0.125 2023-11-20 08:56:54,942 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1022020.0, ans=0.0 2023-11-20 08:57:04,602 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1022086.6666666666, ans=0.0 2023-11-20 08:57:10,090 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1022153.3333333334, ans=0.0 2023-11-20 08:57:12,673 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1022153.3333333334, ans=0.0 2023-11-20 08:57:12,715 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1022153.3333333334, ans=0.125 2023-11-20 08:57:16,338 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1022153.3333333334, ans=0.1 2023-11-20 08:57:18,076 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.93 vs. limit=22.5 2023-11-20 08:57:22,130 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 9050, loss[loss=0.06815, simple_loss=0.07954, pruned_loss=0.01624, audio_tagging_loss=0.01213, over 14690.00 frames. ], tot_loss[loss=0.08034, simple_loss=0.101, pruned_loss=0.01991, audio_tagging_loss=0.00994, over 3049802.44 frames. ], batch size: 56, lr: 5.26e-03, grad_scale: 8.0 2023-11-20 08:57:22,467 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=1022220.0, ans=0.05 2023-11-20 08:57:41,892 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 153350 2023-11-20 08:57:52,560 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1022353.3333333334, ans=0.1 2023-11-20 08:57:54,201 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.67 vs. limit=15.0 2023-11-20 08:58:06,021 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.99 vs. limit=15.0 2023-11-20 08:58:09,944 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.38 vs. limit=22.5 2023-11-20 08:58:19,400 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.80 vs. 
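
[Editor's note on the attention-entropy dump] During validation the zipformer.py:1873 record dumps attn_weights_entropy values around 4.1..4.4 for one self-attention module. These magnitudes are what one expects for nearly uniform attention over on the order of a hundred keys (ln 100 ≈ 4.6 nats). A sketch of such a diagnostic, with the tensor layout and reduction assumed:

    import torch

    def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
        # attn: (num_heads, tgt_len, src_len); each row is a softmax distribution.
        ent = -(attn * attn.clamp(min=1e-20).log()).sum(dim=-1)
        return ent.mean(dim=-1)  # mean entropy per head, in nats

    attn = torch.softmax(torch.randn(4, 100, 100), dim=-1)
    print(attn_weights_entropy(attn))  # -> ~4.1 per head, like the record above
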
limit=15.0 2023-11-20 08:58:22,256 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1022486.6666666666, ans=0.125 2023-11-20 08:58:26,692 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 9100, loss[loss=0.06645, simple_loss=0.07822, pruned_loss=0.01447, audio_tagging_loss=0.01287, over 15973.00 frames. ], tot_loss[loss=0.0798, simple_loss=0.1003, pruned_loss=0.01983, audio_tagging_loss=0.00984, over 3046260.41 frames. ], batch size: 63, lr: 5.26e-03, grad_scale: 8.0 2023-11-20 08:58:41,084 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1022620.0, ans=0.125 2023-11-20 08:58:46,281 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 153400 2023-11-20 08:58:57,454 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.670e+01 8.168e+01 8.794e+01 9.522e+01 1.542e+02, threshold=1.759e+02, percent-clipped=0.0 2023-11-20 08:59:09,350 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1022753.3333333334, ans=0.125 2023-11-20 08:59:31,262 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 9150, loss[loss=0.07633, simple_loss=0.1017, pruned_loss=0.01737, audio_tagging_loss=0.008136, over 16309.00 frames. ], tot_loss[loss=0.07927, simple_loss=0.09986, pruned_loss=0.01957, audio_tagging_loss=0.009765, over 3042727.83 frames. ], batch size: 61, lr: 5.26e-03, grad_scale: 8.0 2023-11-20 08:59:50,125 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 153450 2023-11-20 09:00:23,533 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1023153.3333333334, ans=0.125 2023-11-20 09:00:35,460 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 9200, loss[loss=0.08425, simple_loss=0.1058, pruned_loss=0.02244, audio_tagging_loss=0.008927, over 14376.00 frames. ], tot_loss[loss=0.07957, simple_loss=0.1003, pruned_loss=0.01971, audio_tagging_loss=0.009698, over 3039481.48 frames. ], batch size: 53, lr: 5.26e-03, grad_scale: 16.0 2023-11-20 09:00:45,527 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1023220.0, ans=0.125 2023-11-20 09:00:54,630 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1023286.6666666666, ans=0.2 2023-11-20 09:00:55,648 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 153500 2023-11-20 09:01:07,209 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.430e+01 8.062e+01 8.603e+01 9.204e+01 1.228e+02, threshold=1.721e+02, percent-clipped=0.0 2023-11-20 09:01:14,601 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1023420.0, ans=0.2 2023-11-20 09:01:37,498 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1023486.6666666666, ans=0.0 2023-11-20 09:01:38,803 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1023486.6666666666, ans=0.2 2023-11-20 09:01:38,914 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.06 vs. 
limit=15.0 2023-11-20 09:01:40,797 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 9250, loss[loss=0.06884, simple_loss=0.08861, pruned_loss=0.0163, audio_tagging_loss=0.008234, over 16575.00 frames. ], tot_loss[loss=0.07968, simple_loss=0.1002, pruned_loss=0.01979, audio_tagging_loss=0.00979, over 3041583.14 frames. ], batch size: 61, lr: 5.26e-03, grad_scale: 16.0 2023-11-20 09:02:00,733 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 153550 2023-11-20 09:02:11,466 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1023686.6666666666, ans=0.0 2023-11-20 09:02:19,341 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.81 vs. limit=12.0 2023-11-20 09:02:32,862 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1023820.0, ans=0.125 2023-11-20 09:02:37,154 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1023820.0, ans=0.125 2023-11-20 09:02:45,144 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1023886.6666666666, ans=0.125 2023-11-20 09:02:46,004 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 9300, loss[loss=0.05489, simple_loss=0.06404, pruned_loss=0.01087, audio_tagging_loss=0.012, over 15172.00 frames. ], tot_loss[loss=0.07932, simple_loss=0.09982, pruned_loss=0.01961, audio_tagging_loss=0.009803, over 3040804.47 frames. ], batch size: 59, lr: 5.26e-03, grad_scale: 16.0 2023-11-20 09:02:51,746 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1023886.6666666666, ans=0.125 2023-11-20 09:02:58,249 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1023953.3333333334, ans=0.125 2023-11-20 09:02:59,225 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1023953.3333333334, ans=0.0 2023-11-20 09:03:05,351 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 153600 2023-11-20 09:03:16,525 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1024020.0, ans=0.125 2023-11-20 09:03:17,348 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.484e+01 8.349e+01 9.030e+01 1.018e+02 1.384e+02, threshold=1.806e+02, percent-clipped=0.0 2023-11-20 09:03:28,210 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1024086.6666666666, ans=0.05 2023-11-20 09:03:28,327 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1024086.6666666666, ans=0.125 2023-11-20 09:03:30,333 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.96 vs. limit=22.5 2023-11-20 09:03:33,790 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=1024086.6666666666, ans=0.5 2023-11-20 09:03:35,628 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.61 vs. 
limit=5.0 2023-11-20 09:03:38,732 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1024153.3333333334, ans=0.1 2023-11-20 09:03:45,736 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.19 vs. limit=15.0 2023-11-20 09:03:51,237 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 9350, loss[loss=0.1021, simple_loss=0.1312, pruned_loss=0.02961, audio_tagging_loss=0.006916, over 16115.00 frames. ], tot_loss[loss=0.08049, simple_loss=0.1011, pruned_loss=0.02004, audio_tagging_loss=0.009899, over 3055609.67 frames. ], batch size: 58, lr: 5.26e-03, grad_scale: 16.0 2023-11-20 09:04:01,319 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1024220.0, ans=0.125 2023-11-20 09:04:06,587 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1024286.6666666666, ans=0.0 2023-11-20 09:04:08,961 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1024286.6666666666, ans=0.125 2023-11-20 09:04:10,005 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 153650 2023-11-20 09:04:20,016 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.87 vs. limit=15.0 2023-11-20 09:04:32,365 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1024420.0, ans=0.125 2023-11-20 09:04:39,698 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1024420.0, ans=0.125 2023-11-20 09:04:46,948 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1024486.6666666666, ans=0.5 2023-11-20 09:04:49,500 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1024486.6666666666, ans=0.125 2023-11-20 09:04:54,535 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 9400, loss[loss=0.08905, simple_loss=0.1014, pruned_loss=0.02821, audio_tagging_loss=0.01015, over 14801.00 frames. ], tot_loss[loss=0.08098, simple_loss=0.1015, pruned_loss=0.02016, audio_tagging_loss=0.01007, over 3052358.25 frames. ], batch size: 53, lr: 5.26e-03, grad_scale: 16.0 2023-11-20 09:05:00,215 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.06 vs. limit=15.0 2023-11-20 09:05:11,475 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.71 vs. 
limit=15.0 2023-11-20 09:05:13,520 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1024620.0, ans=0.04949747468305833 2023-11-20 09:05:14,468 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 153700 2023-11-20 09:05:17,306 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1024620.0, ans=0.1 2023-11-20 09:05:22,854 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1024686.6666666666, ans=0.125 2023-11-20 09:05:25,279 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1024686.6666666666, ans=0.0 2023-11-20 09:05:26,179 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.113e+01 8.189e+01 8.740e+01 9.691e+01 1.507e+02, threshold=1.748e+02, percent-clipped=0.0 2023-11-20 09:05:47,640 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1024820.0, ans=0.125 2023-11-20 09:05:51,358 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1024820.0, ans=0.125 2023-11-20 09:05:57,762 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 09:05:59,526 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 9450, loss[loss=0.08401, simple_loss=0.09882, pruned_loss=0.0217, audio_tagging_loss=0.0129, over 15453.00 frames. ], tot_loss[loss=0.08037, simple_loss=0.1006, pruned_loss=0.01985, audio_tagging_loss=0.01023, over 3051806.31 frames. ], batch size: 60, lr: 5.26e-03, grad_scale: 16.0 2023-11-20 09:06:10,298 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=1024886.6666666666, ans=10.0 2023-11-20 09:06:18,720 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 153750 2023-11-20 09:06:18,888 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1024953.3333333334, ans=0.125 2023-11-20 09:06:23,841 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1025020.0, ans=0.07 2023-11-20 09:06:30,103 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.50 vs. limit=12.0 2023-11-20 09:06:38,450 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.54 vs. limit=15.0 2023-11-20 09:06:47,058 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1025086.6666666666, ans=0.2 2023-11-20 09:06:50,335 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.87 vs. 
limit=22.5 2023-11-20 09:06:55,214 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1025153.3333333334, ans=0.125 2023-11-20 09:06:57,486 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.66 vs. limit=15.0 2023-11-20 09:07:04,148 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 9500, loss[loss=0.09475, simple_loss=0.1188, pruned_loss=0.0262, audio_tagging_loss=0.009153, over 15591.00 frames. ], tot_loss[loss=0.08043, simple_loss=0.1005, pruned_loss=0.01985, audio_tagging_loss=0.01033, over 3049793.14 frames. ], batch size: 55, lr: 5.26e-03, grad_scale: 16.0 2023-11-20 09:07:23,661 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 153800 2023-11-20 09:07:35,619 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.073e+01 8.164e+01 8.858e+01 9.394e+01 2.637e+02, threshold=1.772e+02, percent-clipped=1.0 2023-11-20 09:07:39,691 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1025353.3333333334, ans=0.125 2023-11-20 09:08:08,315 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1025553.3333333334, ans=0.0 2023-11-20 09:08:09,387 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 9550, loss[loss=0.07504, simple_loss=0.09718, pruned_loss=0.01522, audio_tagging_loss=0.01123, over 14835.00 frames. ], tot_loss[loss=0.08022, simple_loss=0.1003, pruned_loss=0.01973, audio_tagging_loss=0.01033, over 3053035.61 frames. ], batch size: 57, lr: 5.25e-03, grad_scale: 16.0 2023-11-20 09:08:10,919 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1025553.3333333334, ans=0.1 2023-11-20 09:08:14,620 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.25 vs. limit=6.0 2023-11-20 09:08:25,756 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1025620.0, ans=0.09899494936611666 2023-11-20 09:08:26,293 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.64 vs. limit=15.0 2023-11-20 09:08:29,296 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 153850 2023-11-20 09:08:43,046 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1025686.6666666666, ans=0.125 2023-11-20 09:08:51,911 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1025753.3333333334, ans=0.0 2023-11-20 09:09:02,822 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.73 vs. 
limit=10.0 2023-11-20 09:09:07,033 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1025820.0, ans=0.1 2023-11-20 09:09:09,650 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1025820.0, ans=0.0 2023-11-20 09:09:14,882 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 9600, loss[loss=0.08425, simple_loss=0.1016, pruned_loss=0.0248, audio_tagging_loss=0.008625, over 14875.00 frames. ], tot_loss[loss=0.08007, simple_loss=0.1003, pruned_loss=0.01956, audio_tagging_loss=0.01036, over 3051485.09 frames. ], batch size: 54, lr: 5.25e-03, grad_scale: 32.0 2023-11-20 09:09:29,865 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.71 vs. limit=15.0 2023-11-20 09:09:34,186 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 153900 2023-11-20 09:09:40,419 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1026020.0, ans=0.0 2023-11-20 09:09:44,976 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.678e+01 7.953e+01 8.946e+01 9.937e+01 1.277e+02, threshold=1.789e+02, percent-clipped=0.0 2023-11-20 09:09:50,883 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1026020.0, ans=0.125 2023-11-20 09:09:54,607 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1026086.6666666666, ans=0.125 2023-11-20 09:10:19,601 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 9650, loss[loss=0.07959, simple_loss=0.1038, pruned_loss=0.01794, audio_tagging_loss=0.009764, over 15420.00 frames. ], tot_loss[loss=0.08045, simple_loss=0.1007, pruned_loss=0.01987, audio_tagging_loss=0.01024, over 3046976.54 frames. ], batch size: 56, lr: 5.25e-03, grad_scale: 32.0 2023-11-20 09:10:21,483 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.18 vs. limit=15.0 2023-11-20 09:10:24,724 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1026220.0, ans=0.125 2023-11-20 09:10:38,862 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 153950 2023-11-20 09:11:01,798 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 09:11:23,340 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 9700, loss[loss=0.07771, simple_loss=0.09398, pruned_loss=0.02085, audio_tagging_loss=0.009874, over 14571.00 frames. ], tot_loss[loss=0.0804, simple_loss=0.1005, pruned_loss=0.02006, audio_tagging_loss=0.01009, over 3041854.52 frames. 
], batch size: 54, lr: 5.25e-03, grad_scale: 32.0 2023-11-20 09:11:43,191 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 154000 2023-11-20 09:11:43,365 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 09:11:45,985 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 09:11:50,880 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 09:11:53,261 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.75 vs. limit=15.0 2023-11-20 09:11:54,851 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.479e+01 8.116e+01 8.846e+01 9.566e+01 1.154e+02, threshold=1.769e+02, percent-clipped=0.0 2023-11-20 09:12:03,652 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.46 vs. limit=15.0 2023-11-20 09:12:09,225 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1026753.3333333334, ans=0.1 2023-11-20 09:12:09,594 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=1026753.3333333334, ans=22.5 2023-11-20 09:12:24,662 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1026820.0, ans=0.125 2023-11-20 09:12:27,820 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 9750, loss[loss=0.0826, simple_loss=0.1083, pruned_loss=0.02033, audio_tagging_loss=0.008134, over 14901.00 frames. ], tot_loss[loss=0.07995, simple_loss=0.1004, pruned_loss=0.01975, audio_tagging_loss=0.01003, over 3043324.57 frames. ], batch size: 56, lr: 5.25e-03, grad_scale: 32.0 2023-11-20 09:12:48,196 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 154050 2023-11-20 09:12:53,550 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1027020.0, ans=0.2 2023-11-20 09:13:26,265 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1027153.3333333334, ans=0.1 2023-11-20 09:13:31,052 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.57 vs. limit=12.0 2023-11-20 09:13:32,820 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 9800, loss[loss=0.1006, simple_loss=0.129, pruned_loss=0.02637, audio_tagging_loss=0.009684, over 15736.00 frames. ], tot_loss[loss=0.0802, simple_loss=0.101, pruned_loss=0.0197, audio_tagging_loss=0.009984, over 3041675.88 frames. ], batch size: 56, lr: 5.25e-03, grad_scale: 32.0 2023-11-20 09:13:48,726 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.18 vs. 
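
[Editor's note on grad_scale] The grad_scale field in the tot_loss records moves between 32.0, 16.0 and 8.0 across this section, the signature of fp16 dynamic loss scaling (use_fp16: True at startup): the scale is halved whenever a step overflows and grown back after a run of clean steps. A generic torch.cuda.amp sketch of that mechanism; this is standard PyTorch usage, not icefall's exact optimizer loop:

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=32.0)  # initial scale assumed

    def training_step(model, optimizer, batch, loss_fn):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = loss_fn(model, batch)
        scaler.scale(loss).backward()  # scale up so fp16 grads don't underflow
        scaler.step(optimizer)         # unscales; skips the step on inf/nan grads
        scaler.update()                # halves the scale on overflow (32 -> 16 -> 8)
        return loss.detach()
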
limit=22.5 2023-11-20 09:13:51,921 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 154100 2023-11-20 09:14:03,360 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.948e+01 8.195e+01 8.924e+01 9.693e+01 1.492e+02, threshold=1.785e+02, percent-clipped=0.0 2023-11-20 09:14:07,959 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1027353.3333333334, ans=0.2 2023-11-20 09:14:08,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1027353.3333333334, ans=0.1 2023-11-20 09:14:09,079 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1027353.3333333334, ans=0.125 2023-11-20 09:14:13,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1027420.0, ans=0.125 2023-11-20 09:14:22,435 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.52 vs. limit=22.5 2023-11-20 09:14:30,960 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 09:14:31,175 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1027486.6666666666, ans=0.5 2023-11-20 09:14:37,020 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 9850, loss[loss=0.07836, simple_loss=0.1035, pruned_loss=0.01714, audio_tagging_loss=0.009452, over 15562.00 frames. ], tot_loss[loss=0.08103, simple_loss=0.1022, pruned_loss=0.02007, audio_tagging_loss=0.009875, over 3048767.36 frames. ], batch size: 56, lr: 5.25e-03, grad_scale: 32.0 2023-11-20 09:14:47,402 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1027553.3333333334, ans=0.1 2023-11-20 09:14:49,997 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1027620.0, ans=0.125 2023-11-20 09:14:56,560 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 154150 2023-11-20 09:15:07,794 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.39 vs. 
limit=12.0 2023-11-20 09:15:10,958 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1027686.6666666666, ans=0.0 2023-11-20 09:15:23,813 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1027753.3333333334, ans=0.0 2023-11-20 09:15:27,525 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1027820.0, ans=0.125 2023-11-20 09:15:31,511 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1027820.0, ans=0.125 2023-11-20 09:15:41,470 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 9900, loss[loss=0.09764, simple_loss=0.1309, pruned_loss=0.02511, audio_tagging_loss=0.007069, over 14339.00 frames. ], tot_loss[loss=0.08117, simple_loss=0.1021, pruned_loss=0.02015, audio_tagging_loss=0.009958, over 3047823.18 frames. ], batch size: 55, lr: 5.25e-03, grad_scale: 16.0 2023-11-20 09:16:01,311 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 154200 2023-11-20 09:16:01,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1027953.3333333334, ans=0.5 2023-11-20 09:16:04,856 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.72 vs. limit=12.0 2023-11-20 09:16:14,649 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.724e+01 8.225e+01 8.765e+01 9.379e+01 1.572e+02, threshold=1.753e+02, percent-clipped=0.0 2023-11-20 09:16:21,561 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.06 vs. limit=15.0 2023-11-20 09:16:24,276 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1028086.6666666666, ans=0.125 2023-11-20 09:16:34,017 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1028153.3333333334, ans=0.125 2023-11-20 09:16:42,079 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1028153.3333333334, ans=0.025 2023-11-20 09:16:47,250 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 9950, loss[loss=0.07409, simple_loss=0.08484, pruned_loss=0.02155, audio_tagging_loss=0.01012, over 16009.00 frames. ], tot_loss[loss=0.0812, simple_loss=0.102, pruned_loss=0.02031, audio_tagging_loss=0.009899, over 3045857.29 frames. ], batch size: 64, lr: 5.25e-03, grad_scale: 16.0 2023-11-20 09:16:53,578 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1028220.0, ans=0.125 2023-11-20 09:16:58,787 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.87 vs. 
limit=15.0 2023-11-20 09:17:03,789 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1028286.6666666666, ans=0.0 2023-11-20 09:17:06,167 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 154250 2023-11-20 09:17:16,528 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.541e-03 2023-11-20 09:17:16,571 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1028353.3333333334, ans=0.125 2023-11-20 09:17:17,803 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1028353.3333333334, ans=0.0 2023-11-20 09:17:28,425 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1028420.0, ans=0.1 2023-11-20 09:17:38,692 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1028486.6666666666, ans=0.125 2023-11-20 09:17:51,780 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 10000, loss[loss=0.0689, simple_loss=0.09185, pruned_loss=0.01393, audio_tagging_loss=0.009039, over 15084.00 frames. ], tot_loss[loss=0.08094, simple_loss=0.1018, pruned_loss=0.0202, audio_tagging_loss=0.009835, over 3042968.45 frames. ], batch size: 56, lr: 5.25e-03, grad_scale: 32.0 2023-11-20 09:18:10,797 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 154300 2023-11-20 09:18:15,223 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1028620.0, ans=0.125 2023-11-20 09:18:19,020 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1028686.6666666666, ans=0.2 2023-11-20 09:18:19,050 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1028686.6666666666, ans=0.1 2023-11-20 09:18:23,635 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.484e+01 8.210e+01 8.752e+01 9.474e+01 1.370e+02, threshold=1.750e+02, percent-clipped=0.0 2023-11-20 09:18:29,633 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1028753.3333333334, ans=0.1 2023-11-20 09:18:30,805 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1028753.3333333334, ans=0.0 2023-11-20 09:18:38,291 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1028753.3333333334, ans=0.0 2023-11-20 09:18:55,817 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1028886.6666666666, ans=0.09899494936611666 2023-11-20 09:18:56,660 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 10050, loss[loss=0.08364, simple_loss=0.09493, pruned_loss=0.02484, audio_tagging_loss=0.01134, over 14617.00 frames. ], tot_loss[loss=0.08035, simple_loss=0.1012, pruned_loss=0.01993, audio_tagging_loss=0.009811, over 3046039.63 frames. 
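Each optim.py:476 line prints the min/q1/median/q3/max of recent gradient norms, and the clipping threshold tracks Clipping_scale times the median: in the batch 10000 line above, 2.0 x 8.752e+01 = 1.750e+02, exactly the printed threshold (the same relation holds for every quartile line in this excerpt). A sketch of that relationship; the real optimizer keeps a running history of grad norms, whereas here the quartiles are taken straight from the log:

```python
# Recompute the logged clipping threshold from a printed quartile line.
clipping_scale = 2.0
quartiles = [64.84, 82.10, 87.52, 94.74, 137.0]  # min, q1, median, q3, max at batch 10000
threshold = clipping_scale * quartiles[2]        # scale times the median grad norm
print(f"{threshold:.3e}")  # 1.750e+02 -- the logged threshold
```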
], batch size: 57, lr: 5.25e-03, grad_scale: 32.0 2023-11-20 09:19:16,392 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 154350 2023-11-20 09:19:38,289 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1029086.6666666666, ans=0.125 2023-11-20 09:19:39,471 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1029086.6666666666, ans=0.0 2023-11-20 09:20:01,370 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 10100, loss[loss=0.08548, simple_loss=0.113, pruned_loss=0.01997, audio_tagging_loss=0.009022, over 14318.00 frames. ], tot_loss[loss=0.07977, simple_loss=0.1002, pruned_loss=0.01969, audio_tagging_loss=0.009964, over 3041461.44 frames. ], batch size: 54, lr: 5.25e-03, grad_scale: 16.0 2023-11-20 09:20:07,776 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.17 vs. limit=22.5 2023-11-20 09:20:16,953 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1029286.6666666666, ans=0.1 2023-11-20 09:20:20,352 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 154400 2023-11-20 09:20:35,226 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.925e+01 8.287e+01 9.301e+01 1.019e+02 1.504e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-20 09:20:45,742 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1029420.0, ans=0.1 2023-11-20 09:20:48,386 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1029420.0, ans=0.125 2023-11-20 09:20:50,625 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1029420.0, ans=0.125 2023-11-20 09:20:53,507 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 09:21:05,894 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 10150, loss[loss=0.07415, simple_loss=0.1007, pruned_loss=0.0159, audio_tagging_loss=0.007902, over 16047.00 frames. ], tot_loss[loss=0.0799, simple_loss=0.1002, pruned_loss=0.01972, audio_tagging_loss=0.01006, over 3048429.23 frames. ], batch size: 61, lr: 5.24e-03, grad_scale: 16.0 2023-11-20 09:21:16,532 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1029553.3333333334, ans=0.125 2023-11-20 09:21:25,604 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 154450 2023-11-20 09:21:36,025 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 09:21:50,394 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1029753.3333333334, ans=0.125 2023-11-20 09:22:10,471 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 10200, loss[loss=0.07471, simple_loss=0.09829, pruned_loss=0.01601, audio_tagging_loss=0.009563, over 15056.00 frames. ], tot_loss[loss=0.07999, simple_loss=0.1002, pruned_loss=0.01973, audio_tagging_loss=0.01013, over 3043739.18 frames. ], batch size: 55, lr: 5.24e-03, grad_scale: 16.0 2023-11-20 09:22:13,506 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.41 vs. limit=15.0 2023-11-20 09:22:30,216 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 154500 2023-11-20 09:22:34,898 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 09:22:43,346 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.371e+01 8.118e+01 8.919e+01 9.960e+01 1.274e+02, threshold=1.784e+02, percent-clipped=0.0 2023-11-20 09:22:47,266 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1030086.6666666666, ans=0.125 2023-11-20 09:22:47,421 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff2.min_abs, batch_count=1030086.6666666666, ans=0.1 2023-11-20 09:22:48,797 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.05 vs. limit=15.0 2023-11-20 09:23:13,985 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 10250, loss[loss=0.07812, simple_loss=0.08892, pruned_loss=0.02385, audio_tagging_loss=0.009808, over 15864.00 frames. ], tot_loss[loss=0.07984, simple_loss=0.09997, pruned_loss=0.01964, audio_tagging_loss=0.01022, over 3046239.49 frames. ], batch size: 60, lr: 5.24e-03, grad_scale: 16.0 2023-11-20 09:23:33,196 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 154550 2023-11-20 09:23:36,107 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.96 vs. 
limit=12.0 2023-11-20 09:23:41,964 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1030353.3333333334, ans=0.125 2023-11-20 09:23:43,916 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1030353.3333333334, ans=0.125 2023-11-20 09:23:48,206 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1030353.3333333334, ans=0.125 2023-11-20 09:24:09,497 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1030486.6666666666, ans=0.07 2023-11-20 09:24:13,717 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.46 vs. limit=15.0 2023-11-20 09:24:15,713 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1030486.6666666666, ans=0.125 2023-11-20 09:24:19,136 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 10300, loss[loss=0.09229, simple_loss=0.1179, pruned_loss=0.02526, audio_tagging_loss=0.008065, over 15902.00 frames. ], tot_loss[loss=0.08034, simple_loss=0.1005, pruned_loss=0.01983, audio_tagging_loss=0.01025, over 3040017.01 frames. ], batch size: 58, lr: 5.24e-03, grad_scale: 16.0 2023-11-20 09:24:34,628 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1030620.0, ans=0.125 2023-11-20 09:24:38,884 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 154600 2023-11-20 09:24:42,305 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.75 vs. limit=15.0 2023-11-20 09:24:52,606 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.63 vs. limit=12.0 2023-11-20 09:24:53,304 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.249e+01 8.311e+01 8.875e+01 9.603e+01 1.202e+02, threshold=1.775e+02, percent-clipped=0.0 2023-11-20 09:25:11,515 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1030820.0, ans=0.2 2023-11-20 09:25:24,170 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 10350, loss[loss=0.0775, simple_loss=0.0935, pruned_loss=0.01859, audio_tagging_loss=0.01216, over 14627.00 frames. ], tot_loss[loss=0.08099, simple_loss=0.1014, pruned_loss=0.01998, audio_tagging_loss=0.01029, over 3047433.18 frames. 
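Most of the scaling.py:213 traffic reports ScheduledFloat values: regularization hyperparameters that follow piecewise-linear schedules in batch_count, and by this late stage nearly all have settled at their final values (dropout_p=0.1, skip rates 0.0, balancer probs 0.125). The fractional batch counts (...086.6666...) also suggest batch_count advances by the global batch duration over ref_duration=600 s, about 6.67 per step for 4 ranks at max_duration=1000, which lines up with batch idx 155000 appearing near batch_count 1033333 below; that is an inference from the log, not something it states. A minimal sketch of such a schedule, not the scaling.py implementation:

```python
# Piecewise-linear schedule indexed by batch_count, clamped at both ends.
def scheduled_float(batch_count: float, points: list[tuple[float, float]]) -> float:
    (x0, y0) = points[0]
    if batch_count <= x0:
        return y0
    for (x1, y1) in points[1:]:
        if batch_count <= x1:
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
        x0, y0 = x1, y1
    return y0  # past the last breakpoint: hold the final value

# e.g. a dropout decaying from 0.3 to a floor of 0.1 over the first 20k counts
dropout_schedule = [(0.0, 0.3), (20000.0, 0.1)]
print(scheduled_float(1030620.0, dropout_schedule))  # 0.1, as logged (ans=0.1)
```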
], batch size: 56, lr: 5.24e-03, grad_scale: 16.0 2023-11-20 09:25:25,557 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1030886.6666666666, ans=0.125 2023-11-20 09:25:43,761 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 154650 2023-11-20 09:26:16,157 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1031153.3333333334, ans=0.1 2023-11-20 09:26:18,538 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=1031153.3333333334, ans=15.0 2023-11-20 09:26:24,119 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1031153.3333333334, ans=0.2 2023-11-20 09:26:25,522 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1031153.3333333334, ans=0.125 2023-11-20 09:26:29,380 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 10400, loss[loss=0.07685, simple_loss=0.1023, pruned_loss=0.01694, audio_tagging_loss=0.008754, over 14970.00 frames. ], tot_loss[loss=0.08057, simple_loss=0.1008, pruned_loss=0.01979, audio_tagging_loss=0.0104, over 3050368.63 frames. ], batch size: 57, lr: 5.24e-03, grad_scale: 32.0 2023-11-20 09:26:29,805 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1031220.0, ans=0.0 2023-11-20 09:26:48,744 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 154700 2023-11-20 09:27:02,809 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.13 vs. limit=15.0 2023-11-20 09:27:03,086 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.158e+01 8.127e+01 8.781e+01 9.645e+01 1.274e+02, threshold=1.756e+02, percent-clipped=0.0 2023-11-20 09:27:34,483 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 10450, loss[loss=0.09665, simple_loss=0.1203, pruned_loss=0.0238, audio_tagging_loss=0.01272, over 14734.00 frames. ], tot_loss[loss=0.08012, simple_loss=0.1, pruned_loss=0.01965, audio_tagging_loss=0.01044, over 3044975.40 frames. ], batch size: 56, lr: 5.24e-03, grad_scale: 32.0 2023-11-20 09:27:39,803 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1031553.3333333334, ans=0.125 2023-11-20 09:27:41,066 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1031553.3333333334, ans=0.1 2023-11-20 09:27:52,903 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1031620.0, ans=0.1 2023-11-20 09:27:53,815 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 154750 2023-11-20 09:27:55,339 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1031620.0, ans=0.0 2023-11-20 09:28:13,704 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1031753.3333333334, ans=0.1 2023-11-20 09:28:38,217 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.19 vs. 
limit=22.5 2023-11-20 09:28:38,665 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 10500, loss[loss=0.08821, simple_loss=0.1178, pruned_loss=0.01993, audio_tagging_loss=0.009401, over 16601.00 frames. ], tot_loss[loss=0.07979, simple_loss=0.1001, pruned_loss=0.01945, audio_tagging_loss=0.01029, over 3043039.98 frames. ], batch size: 62, lr: 5.24e-03, grad_scale: 32.0 2023-11-20 09:28:45,540 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1031886.6666666666, ans=0.125 2023-11-20 09:28:45,875 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.15 vs. limit=15.0 2023-11-20 09:28:46,893 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1031886.6666666666, ans=0.125 2023-11-20 09:28:59,024 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 154800 2023-11-20 09:29:13,390 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.926e+01 8.208e+01 9.112e+01 1.062e+02 1.393e+02, threshold=1.822e+02, percent-clipped=0.0 2023-11-20 09:29:23,558 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1032086.6666666666, ans=0.2 2023-11-20 09:29:33,066 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.57 vs. limit=22.5 2023-11-20 09:29:45,132 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 10550, loss[loss=0.09979, simple_loss=0.1251, pruned_loss=0.02458, audio_tagging_loss=0.01266, over 15802.00 frames. ], tot_loss[loss=0.07987, simple_loss=0.1004, pruned_loss=0.01956, audio_tagging_loss=0.01013, over 3040270.09 frames. ], batch size: 58, lr: 5.24e-03, grad_scale: 32.0 2023-11-20 09:29:57,243 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1032286.6666666666, ans=0.1 2023-11-20 09:30:04,324 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 154850 2023-11-20 09:30:05,633 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1032286.6666666666, ans=0.125 2023-11-20 09:30:28,444 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1032420.0, ans=0.0 2023-11-20 09:30:30,957 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1032420.0, ans=0.125 2023-11-20 09:30:47,363 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.55 vs. limit=22.5 2023-11-20 09:30:49,034 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 10600, loss[loss=0.07031, simple_loss=0.08529, pruned_loss=0.01695, audio_tagging_loss=0.01072, over 15680.00 frames. ], tot_loss[loss=0.08014, simple_loss=0.1006, pruned_loss=0.01974, audio_tagging_loss=0.0101, over 3034039.04 frames. 
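The scaling.py:1022 Whitening lines compare a measured statistic of a module's activation covariance against its whitening_limit (which can itself be scheduled, as the whitening_limit ScheduledFloat entries show); the penalty that pushes the feature covariance back toward isotropy engages only when the metric exceeds the limit, hence the "metric=... vs. limit=..." phrasing with the metric usually below. Purely as a hedged illustration, since the actual formula in scaling.py may differ, one such anisotropy measure:

```python
import torch

def anisotropy_metric(x: torch.Tensor) -> float:
    # Illustrative, not the scaling.py formula: ratio of arithmetic to
    # geometric mean of the channel-covariance eigenvalues. 1.0 means
    # perfectly white; larger values mean energy concentrates in few directions.
    feats = x.reshape(-1, x.shape[-1])          # (frames, num_channels)
    cov = feats.T @ feats / feats.shape[0]
    eigs = torch.linalg.eigvalsh(cov).clamp(min=1e-10)
    return (eigs.mean() / eigs.log().mean().exp()).item()

x = torch.randn(1000, 384) @ torch.randn(384, 384)  # correlated channels
print(anisotropy_metric(x))  # large; a penalty would apply if above the limit
```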
], batch size: 59, lr: 5.24e-03, grad_scale: 32.0 2023-11-20 09:30:50,620 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1032553.3333333334, ans=0.0 2023-11-20 09:31:00,278 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=9.386e-02 2023-11-20 09:31:07,375 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.68 vs. limit=15.0 2023-11-20 09:31:08,173 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 154900 2023-11-20 09:31:13,283 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=1032686.6666666666, ans=0.2 2023-11-20 09:31:21,988 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.038e+01 8.175e+01 8.791e+01 9.542e+01 1.185e+02, threshold=1.758e+02, percent-clipped=0.0 2023-11-20 09:31:24,789 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1032686.6666666666, ans=0.1 2023-11-20 09:31:26,131 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1032753.3333333334, ans=0.125 2023-11-20 09:31:52,034 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 10650, loss[loss=0.09363, simple_loss=0.1154, pruned_loss=0.02733, audio_tagging_loss=0.008624, over 14805.00 frames. ], tot_loss[loss=0.07986, simple_loss=0.1002, pruned_loss=0.01972, audio_tagging_loss=0.01001, over 3034750.66 frames. ], batch size: 53, lr: 5.24e-03, grad_scale: 32.0 2023-11-20 09:32:12,488 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 154950 2023-11-20 09:32:29,325 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1033020.0, ans=0.125 2023-11-20 09:32:48,623 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.68 vs. limit=15.0 2023-11-20 09:32:52,293 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.47 vs. limit=15.0 2023-11-20 09:32:56,725 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 10700, loss[loss=0.07718, simple_loss=0.09184, pruned_loss=0.01787, audio_tagging_loss=0.01338, over 14300.00 frames. ], tot_loss[loss=0.08017, simple_loss=0.1011, pruned_loss=0.01971, audio_tagging_loss=0.009913, over 3039169.34 frames. 
], batch size: 54, lr: 5.24e-03, grad_scale: 32.0 2023-11-20 09:32:57,105 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1033220.0, ans=0.125 2023-11-20 09:33:00,202 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1033220.0, ans=0.125 2023-11-20 09:33:08,004 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1033220.0, ans=0.125 2023-11-20 09:33:09,280 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1033286.6666666666, ans=0.0 2023-11-20 09:33:10,995 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1033286.6666666666, ans=0.07 2023-11-20 09:33:16,892 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 155000 2023-11-20 09:33:29,497 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1033353.3333333334, ans=0.1 2023-11-20 09:33:30,431 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.016e+01 8.303e+01 8.834e+01 9.458e+01 1.451e+02, threshold=1.767e+02, percent-clipped=0.0 2023-11-20 09:33:57,267 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1033486.6666666666, ans=0.125 2023-11-20 09:34:00,425 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1033486.6666666666, ans=0.125 2023-11-20 09:34:02,467 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 10750, loss[loss=0.07344, simple_loss=0.09731, pruned_loss=0.0163, audio_tagging_loss=0.008488, over 16170.00 frames. ], tot_loss[loss=0.08087, simple_loss=0.1023, pruned_loss=0.02, audio_tagging_loss=0.009714, over 3044488.32 frames. ], batch size: 61, lr: 5.23e-03, grad_scale: 32.0 2023-11-20 09:34:02,729 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1033553.3333333334, ans=0.07 2023-11-20 09:34:21,014 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 155050 2023-11-20 09:34:32,857 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 09:34:46,319 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1033753.3333333334, ans=0.125 2023-11-20 09:35:06,231 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 10800, loss[loss=0.04226, simple_loss=0.05347, pruned_loss=0.006371, audio_tagging_loss=0.009151, over 15009.00 frames. ], tot_loss[loss=0.08021, simple_loss=0.1014, pruned_loss=0.01986, audio_tagging_loss=0.009674, over 3041315.93 frames. 
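The grad_scale field in the loss lines is the fp16 loss scale (use_fp16=True): through this stretch it drops from 32.0 to 16.0, presumably after an overflow, and recovers to 32.0 after a run of stable steps. That is the standard PyTorch AMP pattern, sketched here with illustrative settings rather than the train_asr.py code:

```python
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0)  # illustrative init

def train_step(model, optimizer, batch):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = model(batch)            # hypothetical forward returning a scalar loss
    scaler.scale(loss).backward()      # backprop on the scaled loss
    scaler.step(optimizer)             # skips the update and shrinks the scale on overflow
    scaler.update()                    # grows the scale back after stable steps
```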
], batch size: 56, lr: 5.23e-03, grad_scale: 32.0 2023-11-20 09:35:09,072 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1033886.6666666666, ans=0.125 2023-11-20 09:35:26,101 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 155100 2023-11-20 09:35:40,781 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.539e+01 8.046e+01 8.532e+01 9.175e+01 1.216e+02, threshold=1.706e+02, percent-clipped=0.0 2023-11-20 09:35:57,555 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.05 vs. limit=15.0 2023-11-20 09:35:58,478 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1034153.3333333334, ans=0.0 2023-11-20 09:36:11,284 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 10850, loss[loss=0.127, simple_loss=0.1691, pruned_loss=0.03634, audio_tagging_loss=0.006097, over 16192.00 frames. ], tot_loss[loss=0.08058, simple_loss=0.1015, pruned_loss=0.02012, audio_tagging_loss=0.009688, over 3035946.64 frames. ], batch size: 56, lr: 5.23e-03, grad_scale: 32.0 2023-11-20 09:36:19,868 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.59 vs. limit=15.0 2023-11-20 09:36:32,090 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 155150 2023-11-20 09:36:43,260 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1034353.3333333334, ans=0.1 2023-11-20 09:36:49,638 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1034420.0, ans=0.2 2023-11-20 09:37:00,128 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1034420.0, ans=0.1 2023-11-20 09:37:12,656 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 09:37:16,322 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 10900, loss[loss=0.107, simple_loss=0.1357, pruned_loss=0.03021, audio_tagging_loss=0.008998, over 15857.00 frames. ], tot_loss[loss=0.08122, simple_loss=0.1023, pruned_loss=0.02025, audio_tagging_loss=0.009819, over 3049555.33 frames. 
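The WARNING just above, like the others in this log, drops an AudioSet placeholder cut because its encoder output would be shorter than its token sequence: 100 input frames subsample to 23, fewer than the 24 BPE tokens, and a transducer alignment needs at least one encoder frame per token. A hedged reconstruction of the filter; the subsampling arithmetic is inferred from the logged 100 -> 23, and train_asr.py:1506 may compute it differently:

```python
def frames_after_subsampling(num_frames: int) -> int:
    # Assumption inferred from the log (100 -> 23 with subsampling_factor=4).
    return (num_frames - 7) // 4

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # The pruned transducer loss cannot align more tokens than output frames.
    return frames_after_subsampling(num_frames) >= num_tokens

print(frames_after_subsampling(100))  # 23
print(keep_cut(100, 24))              # False -> "Exclude cut ... from training"
```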
], batch size: 57, lr: 5.23e-03, grad_scale: 32.0 2023-11-20 09:37:35,731 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 155200 2023-11-20 09:37:38,679 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 09:37:46,710 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1034686.6666666666, ans=0.0 2023-11-20 09:37:50,152 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.033e+01 8.200e+01 8.772e+01 9.722e+01 1.243e+02, threshold=1.754e+02, percent-clipped=0.0 2023-11-20 09:37:58,188 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1034753.3333333334, ans=0.0 2023-11-20 09:38:08,760 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1034820.0, ans=0.2 2023-11-20 09:38:18,841 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.54 vs. limit=15.0 2023-11-20 09:38:20,690 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 10950, loss[loss=0.08653, simple_loss=0.1212, pruned_loss=0.01852, audio_tagging_loss=0.007425, over 14645.00 frames. ], tot_loss[loss=0.08137, simple_loss=0.1026, pruned_loss=0.02029, audio_tagging_loss=0.009794, over 3041422.19 frames. ], batch size: 54, lr: 5.23e-03, grad_scale: 32.0 2023-11-20 09:38:33,378 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.05 vs. limit=6.0 2023-11-20 09:38:39,898 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 155250 2023-11-20 09:39:03,176 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.84 vs. limit=15.0 2023-11-20 09:39:24,992 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 11000, loss[loss=0.05949, simple_loss=0.07112, pruned_loss=0.01324, audio_tagging_loss=0.01069, over 14807.00 frames. ], tot_loss[loss=0.08066, simple_loss=0.1015, pruned_loss=0.02, audio_tagging_loss=0.009908, over 3048111.18 frames. ], batch size: 56, lr: 5.23e-03, grad_scale: 32.0 2023-11-20 09:39:35,551 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 09:39:44,957 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 155300 2023-11-20 09:39:52,016 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1035353.3333333334, ans=0.0 2023-11-20 09:39:54,648 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.93 vs. 
limit=6.0 2023-11-20 09:39:56,805 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1035353.3333333334, ans=0.125 2023-11-20 09:39:59,025 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.020e+01 8.120e+01 8.817e+01 9.505e+01 1.234e+02, threshold=1.763e+02, percent-clipped=0.0 2023-11-20 09:40:01,724 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1035353.3333333334, ans=0.0 2023-11-20 09:40:04,440 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1035420.0, ans=0.125 2023-11-20 09:40:29,860 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 11050, loss[loss=0.05666, simple_loss=0.0661, pruned_loss=0.01245, audio_tagging_loss=0.01116, over 15945.00 frames. ], tot_loss[loss=0.08104, simple_loss=0.1017, pruned_loss=0.0201, audio_tagging_loss=0.01007, over 3054045.46 frames. ], batch size: 63, lr: 5.23e-03, grad_scale: 16.0 2023-11-20 09:40:38,229 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1035553.3333333334, ans=0.2 2023-11-20 09:40:46,795 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.24 vs. limit=15.0 2023-11-20 09:40:49,942 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 155350 2023-11-20 09:40:53,866 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1035620.0, ans=0.125 2023-11-20 09:40:53,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1035620.0, ans=0.125 2023-11-20 09:41:13,386 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1035753.3333333334, ans=0.07 2023-11-20 09:41:26,034 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1035820.0, ans=0.1 2023-11-20 09:41:34,879 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 11100, loss[loss=0.06195, simple_loss=0.08138, pruned_loss=0.01191, audio_tagging_loss=0.009355, over 15162.00 frames. ], tot_loss[loss=0.08041, simple_loss=0.101, pruned_loss=0.01968, audio_tagging_loss=0.01022, over 3053518.27 frames. ], batch size: 58, lr: 5.23e-03, grad_scale: 16.0 2023-11-20 09:41:42,562 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1035886.6666666666, ans=0.0 2023-11-20 09:41:54,065 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 155400 2023-11-20 09:42:03,332 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.10 vs. limit=5.0 2023-11-20 09:42:06,280 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1036020.0, ans=0.0 2023-11-20 09:42:09,693 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.838e+01 8.254e+01 9.028e+01 9.759e+01 1.162e+02, threshold=1.806e+02, percent-clipped=0.0 2023-11-20 09:42:11,643 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.28 vs. 
limit=6.0 2023-11-20 09:42:14,398 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1036086.6666666666, ans=0.0 2023-11-20 09:42:39,725 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 11150, loss[loss=0.09303, simple_loss=0.1155, pruned_loss=0.02493, audio_tagging_loss=0.01034, over 14950.00 frames. ], tot_loss[loss=0.08041, simple_loss=0.1006, pruned_loss=0.01972, audio_tagging_loss=0.01036, over 3051393.96 frames. ], batch size: 56, lr: 5.23e-03, grad_scale: 16.0 2023-11-20 09:42:52,780 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1036286.6666666666, ans=0.125 2023-11-20 09:42:54,435 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.32 vs. limit=22.5 2023-11-20 09:42:58,719 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 155450 2023-11-20 09:43:13,032 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 09:43:29,067 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1036420.0, ans=0.0 2023-11-20 09:43:32,728 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1036486.6666666666, ans=0.125 2023-11-20 09:43:44,255 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 11200, loss[loss=0.05543, simple_loss=0.06245, pruned_loss=0.01254, audio_tagging_loss=0.01167, over 14965.00 frames. ], tot_loss[loss=0.08052, simple_loss=0.1009, pruned_loss=0.01976, audio_tagging_loss=0.01032, over 3046609.31 frames. ], batch size: 58, lr: 5.23e-03, grad_scale: 32.0 2023-11-20 09:44:02,210 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1036620.0, ans=0.125 2023-11-20 09:44:03,200 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 155500 2023-11-20 09:44:03,931 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.74 vs. limit=6.0 2023-11-20 09:44:08,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1036686.6666666666, ans=0.0 2023-11-20 09:44:19,034 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.839e+01 8.019e+01 8.498e+01 9.323e+01 1.224e+02, threshold=1.700e+02, percent-clipped=0.0 2023-11-20 09:44:29,828 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1036753.3333333334, ans=0.125 2023-11-20 09:44:31,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1036753.3333333334, ans=0.125 2023-11-20 09:44:48,544 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 11250, loss[loss=0.09528, simple_loss=0.1114, pruned_loss=0.03067, audio_tagging_loss=0.008906, over 15872.00 frames. ], tot_loss[loss=0.08017, simple_loss=0.1007, pruned_loss=0.01955, audio_tagging_loss=0.01026, over 3048631.80 frames. 
], batch size: 60, lr: 5.23e-03, grad_scale: 32.0 2023-11-20 09:44:51,172 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 09:45:00,273 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1036953.3333333334, ans=10.0 2023-11-20 09:45:08,206 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 155550 2023-11-20 09:45:12,188 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1036953.3333333334, ans=0.125 2023-11-20 09:45:54,024 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 11300, loss[loss=0.1025, simple_loss=0.1284, pruned_loss=0.02946, audio_tagging_loss=0.008877, over 14770.00 frames. ], tot_loss[loss=0.08065, simple_loss=0.1015, pruned_loss=0.01986, audio_tagging_loss=0.01005, over 3050176.90 frames. ], batch size: 55, lr: 5.23e-03, grad_scale: 32.0 2023-11-20 09:46:05,830 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1037286.6666666666, ans=0.125 2023-11-20 09:46:13,188 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 155600 2023-11-20 09:46:14,585 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1037286.6666666666, ans=0.0 2023-11-20 09:46:22,039 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.84 vs. limit=15.0 2023-11-20 09:46:23,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1037353.3333333334, ans=0.125 2023-11-20 09:46:28,937 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.896e+01 8.236e+01 9.129e+01 9.698e+01 1.564e+02, threshold=1.826e+02, percent-clipped=0.0 2023-11-20 09:46:59,332 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 11350, loss[loss=0.07388, simple_loss=0.08748, pruned_loss=0.01863, audio_tagging_loss=0.01152, over 14098.00 frames. ], tot_loss[loss=0.08026, simple_loss=0.1006, pruned_loss=0.01987, audio_tagging_loss=0.01008, over 3042430.21 frames. ], batch size: 52, lr: 5.22e-03, grad_scale: 32.0 2023-11-20 09:47:04,611 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1037553.3333333334, ans=0.95 2023-11-20 09:47:18,686 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 155650 2023-11-20 09:47:25,856 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.82 vs. limit=10.0 2023-11-20 09:47:50,687 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1037820.0, ans=0.125 2023-11-20 09:47:57,777 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.62 vs. limit=22.5 2023-11-20 09:48:04,522 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 11400, loss[loss=0.09244, simple_loss=0.1094, pruned_loss=0.0229, audio_tagging_loss=0.01484, over 14066.00 frames. ], tot_loss[loss=0.0804, simple_loss=0.1012, pruned_loss=0.01981, audio_tagging_loss=0.009999, over 3051619.11 frames. 
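The lr printed with each loss line follows icefall's Eden schedule with base_lr=0.045, lr_batches=7500, lr_epochs=3.5; plugging the current step counts into the standard Eden formula reproduces the logged values. A sketch, assuming Eden's epoch counter stands at 12 completed epochs while epoch 13 trains:

```python
def eden_lr(base_lr: float, batch: float, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    # Eden schedule as used in icefall's optim.py (sketch).
    batch_factor = ((batch**2 + lr_batches**2) / lr_batches**2) ** -0.25
    epoch_factor = ((epoch**2 + lr_epochs**2) / lr_epochs**2) ** -0.25
    return base_lr * batch_factor * epoch_factor

print(f"{eden_lr(0.045, 154000, 12):.2e}")  # 5.25e-03, as logged near batch idx 154000
print(f"{eden_lr(0.045, 155700, 12):.2e}")  # 5.22e-03, matching the lines around here
```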
], batch size: 52, lr: 5.22e-03, grad_scale: 32.0 2023-11-20 09:48:04,809 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1037886.6666666666, ans=0.1 2023-11-20 09:48:09,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1037886.6666666666, ans=0.0 2023-11-20 09:48:10,914 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1037886.6666666666, ans=0.125 2023-11-20 09:48:17,051 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=1037953.3333333334, ans=0.025 2023-11-20 09:48:24,522 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 155700 2023-11-20 09:48:39,690 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.071e+01 7.956e+01 8.738e+01 9.892e+01 2.201e+02, threshold=1.748e+02, percent-clipped=1.0 2023-11-20 09:48:51,582 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1038086.6666666666, ans=0.0 2023-11-20 09:48:52,687 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1038086.6666666666, ans=0.2 2023-11-20 09:49:01,216 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1038153.3333333334, ans=0.125 2023-11-20 09:49:01,595 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.51 vs. limit=15.0 2023-11-20 09:49:06,308 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1038153.3333333334, ans=0.0 2023-11-20 09:49:06,645 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.63 vs. limit=22.5 2023-11-20 09:49:09,128 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 11450, loss[loss=0.09053, simple_loss=0.1154, pruned_loss=0.02279, audio_tagging_loss=0.01003, over 15857.00 frames. ], tot_loss[loss=0.08056, simple_loss=0.1015, pruned_loss=0.01993, audio_tagging_loss=0.009902, over 3051843.96 frames. ], batch size: 58, lr: 5.22e-03, grad_scale: 32.0 2023-11-20 09:49:16,630 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=1038220.0, ans=15.0 2023-11-20 09:49:25,228 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 09:49:28,745 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 155750 2023-11-20 09:49:32,574 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1038286.6666666666, ans=0.2 2023-11-20 09:49:41,588 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1038353.3333333334, ans=0.125 2023-11-20 09:49:41,722 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1038353.3333333334, ans=0.125 2023-11-20 09:49:55,532 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.86 vs. 
limit=15.0 2023-11-20 09:50:13,874 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 11500, loss[loss=0.08503, simple_loss=0.1072, pruned_loss=0.01978, audio_tagging_loss=0.01165, over 15682.00 frames. ], tot_loss[loss=0.07996, simple_loss=0.1007, pruned_loss=0.01967, audio_tagging_loss=0.009954, over 3052660.23 frames. ], batch size: 56, lr: 5.22e-03, grad_scale: 32.0 2023-11-20 09:50:21,535 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1038553.3333333334, ans=0.0 2023-11-20 09:50:24,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1038553.3333333334, ans=0.2 2023-11-20 09:50:32,924 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 155800 2023-11-20 09:50:47,687 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.982e+01 8.235e+01 8.586e+01 9.090e+01 1.242e+02, threshold=1.717e+02, percent-clipped=0.0 2023-11-20 09:51:16,292 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.12 vs. limit=15.0 2023-11-20 09:51:18,149 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 11550, loss[loss=0.09158, simple_loss=0.1092, pruned_loss=0.02828, audio_tagging_loss=0.008719, over 14795.00 frames. ], tot_loss[loss=0.08078, simple_loss=0.1016, pruned_loss=0.02001, audio_tagging_loss=0.009954, over 3057657.51 frames. ], batch size: 56, lr: 5.22e-03, grad_scale: 32.0 2023-11-20 09:51:28,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1038886.6666666666, ans=0.2 2023-11-20 09:51:33,039 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1038953.3333333334, ans=0.2 2023-11-20 09:51:35,465 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1038953.3333333334, ans=0.125 2023-11-20 09:51:36,543 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 155850 2023-11-20 09:51:49,009 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1039020.0, ans=0.125 2023-11-20 09:51:55,520 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1039086.6666666666, ans=0.0 2023-11-20 09:51:56,517 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-20 09:51:56,739 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 09:52:04,577 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1039086.6666666666, ans=0.125 2023-11-20 09:52:04,597 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1039086.6666666666, ans=0.125 2023-11-20 09:52:04,626 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1039086.6666666666, ans=0.125 2023-11-20 09:52:19,153 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1039153.3333333334, ans=0.0 2023-11-20 09:52:21,330 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 11600, loss[loss=0.0938, simple_loss=0.1167, pruned_loss=0.02802, audio_tagging_loss=0.007415, over 15590.00 frames. ], tot_loss[loss=0.08074, simple_loss=0.1016, pruned_loss=0.02002, audio_tagging_loss=0.009903, over 3056311.81 frames. ], batch size: 56, lr: 5.22e-03, grad_scale: 32.0 2023-11-20 09:52:25,419 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1039220.0, ans=0.125 2023-11-20 09:52:41,570 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 155900 2023-11-20 09:52:48,035 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.36 vs. limit=15.0 2023-11-20 09:52:48,207 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.01 vs. limit=15.0 2023-11-20 09:52:55,918 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.26 vs. limit=15.0 2023-11-20 09:52:56,398 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.551e+01 8.203e+01 8.943e+01 9.744e+01 1.251e+02, threshold=1.789e+02, percent-clipped=0.0 2023-11-20 09:52:57,852 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1039353.3333333334, ans=0.0 2023-11-20 09:53:06,617 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1039420.0, ans=0.1 2023-11-20 09:53:25,679 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 11650, loss[loss=0.07223, simple_loss=0.07893, pruned_loss=0.02022, audio_tagging_loss=0.01254, over 14425.00 frames. ], tot_loss[loss=0.07999, simple_loss=0.1004, pruned_loss=0.01973, audio_tagging_loss=0.01004, over 3051652.15 frames. 
], batch size: 56, lr: 5.22e-03, grad_scale: 32.0 2023-11-20 09:53:45,313 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 155950 2023-11-20 09:54:02,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1039753.3333333334, ans=0.07 2023-11-20 09:54:12,538 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1039753.3333333334, ans=0.1 2023-11-20 09:54:17,390 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.14 vs. limit=15.0 2023-11-20 09:54:30,851 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 11700, loss[loss=0.07745, simple_loss=0.09973, pruned_loss=0.01916, audio_tagging_loss=0.008424, over 15314.00 frames. ], tot_loss[loss=0.08047, simple_loss=0.1011, pruned_loss=0.0199, audio_tagging_loss=0.01002, over 3055900.55 frames. ], batch size: 58, lr: 5.22e-03, grad_scale: 32.0 2023-11-20 09:54:36,448 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.63 vs. limit=15.0 2023-11-20 09:54:49,304 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 156000 2023-11-20 09:54:55,684 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1039953.3333333334, ans=0.0 2023-11-20 09:55:06,751 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1040020.0, ans=0.125 2023-11-20 09:55:09,553 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.408e+01 8.314e+01 9.144e+01 1.029e+02 1.424e+02, threshold=1.829e+02, percent-clipped=0.0 2023-11-20 09:55:27,676 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1040153.3333333334, ans=0.07 2023-11-20 09:55:38,355 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 11750, loss[loss=0.05605, simple_loss=0.06265, pruned_loss=0.01286, audio_tagging_loss=0.01187, over 14402.00 frames. ], tot_loss[loss=0.0805, simple_loss=0.101, pruned_loss=0.01993, audio_tagging_loss=0.01007, over 3049668.45 frames. ], batch size: 55, lr: 5.22e-03, grad_scale: 32.0 2023-11-20 09:55:49,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1040220.0, ans=0.0 2023-11-20 09:55:58,510 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 156050 2023-11-20 09:56:01,101 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1040286.6666666666, ans=0.125 2023-11-20 09:56:12,869 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1040353.3333333334, ans=0.0 2023-11-20 09:56:18,304 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.28 vs. 
limit=15.0 2023-11-20 09:56:41,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1040553.3333333334, ans=0.1 2023-11-20 09:56:42,547 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 11800, loss[loss=0.06851, simple_loss=0.08198, pruned_loss=0.0151, audio_tagging_loss=0.01243, over 16668.00 frames. ], tot_loss[loss=0.08025, simple_loss=0.1004, pruned_loss=0.02003, audio_tagging_loss=0.01003, over 3048905.68 frames. ], batch size: 62, lr: 5.22e-03, grad_scale: 32.0 2023-11-20 09:56:46,457 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1040553.3333333334, ans=0.2 2023-11-20 09:57:02,652 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 156100 2023-11-20 09:57:17,069 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.890e+01 8.238e+01 8.781e+01 9.493e+01 1.182e+02, threshold=1.756e+02, percent-clipped=0.0 2023-11-20 09:57:28,921 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1040753.3333333334, ans=0.0 2023-11-20 09:57:30,084 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1040753.3333333334, ans=0.0 2023-11-20 09:57:36,866 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1040820.0, ans=0.0 2023-11-20 09:57:46,324 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 11850, loss[loss=0.09935, simple_loss=0.1295, pruned_loss=0.02532, audio_tagging_loss=0.009265, over 15349.00 frames. ], tot_loss[loss=0.08061, simple_loss=0.1008, pruned_loss=0.0201, audio_tagging_loss=0.01008, over 3058603.59 frames. ], batch size: 57, lr: 5.22e-03, grad_scale: 32.0 2023-11-20 09:58:05,462 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 156150 2023-11-20 09:58:29,128 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1041086.6666666666, ans=0.2 2023-11-20 09:58:50,146 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 11900, loss[loss=0.1002, simple_loss=0.1218, pruned_loss=0.02805, audio_tagging_loss=0.01124, over 15007.00 frames. ], tot_loss[loss=0.07999, simple_loss=0.1003, pruned_loss=0.01969, audio_tagging_loss=0.01015, over 3048466.57 frames. ], batch size: 55, lr: 5.21e-03, grad_scale: 32.0 2023-11-20 09:58:59,313 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.77 vs. 
limit=10.0 2023-11-20 09:59:09,368 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 156200 2023-11-20 09:59:25,439 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.397e+01 8.078e+01 8.558e+01 9.294e+01 1.166e+02, threshold=1.712e+02, percent-clipped=0.0 2023-11-20 09:59:28,134 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1041420.0, ans=0.05 2023-11-20 09:59:31,677 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 09:59:34,018 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1041420.0, ans=0.035 2023-11-20 09:59:36,954 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.89 vs. limit=22.5 2023-11-20 09:59:52,109 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1041486.6666666666, ans=0.125 2023-11-20 09:59:54,112 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 11950, loss[loss=0.07402, simple_loss=0.09401, pruned_loss=0.01693, audio_tagging_loss=0.01009, over 15306.00 frames. ], tot_loss[loss=0.08017, simple_loss=0.1004, pruned_loss=0.01971, audio_tagging_loss=0.01025, over 3042591.76 frames. ], batch size: 59, lr: 5.21e-03, grad_scale: 32.0 2023-11-20 10:00:01,080 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1041553.3333333334, ans=0.125 2023-11-20 10:00:10,186 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1041620.0, ans=0.125 2023-11-20 10:00:13,541 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.92 vs. limit=15.0 2023-11-20 10:00:14,293 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 156250 2023-11-20 10:00:16,073 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.99 vs. limit=15.0 2023-11-20 10:00:36,330 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1041753.3333333334, ans=0.125 2023-11-20 10:00:36,705 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.52 vs. limit=15.0 2023-11-20 10:00:38,771 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1041753.3333333334, ans=0.125 2023-11-20 10:00:51,353 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1041820.0, ans=0.125 2023-11-20 10:00:56,381 INFO [train_asr.py:1262] (3/4) Epoch 13, batch 12000, loss[loss=0.105, simple_loss=0.1255, pruned_loss=0.03069, audio_tagging_loss=0.01151, over 15035.00 frames. ], tot_loss[loss=0.08005, simple_loss=0.09998, pruned_loss=0.01968, audio_tagging_loss=0.01038, over 3039682.17 frames. 
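
The optim.py clipping lines follow a stable pattern: in every one, the reported threshold equals Clipping_scale times the median of the grad-norm quartiles (just above, 2.0 * 8.558e+01 = 1.712e+02). A hedged reconstruction of that diagnostic, assuming the quartiles are taken over a window of recent per-step gradient norms:

```python
# Sketch of the "Clipping_scale=2.0, grad-norm quartiles ... threshold=...,
# percent-clipped=..." lines. Threshold = scale * median holds for every
# such line in this log; the windowing over recent norms is an assumption.
import torch

def clipping_report(recent_grad_norms: torch.Tensor, clipping_scale: float = 2.0):
    # Quartiles of the recent gradient norms: min / 25% / 50% / 75% / max.
    q = torch.quantile(recent_grad_norms,
                       torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]          # scale * median
    percent_clipped = 100.0 * (recent_grad_norms > threshold).float().mean()
    return q, threshold, percent_clipped

norms = torch.tensor([63.97, 80.78, 85.58, 92.94, 116.6])  # quartiles above
q, thr, pct = clipping_report(norms)
print(q.tolist(), float(thr), float(pct))  # threshold 171.16, 0% clipped
```
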
], batch size: 57, lr: 5.21e-03, grad_scale: 32.0 2023-11-20 10:00:56,381 INFO [train_asr.py:1285] (3/4) Computing validation loss 2023-11-20 10:01:28,999 INFO [zipformer.py:1873] (3/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.9337, 3.2630, 4.8793, 4.4498], device='cuda:3') 2023-11-20 10:01:36,795 INFO [train_asr.py:1294] (3/4) Epoch 13, validation: loss=0.0624, simple_loss=0.05383, pruned_loss=0.00582, audio_tagging_loss=0.02967, over 4681554.00 frames. 2023-11-20 10:01:36,796 INFO [train_asr.py:1295] (3/4) Maximum memory allocated so far is 25886MB 2023-11-20 10:01:40,597 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1041886.6666666666, ans=0.0 2023-11-20 10:01:54,266 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 156300 2023-11-20 10:02:01,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1042020.0, ans=0.0 2023-11-20 10:02:41,379 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 0, loss[loss=0.07255, simple_loss=0.07257, pruned_loss=0.01309, audio_tagging_loss=0.02317, over 14077.00 frames. ], tot_loss[loss=0.07255, simple_loss=0.07257, pruned_loss=0.01309, audio_tagging_loss=0.02317, over 14077.00 frames. ], batch size: 57, lr: 5.02e-03, grad_scale: 32.0 2023-11-20 10:02:41,380 INFO [train_asr.py:1285] (3/4) Computing validation loss 2023-11-20 10:03:13,592 INFO [zipformer.py:1873] (3/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([3.9873, 3.1733, 2.9507, 3.1390, 3.4719, 2.7240, 3.3816, 2.6888], device='cuda:3') 2023-11-20 10:03:18,480 INFO [train_asr.py:1294] (3/4) Epoch 14, validation: loss=0.0621, simple_loss=0.05383, pruned_loss=0.005845, audio_tagging_loss=0.02934, over 4681554.00 frames. 2023-11-20 10:03:18,480 INFO [train_asr.py:1295] (3/4) Maximum memory allocated so far is 25886MB 2023-11-20 10:03:22,236 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.506e+01 8.326e+01 8.983e+01 9.877e+01 1.645e+02, threshold=1.797e+02, percent-clipped=0.0 2023-11-20 10:03:24,450 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.67 vs. limit=15.0 2023-11-20 10:03:34,891 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.79 vs. 
limit=15.0 2023-11-20 10:03:38,369 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1042113.3333333334, ans=0.0 2023-11-20 10:03:48,176 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1042180.0, ans=0.2 2023-11-20 10:04:02,974 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1042246.6666666666, ans=0.1 2023-11-20 10:04:12,137 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 156350 2023-11-20 10:04:13,583 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1042313.3333333334, ans=0.125 2023-11-20 10:04:20,889 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1042313.3333333334, ans=0.125 2023-11-20 10:04:23,774 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 50, loss[loss=0.08423, simple_loss=0.09623, pruned_loss=0.01772, audio_tagging_loss=0.01839, over 14205.00 frames. ], tot_loss[loss=0.08648, simple_loss=0.09456, pruned_loss=0.01912, audio_tagging_loss=0.02008, over 681084.37 frames. ], batch size: 53, lr: 5.02e-03, grad_scale: 32.0 2023-11-20 10:04:27,728 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1042380.0, ans=0.125 2023-11-20 10:04:29,176 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1042380.0, ans=0.0 2023-11-20 10:04:32,376 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.28 vs. limit=12.0 2023-11-20 10:04:36,574 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1042446.6666666666, ans=0.0 2023-11-20 10:04:37,948 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.54 vs. limit=15.0 2023-11-20 10:04:42,756 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1042446.6666666666, ans=0.125 2023-11-20 10:05:00,453 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1042513.3333333334, ans=0.125 2023-11-20 10:05:02,229 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.49 vs. limit=22.5 2023-11-20 10:05:03,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1042580.0, ans=0.2 2023-11-20 10:05:16,924 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 156400 2023-11-20 10:05:18,344 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1042646.6666666666, ans=0.125 2023-11-20 10:05:29,544 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 100, loss[loss=0.0991, simple_loss=0.1224, pruned_loss=0.02085, audio_tagging_loss=0.01708, over 15131.00 frames. ], tot_loss[loss=0.08732, simple_loss=0.09774, pruned_loss=0.01936, audio_tagging_loss=0.01909, over 1204974.18 frames. 
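
The two "Computing validation loss" blocks above (end of epoch 13 and start of epoch 14) each run over the same held-out set (4681554.00 frames) and then report peak GPU memory. A minimal sketch of such a pass; the model/batch interface here is illustrative, not train_asr.py's actual API:

```python
import torch

def compute_validation_loss(model, valid_loader, device):
    # Illustrative sketch only: a model returning (scalar loss, num_frames)
    # per batch is an assumed interface.
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    with torch.no_grad():
        for batch in valid_loader:
            loss, num_frames = model(batch)     # assumed signature
            tot_loss += float(loss) * num_frames
            tot_frames += num_frames
    model.train()
    # Cf. the "Maximum memory allocated so far is 25886MB" lines above.
    max_mem_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    return tot_loss / tot_frames, max_mem_mb
```
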
], batch size: 54, lr: 5.02e-03, grad_scale: 32.0 2023-11-20 10:05:33,221 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.056e+01 8.681e+01 9.274e+01 1.011e+02 1.384e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-20 10:05:33,593 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1042713.3333333334, ans=0.05 2023-11-20 10:05:49,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1042780.0, ans=0.125 2023-11-20 10:06:22,054 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 156450 2023-11-20 10:06:33,103 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 150, loss[loss=0.08121, simple_loss=0.1111, pruned_loss=0.01509, audio_tagging_loss=0.01056, over 14798.00 frames. ], tot_loss[loss=0.08523, simple_loss=0.09844, pruned_loss=0.01912, audio_tagging_loss=0.01689, over 1614833.17 frames. ], batch size: 56, lr: 5.02e-03, grad_scale: 32.0 2023-11-20 10:06:41,552 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1043046.6666666666, ans=0.125 2023-11-20 10:07:18,192 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1043246.6666666666, ans=0.125 2023-11-20 10:07:27,241 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 156500 2023-11-20 10:07:36,211 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1043313.3333333334, ans=0.125 2023-11-20 10:07:38,240 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 200, loss[loss=0.06466, simple_loss=0.08187, pruned_loss=0.01408, audio_tagging_loss=0.009646, over 16201.00 frames. ], tot_loss[loss=0.08284, simple_loss=0.09793, pruned_loss=0.01898, audio_tagging_loss=0.01489, over 1927760.31 frames. ], batch size: 62, lr: 5.02e-03, grad_scale: 32.0 2023-11-20 10:07:39,861 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1043380.0, ans=0.2 2023-11-20 10:07:42,548 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.035e+01 8.224e+01 9.022e+01 9.818e+01 1.305e+02, threshold=1.804e+02, percent-clipped=0.0 2023-11-20 10:08:14,430 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1043513.3333333334, ans=0.0 2023-11-20 10:08:29,103 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1043646.6666666666, ans=0.0 2023-11-20 10:08:31,944 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 156550 2023-11-20 10:08:43,589 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 250, loss[loss=0.1172, simple_loss=0.1477, pruned_loss=0.03582, audio_tagging_loss=0.007464, over 14928.00 frames. ], tot_loss[loss=0.08223, simple_loss=0.0994, pruned_loss=0.01916, audio_tagging_loss=0.01338, over 2175451.04 frames. 
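
The lr column steps from 5.22e-03 in the late epoch-13 batches to 5.02e-03 at epoch 14, batch 0, while barely moving within an epoch. That shape is characteristic of icefall's Eden schedule; the sketch below reproduces both logged values to within rounding, though the exact epoch offset the run uses is an assumption:

```python
# Hedged sketch of an Eden-style learning-rate schedule: a batch-dependent
# factor and an epoch-dependent factor, each decaying from 1.0. The epoch
# offset passed in below is chosen to match this log, not verified in code.
def eden_lr(base_lr: float, batch: int, epoch: int,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    batch_factor = ((batch**2 + lr_batches**2) / lr_batches**2) ** -0.25
    epoch_factor = ((epoch**2 + lr_epochs**2) / lr_epochs**2) ** -0.25
    return base_lr * batch_factor * epoch_factor

print(eden_lr(0.045, batch=155_900, epoch=12))  # ~5.22e-03, late epoch 13
print(eden_lr(0.045, batch=156_100, epoch=13))  # ~5.03e-03, cf. logged 5.02e-03
```
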
], batch size: 55, lr: 5.02e-03, grad_scale: 16.0 2023-11-20 10:08:57,199 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1043780.0, ans=0.125 2023-11-20 10:09:06,512 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff2.min_abs, batch_count=1043780.0, ans=0.1 2023-11-20 10:09:06,990 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.83 vs. limit=15.0 2023-11-20 10:09:09,995 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1043846.6666666666, ans=0.1 2023-11-20 10:09:36,988 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 156600 2023-11-20 10:09:37,220 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1043980.0, ans=0.125 2023-11-20 10:09:48,902 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 300, loss[loss=0.07855, simple_loss=0.0963, pruned_loss=0.02, audio_tagging_loss=0.01041, over 15236.00 frames. ], tot_loss[loss=0.08224, simple_loss=0.1007, pruned_loss=0.01949, audio_tagging_loss=0.01237, over 2373173.91 frames. ], batch size: 56, lr: 5.02e-03, grad_scale: 16.0 2023-11-20 10:09:54,294 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.950e+01 8.223e+01 8.932e+01 9.585e+01 1.475e+02, threshold=1.786e+02, percent-clipped=0.0 2023-11-20 10:09:54,973 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.74 vs. limit=15.0 2023-11-20 10:10:03,469 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1044113.3333333334, ans=0.125 2023-11-20 10:10:20,746 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1044180.0, ans=0.04949747468305833 2023-11-20 10:10:31,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1044246.6666666666, ans=0.2 2023-11-20 10:10:33,856 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1044246.6666666666, ans=0.125 2023-11-20 10:10:36,695 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.48 vs. limit=22.5 2023-11-20 10:10:42,190 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 156650 2023-11-20 10:10:53,837 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 350, loss[loss=0.08323, simple_loss=0.1155, pruned_loss=0.01927, audio_tagging_loss=0.006224, over 15770.00 frames. ], tot_loss[loss=0.08188, simple_loss=0.1014, pruned_loss=0.01964, audio_tagging_loss=0.01155, over 2528457.20 frames. ], batch size: 59, lr: 5.02e-03, grad_scale: 4.0 2023-11-20 10:11:07,097 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.49 vs. 
limit=15.0 2023-11-20 10:11:10,000 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1044446.6666666666, ans=0.125 2023-11-20 10:11:21,614 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1044513.3333333334, ans=0.2 2023-11-20 10:11:33,286 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1044580.0, ans=0.1 2023-11-20 10:11:46,521 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 156700 2023-11-20 10:11:53,595 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1044646.6666666666, ans=0.125 2023-11-20 10:11:58,293 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 400, loss[loss=0.08441, simple_loss=0.1093, pruned_loss=0.02127, audio_tagging_loss=0.008494, over 15036.00 frames. ], tot_loss[loss=0.08137, simple_loss=0.1013, pruned_loss=0.01965, audio_tagging_loss=0.01109, over 2641981.60 frames. ], batch size: 58, lr: 5.02e-03, grad_scale: 8.0 2023-11-20 10:12:01,739 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1044713.3333333334, ans=0.0 2023-11-20 10:12:05,467 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1044713.3333333334, ans=0.125 2023-11-20 10:12:06,228 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.716e+01 8.326e+01 8.879e+01 9.512e+01 2.019e+02, threshold=1.776e+02, percent-clipped=1.0 2023-11-20 10:12:17,053 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1044780.0, ans=0.07 2023-11-20 10:12:17,081 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1044780.0, ans=0.025 2023-11-20 10:12:36,089 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1044913.3333333334, ans=0.09899494936611666 2023-11-20 10:12:36,523 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.52 vs. limit=10.0 2023-11-20 10:12:52,215 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 156750 2023-11-20 10:12:59,930 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1044980.0, ans=0.125 2023-11-20 10:13:03,955 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 450, loss[loss=0.08197, simple_loss=0.09829, pruned_loss=0.0255, audio_tagging_loss=0.007326, over 14405.00 frames. ], tot_loss[loss=0.08122, simple_loss=0.1015, pruned_loss=0.0197, audio_tagging_loss=0.01077, over 2741272.54 frames. ], batch size: 55, lr: 5.02e-03, grad_scale: 8.0 2023-11-20 10:13:11,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1045046.6666666666, ans=0.2 2023-11-20 10:13:26,389 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.82 vs. 
limit=10.0 2023-11-20 10:13:57,631 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 156800 2023-11-20 10:14:02,079 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1045313.3333333334, ans=0.125 2023-11-20 10:14:09,382 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 500, loss[loss=0.07735, simple_loss=0.1017, pruned_loss=0.01747, audio_tagging_loss=0.009024, over 14513.00 frames. ], tot_loss[loss=0.08115, simple_loss=0.1017, pruned_loss=0.01977, audio_tagging_loss=0.01056, over 2806679.69 frames. ], batch size: 56, lr: 5.02e-03, grad_scale: 8.0 2023-11-20 10:14:12,045 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1045380.0, ans=0.0 2023-11-20 10:14:16,741 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.847e+01 8.307e+01 8.961e+01 9.765e+01 1.460e+02, threshold=1.792e+02, percent-clipped=0.0 2023-11-20 10:14:22,848 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.26 vs. limit=15.0 2023-11-20 10:14:50,478 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1045580.0, ans=0.125 2023-11-20 10:15:02,744 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 156850 2023-11-20 10:15:05,369 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1045646.6666666666, ans=0.0 2023-11-20 10:15:14,466 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 550, loss[loss=0.09502, simple_loss=0.1108, pruned_loss=0.02923, audio_tagging_loss=0.01038, over 15331.00 frames. ], tot_loss[loss=0.08069, simple_loss=0.101, pruned_loss=0.01976, audio_tagging_loss=0.01042, over 2854231.20 frames. ], batch size: 57, lr: 5.01e-03, grad_scale: 8.0 2023-11-20 10:15:21,737 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1045713.3333333334, ans=0.125 2023-11-20 10:16:08,716 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 156900 2023-11-20 10:16:15,302 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.33 vs. limit=15.0 2023-11-20 10:16:19,734 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 600, loss[loss=0.07394, simple_loss=0.09626, pruned_loss=0.01586, audio_tagging_loss=0.009946, over 14580.00 frames. ], tot_loss[loss=0.08049, simple_loss=0.1012, pruned_loss=0.01957, audio_tagging_loss=0.0103, over 2896690.62 frames. ], batch size: 54, lr: 5.01e-03, grad_scale: 8.0 2023-11-20 10:16:20,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1046046.6666666666, ans=0.0 2023-11-20 10:16:22,919 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.73 vs. limit=22.5 2023-11-20 10:16:23,012 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.83 vs. 
limit=15.0 2023-11-20 10:16:23,826 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=1046046.6666666666, ans=0.5 2023-11-20 10:16:27,252 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.847e+01 7.933e+01 8.592e+01 9.443e+01 1.249e+02, threshold=1.718e+02, percent-clipped=0.0 2023-11-20 10:16:31,092 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1046046.6666666666, ans=0.2 2023-11-20 10:16:42,543 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1046113.3333333334, ans=0.125 2023-11-20 10:16:49,250 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1046180.0, ans=0.125 2023-11-20 10:17:05,935 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1046246.6666666666, ans=0.125 2023-11-20 10:17:13,201 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 156950 2023-11-20 10:17:23,755 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1046380.0, ans=0.125 2023-11-20 10:17:24,848 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 650, loss[loss=0.07168, simple_loss=0.09381, pruned_loss=0.0144, audio_tagging_loss=0.01038, over 14846.00 frames. ], tot_loss[loss=0.08014, simple_loss=0.1007, pruned_loss=0.01947, audio_tagging_loss=0.01034, over 2918015.49 frames. ], batch size: 56, lr: 5.01e-03, grad_scale: 8.0 2023-11-20 10:17:34,887 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 10:18:03,487 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1046580.0, ans=0.125 2023-11-20 10:18:08,952 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1046580.0, ans=0.125 2023-11-20 10:18:15,071 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.58 vs. limit=22.5 2023-11-20 10:18:18,342 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 157000 2023-11-20 10:18:19,750 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1046646.6666666666, ans=0.5 2023-11-20 10:18:27,563 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.73 vs. limit=15.0 2023-11-20 10:18:28,218 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1046646.6666666666, ans=0.125 2023-11-20 10:18:30,357 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 700, loss[loss=0.07544, simple_loss=0.08882, pruned_loss=0.01744, audio_tagging_loss=0.01359, over 15556.00 frames. ], tot_loss[loss=0.08037, simple_loss=0.1012, pruned_loss=0.01962, audio_tagging_loss=0.01014, over 2951296.99 frames. 
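
The many "ScheduledFloat: name=..., batch_count=..., ans=..." lines record float hyperparameters (dropout rates, skip probabilities, balancer probs) that are functions of the global batch count. A minimal sketch of a piecewise-linear schedule of that kind; the breakpoints below are illustrative, only the final value matches the logged ans=0.1 dropout lines:

```python
# Sketch: a float hyperparameter scheduled piecewise-linearly in batch_count.
# By batch_count ~1.04e6, as in this log, such schedules have long since
# flattened at their final value.
class ScheduledFloatSketch:
    def __init__(self, *points: tuple[float, float]):
        # points: (batch_count, value) pairs.
        self.points = sorted(points)

    def __call__(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)

dropout = ScheduledFloatSketch((0.0, 0.3), (20_000.0, 0.1))  # breakpoints assumed
print(dropout(1_046_113.0))   # 0.1, matching the dropout_p lines above
```
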
], batch size: 57, lr: 5.01e-03, grad_scale: 8.0 2023-11-20 10:18:31,830 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1046713.3333333334, ans=0.125 2023-11-20 10:18:38,475 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.004e+01 8.472e+01 9.225e+01 1.029e+02 2.197e+02, threshold=1.845e+02, percent-clipped=1.0 2023-11-20 10:18:45,177 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1046780.0, ans=0.125 2023-11-20 10:18:48,757 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 10:18:53,601 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1046780.0, ans=0.125 2023-11-20 10:19:03,831 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.83 vs. limit=15.0 2023-11-20 10:19:23,881 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 157050 2023-11-20 10:19:29,752 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1046980.0, ans=0.125 2023-11-20 10:19:35,561 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 750, loss[loss=0.07831, simple_loss=0.09638, pruned_loss=0.02109, audio_tagging_loss=0.009031, over 15314.00 frames. ], tot_loss[loss=0.08102, simple_loss=0.1019, pruned_loss=0.01986, audio_tagging_loss=0.01019, over 2974588.34 frames. ], batch size: 58, lr: 5.01e-03, grad_scale: 8.0 2023-11-20 10:19:40,760 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1047046.6666666666, ans=0.125 2023-11-20 10:19:49,010 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1047113.3333333334, ans=0.0 2023-11-20 10:20:12,919 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.65 vs. limit=22.5 2023-11-20 10:20:26,846 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1047313.3333333334, ans=0.125 2023-11-20 10:20:29,159 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 157100 2023-11-20 10:20:40,827 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 800, loss[loss=0.1053, simple_loss=0.1217, pruned_loss=0.03185, audio_tagging_loss=0.01254, over 15500.00 frames. ], tot_loss[loss=0.0814, simple_loss=0.1021, pruned_loss=0.02009, audio_tagging_loss=0.01024, over 2993777.40 frames. ], batch size: 57, lr: 5.01e-03, grad_scale: 16.0 2023-11-20 10:20:41,042 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1047380.0, ans=0.125 2023-11-20 10:20:49,075 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.041e+01 8.237e+01 8.953e+01 9.687e+01 1.353e+02, threshold=1.791e+02, percent-clipped=0.0 2023-11-20 10:21:34,733 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 157150 2023-11-20 10:21:46,961 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 850, loss[loss=0.07044, simple_loss=0.0953, pruned_loss=0.01497, audio_tagging_loss=0.00783, over 15167.00 frames. 
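
The grad_scale field in the progress lines moves 32 -> 16 -> 4 -> 8 -> 16 -> 32 across this stretch: halved when a step overflows in fp16, doubled again at roughly 400-batch spacing once steps are clean. A sketch in terms of torch's dynamic loss scaler; the growth_interval is an assumption read off that spacing, not a logged setting:

```python
import torch

# Sketch: dynamic fp16 loss scaling consistent with the logged grad_scale
# sequence (backoff on inf/nan gradients, growth after clean steps).
scaler = torch.cuda.amp.GradScaler(
    init_scale=32.0,       # where grad_scale sits at the top of this section
    backoff_factor=0.5,    # halve on overflow: 16 -> 8 -> 4
    growth_factor=2.0,     # double when healthy: 4 -> 8 -> 16 -> 32
    growth_interval=400,   # assumed from the ~400-batch doubling cadence
    enabled=torch.cuda.is_available(),
)
# Typical step: scaler.scale(loss).backward(); scaler.step(opt); scaler.update()
```
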
], tot_loss[loss=0.08088, simple_loss=0.1016, pruned_loss=0.01979, audio_tagging_loss=0.0103, over 3007249.77 frames. ], batch size: 57, lr: 5.01e-03, grad_scale: 16.0 2023-11-20 10:21:50,499 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.34 vs. limit=8.0 2023-11-20 10:21:54,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1047713.3333333334, ans=0.125 2023-11-20 10:22:27,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1047913.3333333334, ans=0.09899494936611666 2023-11-20 10:22:39,782 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 157200 2023-11-20 10:22:44,066 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1047980.0, ans=0.0 2023-11-20 10:22:47,314 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1047980.0, ans=0.0 2023-11-20 10:22:51,814 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 900, loss[loss=0.08409, simple_loss=0.1133, pruned_loss=0.01956, audio_tagging_loss=0.00788, over 15161.00 frames. ], tot_loss[loss=0.08115, simple_loss=0.1017, pruned_loss=0.01991, audio_tagging_loss=0.01041, over 3017533.70 frames. ], batch size: 56, lr: 5.01e-03, grad_scale: 16.0 2023-11-20 10:22:52,149 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1048046.6666666666, ans=0.0 2023-11-20 10:22:59,173 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.565e+01 8.407e+01 9.404e+01 1.035e+02 1.329e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-20 10:23:10,710 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.23 vs. limit=15.0 2023-11-20 10:23:37,125 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1048246.6666666666, ans=0.125 2023-11-20 10:23:44,684 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1048313.3333333334, ans=0.125 2023-11-20 10:23:45,680 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 157250 2023-11-20 10:23:49,452 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=1048313.3333333334, ans=0.025 2023-11-20 10:23:57,289 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 950, loss[loss=0.08207, simple_loss=0.1066, pruned_loss=0.0187, audio_tagging_loss=0.01004, over 15505.00 frames. ], tot_loss[loss=0.08062, simple_loss=0.1013, pruned_loss=0.01963, audio_tagging_loss=0.01032, over 3019086.58 frames. 
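
The "Whitening: ... metric=X vs. limit=Y" lines (e.g. encoder_embed.out_whiten, metric=7.34 vs. limit=8.0 just above) compare a whiteness statistic of a module's activations against a scheduled limit; the penalty only engages while the metric exceeds the limit. A hedged reconstruction of one common form of that metric, equal to 1.0 when the channel covariance is proportional to the identity:

```python
# Sketch of a whitening metric like the one scaling.py appears to log:
# >= 1, and == 1 iff the channel covariance is a multiple of the identity.
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels); single group for simplicity.
    x = x - x.mean(dim=0)
    cov = (x.t() @ x) / x.shape[0]              # (C, C) covariance
    c = cov.shape[0]
    mean_diag = cov.diagonal().mean()
    return (cov ** 2).sum() / (mean_diag ** 2 * c)

x = torch.randn(10_000, 384)
print(float(whitening_metric(x)))   # ~1 for white noise; log values of 7-21
                                    # measure how far activations are from white
```
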
], batch size: 58, lr: 5.01e-03, grad_scale: 16.0 2023-11-20 10:24:02,615 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1048380.0, ans=0.125 2023-11-20 10:24:16,208 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1048446.6666666666, ans=0.125 2023-11-20 10:24:31,759 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1048513.3333333334, ans=0.2 2023-11-20 10:24:36,645 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1048580.0, ans=0.125 2023-11-20 10:24:36,661 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1048580.0, ans=0.125 2023-11-20 10:24:49,734 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1048646.6666666667, ans=0.0 2023-11-20 10:24:50,739 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 157300 2023-11-20 10:25:01,453 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1048713.3333333333, ans=0.125 2023-11-20 10:25:02,392 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 1000, loss[loss=0.07709, simple_loss=0.09648, pruned_loss=0.01653, audio_tagging_loss=0.01232, over 15751.00 frames. ], tot_loss[loss=0.08017, simple_loss=0.1009, pruned_loss=0.01955, audio_tagging_loss=0.01016, over 3018925.99 frames. ], batch size: 61, lr: 5.01e-03, grad_scale: 16.0 2023-11-20 10:25:10,467 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.129e+01 7.769e+01 8.550e+01 9.087e+01 1.228e+02, threshold=1.710e+02, percent-clipped=0.0 2023-11-20 10:25:21,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1048780.0, ans=0.125 2023-11-20 10:25:26,906 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.70 vs. limit=22.5 2023-11-20 10:25:29,927 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 10:25:42,252 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1048913.3333333333, ans=0.5 2023-11-20 10:25:56,420 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 157350 2023-11-20 10:26:08,111 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 1050, loss[loss=0.08075, simple_loss=0.1032, pruned_loss=0.01873, audio_tagging_loss=0.01043, over 16107.00 frames. ], tot_loss[loss=0.07922, simple_loss=0.1, pruned_loss=0.0192, audio_tagging_loss=0.01001, over 3032642.90 frames. 
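
The "Exclude cut" warning above drops a 1.0 s AudioSet cut whose dummy transcript has 24 BPE tokens but only 23 frames after subsampling; a transducer alignment needs at least one frame per emitted token, so the cut cannot be trained on. A sketch of such a filter; the subsampling formula matches the logged 100 -> 23 but is an assumption about the frontend, as is the exact inequality:

```python
# Sketch of the cut filter behind the "Exclude cut with ID ..." warnings.
def frames_after_subsampling(num_frames: int) -> int:
    # Matches the logged 100 -> 23 under ~4x subsampling (assumed formula).
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # Need at least as many output frames as tokens (assumed inequality).
    return frames_after_subsampling(num_frames) >= num_tokens

print(frames_after_subsampling(100))   # 23, as in the warning above
print(keep_cut(100, 24))               # False -> cut is excluded
```
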
], batch size: 61, lr: 5.01e-03, grad_scale: 16.0 2023-11-20 10:26:17,619 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.20 vs. limit=22.5 2023-11-20 10:26:22,328 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1049113.3333333333, ans=0.0 2023-11-20 10:27:01,898 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 157400 2023-11-20 10:27:07,676 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1049313.3333333333, ans=0.0 2023-11-20 10:27:13,277 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 1100, loss[loss=0.06744, simple_loss=0.09208, pruned_loss=0.01261, audio_tagging_loss=0.008791, over 15147.00 frames. ], tot_loss[loss=0.07863, simple_loss=0.09925, pruned_loss=0.019, audio_tagging_loss=0.01, over 3030215.52 frames. ], batch size: 57, lr: 5.01e-03, grad_scale: 16.0 2023-11-20 10:27:17,878 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 10:27:19,367 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1049380.0, ans=0.125 2023-11-20 10:27:21,541 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.626e+01 8.064e+01 8.696e+01 9.715e+01 1.230e+02, threshold=1.739e+02, percent-clipped=0.0 2023-11-20 10:27:48,900 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.10 vs. limit=12.0 2023-11-20 10:28:02,365 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1049580.0, ans=0.2 2023-11-20 10:28:07,790 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 157450 2023-11-20 10:28:09,373 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff3.min_abs, batch_count=1049646.6666666667, ans=0.2 2023-11-20 10:28:19,094 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 1150, loss[loss=0.07533, simple_loss=0.1024, pruned_loss=0.01487, audio_tagging_loss=0.009242, over 15614.00 frames. ], tot_loss[loss=0.07882, simple_loss=0.09925, pruned_loss=0.01923, audio_tagging_loss=0.009965, over 3029315.75 frames. ], batch size: 57, lr: 5.00e-03, grad_scale: 16.0 2023-11-20 10:28:25,511 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.47 vs. 
limit=15.0 2023-11-20 10:28:41,583 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1049780.0, ans=0.0 2023-11-20 10:28:50,606 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1049846.6666666667, ans=0.1 2023-11-20 10:29:14,256 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 157500 2023-11-20 10:29:26,536 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 1200, loss[loss=0.04538, simple_loss=0.05509, pruned_loss=0.00804, audio_tagging_loss=0.009792, over 16022.00 frames. ], tot_loss[loss=0.0792, simple_loss=0.1001, pruned_loss=0.01931, audio_tagging_loss=0.009821, over 3032359.41 frames. ], batch size: 61, lr: 5.00e-03, grad_scale: 32.0 2023-11-20 10:29:30,827 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.29 vs. limit=6.0 2023-11-20 10:29:33,804 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.234e+01 8.296e+01 8.818e+01 9.703e+01 1.332e+02, threshold=1.764e+02, percent-clipped=0.0 2023-11-20 10:29:52,010 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1050180.0, ans=0.0 2023-11-20 10:30:10,014 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1050246.6666666667, ans=0.0 2023-11-20 10:30:19,796 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 157550 2023-11-20 10:30:31,191 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 1250, loss[loss=0.08488, simple_loss=0.1114, pruned_loss=0.01785, audio_tagging_loss=0.01131, over 15211.00 frames. ], tot_loss[loss=0.07896, simple_loss=0.09978, pruned_loss=0.01932, audio_tagging_loss=0.009748, over 3031438.16 frames. ], batch size: 57, lr: 5.00e-03, grad_scale: 32.0 2023-11-20 10:30:40,882 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1050380.0, ans=0.125 2023-11-20 10:30:44,768 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.60 vs. limit=6.0 2023-11-20 10:30:48,201 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1050446.6666666667, ans=0.0 2023-11-20 10:30:48,363 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.47 vs. limit=6.0 2023-11-20 10:30:57,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1050513.3333333333, ans=0.1 2023-11-20 10:31:01,162 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1050513.3333333333, ans=0.025 2023-11-20 10:31:17,350 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.37 vs. 
limit=15.0 2023-11-20 10:31:21,741 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1050646.6666666667, ans=0.0 2023-11-20 10:31:24,022 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 157600 2023-11-20 10:31:25,924 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1050646.6666666667, ans=0.0 2023-11-20 10:31:29,998 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1050646.6666666667, ans=0.2 2023-11-20 10:31:35,664 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 1300, loss[loss=0.09885, simple_loss=0.1359, pruned_loss=0.0241, audio_tagging_loss=0.00679, over 15886.00 frames. ], tot_loss[loss=0.07858, simple_loss=0.0991, pruned_loss=0.01922, audio_tagging_loss=0.009805, over 3030605.08 frames. ], batch size: 55, lr: 5.00e-03, grad_scale: 32.0 2023-11-20 10:31:43,830 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.152e+01 8.044e+01 8.712e+01 9.271e+01 1.350e+02, threshold=1.742e+02, percent-clipped=0.0 2023-11-20 10:31:49,181 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1050780.0, ans=0.2 2023-11-20 10:31:57,513 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1050780.0, ans=0.0 2023-11-20 10:31:57,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1050780.0, ans=0.2 2023-11-20 10:32:06,864 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.81 vs. limit=10.0 2023-11-20 10:32:10,354 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1050846.6666666667, ans=0.2 2023-11-20 10:32:17,048 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1050913.3333333333, ans=0.125 2023-11-20 10:32:29,172 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 157650 2023-11-20 10:32:33,638 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1050980.0, ans=0.125 2023-11-20 10:32:34,797 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1050980.0, ans=0.125 2023-11-20 10:32:40,744 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 1350, loss[loss=0.05256, simple_loss=0.06379, pruned_loss=0.008319, audio_tagging_loss=0.01235, over 14573.00 frames. ], tot_loss[loss=0.0789, simple_loss=0.09939, pruned_loss=0.01935, audio_tagging_loss=0.009848, over 3038294.74 frames. 
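
Each tot_loss[...] field above is a running, frame-weighted aggregate (the trailing "over N frames" is its denominator), which is why it drifts slowly even when individual batch losses jump between, say, 0.099 and 0.053 as in the two batches just above. A minimal sketch of such a tracker, with the per-batch numbers taken from this log:

```python
# Sketch: frame-weighted running averages like the tot_loss[...] fields.
class LossTracker:
    def __init__(self) -> None:
        self.sums: dict[str, float] = {}
        self.frames = 0.0

    def update(self, losses: dict[str, float], num_frames: float) -> None:
        for k, v in losses.items():
            self.sums[k] = self.sums.get(k, 0.0) + v * num_frames
        self.frames += num_frames

    def averages(self) -> dict[str, float]:
        return {k: s / self.frames for k, s in self.sums.items()}

t = LossTracker()
t.update({"loss": 0.09885, "pruned_loss": 0.0241}, num_frames=15886)  # batch 1300
t.update({"loss": 0.05256, "pruned_loss": 0.00832}, num_frames=14573) # batch 1350
print(t.averages())   # frame-weighted, cf. "over 3038294.74 frames"
```
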
], batch size: 56, lr: 5.00e-03, grad_scale: 16.0 2023-11-20 10:32:41,694 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1051046.6666666667, ans=0.0 2023-11-20 10:32:55,709 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1051113.3333333333, ans=0.05 2023-11-20 10:33:14,782 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1051180.0, ans=0.125 2023-11-20 10:33:25,920 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.21 vs. limit=15.0 2023-11-20 10:33:27,848 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1051246.6666666667, ans=0.125 2023-11-20 10:33:28,896 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 10:33:32,109 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1051313.3333333333, ans=0.125 2023-11-20 10:33:34,516 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 157700 2023-11-20 10:33:35,128 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.64 vs. limit=22.5 2023-11-20 10:33:46,208 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 1400, loss[loss=0.06855, simple_loss=0.09237, pruned_loss=0.01323, audio_tagging_loss=0.009139, over 16306.00 frames. ], tot_loss[loss=0.0803, simple_loss=0.101, pruned_loss=0.01989, audio_tagging_loss=0.009889, over 3043596.92 frames. ], batch size: 62, lr: 5.00e-03, grad_scale: 16.0 2023-11-20 10:33:51,839 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.57 vs. limit=22.5 2023-11-20 10:33:55,360 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.246e+01 8.256e+01 8.950e+01 9.563e+01 1.336e+02, threshold=1.790e+02, percent-clipped=0.0 2023-11-20 10:33:57,018 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1051380.0, ans=0.125 2023-11-20 10:34:11,965 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1051513.3333333333, ans=0.125 2023-11-20 10:34:22,473 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1051513.3333333333, ans=0.1 2023-11-20 10:34:30,833 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.67 vs. 
limit=22.5 2023-11-20 10:34:31,645 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1051580.0, ans=0.04949747468305833 2023-11-20 10:34:38,678 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 157750 2023-11-20 10:34:38,872 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1051646.6666666667, ans=0.125 2023-11-20 10:34:49,289 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1051713.3333333333, ans=0.2 2023-11-20 10:34:50,302 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 1450, loss[loss=0.08295, simple_loss=0.1055, pruned_loss=0.02153, audio_tagging_loss=0.008652, over 15117.00 frames. ], tot_loss[loss=0.08016, simple_loss=0.1008, pruned_loss=0.01982, audio_tagging_loss=0.009945, over 3040255.55 frames. ], batch size: 56, lr: 5.00e-03, grad_scale: 16.0 2023-11-20 10:34:50,584 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1051713.3333333333, ans=0.2 2023-11-20 10:34:53,072 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1051713.3333333333, ans=0.04949747468305833 2023-11-20 10:34:55,603 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1051713.3333333333, ans=0.0 2023-11-20 10:35:28,836 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.71 vs. limit=15.0 2023-11-20 10:35:36,299 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 10:35:43,435 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 157800 2023-11-20 10:35:55,550 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 1500, loss[loss=0.06496, simple_loss=0.08263, pruned_loss=0.01364, audio_tagging_loss=0.01, over 14857.00 frames. ], tot_loss[loss=0.08087, simple_loss=0.1019, pruned_loss=0.01999, audio_tagging_loss=0.009953, over 3040016.82 frames. ], batch size: 57, lr: 5.00e-03, grad_scale: 16.0 2023-11-20 10:36:01,991 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.87 vs. limit=15.0 2023-11-20 10:36:04,965 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.891e+01 8.201e+01 8.884e+01 9.746e+01 1.400e+02, threshold=1.777e+02, percent-clipped=0.0 2023-11-20 10:36:15,117 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1052113.3333333333, ans=0.0 2023-11-20 10:36:16,444 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.06 vs. 
limit=22.5 2023-11-20 10:36:18,665 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1052113.3333333333, ans=0.1 2023-11-20 10:36:28,018 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1052180.0, ans=0.125 2023-11-20 10:36:31,586 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1052180.0, ans=0.125 2023-11-20 10:36:49,285 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1052313.3333333333, ans=0.125 2023-11-20 10:36:50,130 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 157850 2023-11-20 10:36:55,868 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.10 vs. limit=15.0 2023-11-20 10:37:01,350 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 1550, loss[loss=0.108, simple_loss=0.1497, pruned_loss=0.02499, audio_tagging_loss=0.008186, over 16124.00 frames. ], tot_loss[loss=0.08113, simple_loss=0.102, pruned_loss=0.02006, audio_tagging_loss=0.01005, over 3047371.43 frames. ], batch size: 56, lr: 5.00e-03, grad_scale: 16.0 2023-11-20 10:37:13,421 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.06 vs. limit=15.0 2023-11-20 10:37:25,256 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1052446.6666666667, ans=0.2 2023-11-20 10:37:33,765 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.12 vs. limit=15.0 2023-11-20 10:37:55,480 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 157900 2023-11-20 10:38:02,391 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1052646.6666666667, ans=0.04949747468305833 2023-11-20 10:38:06,056 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1052713.3333333333, ans=0.2 2023-11-20 10:38:07,080 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 1600, loss[loss=0.06871, simple_loss=0.08626, pruned_loss=0.01355, audio_tagging_loss=0.01203, over 15106.00 frames. ], tot_loss[loss=0.08082, simple_loss=0.1013, pruned_loss=0.02001, audio_tagging_loss=0.01016, over 3042712.92 frames. ], batch size: 58, lr: 5.00e-03, grad_scale: 32.0 2023-11-20 10:38:07,400 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1052713.3333333333, ans=0.0 2023-11-20 10:38:12,287 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1052713.3333333333, ans=0.125 2023-11-20 10:38:15,685 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.476e+01 8.170e+01 8.883e+01 9.558e+01 1.180e+02, threshold=1.777e+02, percent-clipped=0.0 2023-11-20 10:38:19,374 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.26 vs. 
2023-11-20 10:38:26,445 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1052780.0, ans=0.125
2023-11-20 10:38:38,239 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1052846.6666666667, ans=0.1
2023-11-20 10:38:45,206 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1052913.3333333333, ans=0.09899494936611666
2023-11-20 10:38:45,238 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1052913.3333333333, ans=0.1
2023-11-20 10:38:53,211 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1052913.3333333333, ans=0.0
2023-11-20 10:39:00,466 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 157950
2023-11-20 10:39:12,303 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 1650, loss[loss=0.08601, simple_loss=0.1087, pruned_loss=0.02371, audio_tagging_loss=0.007939, over 15899.00 frames. ], tot_loss[loss=0.0806, simple_loss=0.1012, pruned_loss=0.01978, audio_tagging_loss=0.01024, over 3048377.28 frames. ], batch size: 57, lr: 5.00e-03, grad_scale: 32.0
2023-11-20 10:39:21,981 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1053046.6666666667, ans=0.125
2023-11-20 10:39:44,743 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1053180.0, ans=0.125
2023-11-20 10:40:02,645 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1053246.6666666667, ans=0.0
2023-11-20 10:40:06,100 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 158000
2023-11-20 10:40:13,118 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-20 10:40:17,629 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 1700, loss[loss=0.09033, simple_loss=0.1128, pruned_loss=0.02282, audio_tagging_loss=0.01111, over 15711.00 frames. ], tot_loss[loss=0.08093, simple_loss=0.1016, pruned_loss=0.01992, audio_tagging_loss=0.01021, over 3046668.51 frames. ], batch size: 56, lr: 5.00e-03, grad_scale: 32.0
2023-11-20 10:40:26,724 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.487e+01 8.158e+01 8.615e+01 9.243e+01 1.140e+02, threshold=1.723e+02, percent-clipped=0.0
2023-11-20 10:40:50,214 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1053513.3333333333, ans=0.125
2023-11-20 10:41:10,241 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 158050
2023-11-20 10:41:17,210 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1053646.6666666667, ans=0.125
2023-11-20 10:41:22,451 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 1750, loss[loss=0.08052, simple_loss=0.1078, pruned_loss=0.01947, audio_tagging_loss=0.007144, over 14094.00 frames. ], tot_loss[loss=0.08092, simple_loss=0.1018, pruned_loss=0.0199, audio_tagging_loss=0.01011, over 3044243.43 frames. ], batch size: 54, lr: 5.00e-03, grad_scale: 32.0
2023-11-20 10:41:32,734 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1053713.3333333333, ans=0.125
2023-11-20 10:41:33,971 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1053780.0, ans=0.1
2023-11-20 10:41:46,650 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1053780.0, ans=0.2
2023-11-20 10:42:02,499 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1053913.3333333333, ans=0.5
2023-11-20 10:42:02,945 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.48 vs. limit=15.0
2023-11-20 10:42:10,093 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.61 vs. limit=10.0
2023-11-20 10:42:15,706 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 158100
2023-11-20 10:42:17,031 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1053980.0, ans=0.2
2023-11-20 10:42:27,190 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 1800, loss[loss=0.09075, simple_loss=0.12, pruned_loss=0.02345, audio_tagging_loss=0.007308, over 15810.00 frames. ], tot_loss[loss=0.0805, simple_loss=0.1014, pruned_loss=0.01979, audio_tagging_loss=0.01002, over 3045696.88 frames. ], batch size: 55, lr: 4.99e-03, grad_scale: 16.0
2023-11-20 10:42:27,390 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1054046.6666666667, ans=0.0
2023-11-20 10:42:28,692 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-20 10:42:31,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1054046.6666666667, ans=0.125
2023-11-20 10:42:37,100 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=11.42 vs. limit=12.0
2023-11-20 10:42:37,573 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.937e+01 8.074e+01 8.907e+01 9.490e+01 1.284e+02, threshold=1.781e+02, percent-clipped=0.0
2023-11-20 10:42:58,412 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1054180.0, ans=0.0
2023-11-20 10:43:04,699 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1054246.6666666667, ans=0.125
2023-11-20 10:43:15,746 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=1054246.6666666667, ans=22.5
2023-11-20 10:43:20,189 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 158150
2023-11-20 10:43:31,649 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 1850, loss[loss=0.09128, simple_loss=0.1152, pruned_loss=0.0255, audio_tagging_loss=0.008202, over 17094.00 frames. ], tot_loss[loss=0.08005, simple_loss=0.1007, pruned_loss=0.01966, audio_tagging_loss=0.01006, over 3043964.91 frames. ], batch size: 63, lr: 4.99e-03, grad_scale: 16.0
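
The loss[...] and tot_loss[...] brackets decompose the training objective. The totals are consistent with the simple (linear) transducer loss entering at half weight while the pruned and audio-tagging terms enter at full weight; the 0.5 factor is inferred from the arithmetic below, not read from the code:

    # Cross-check against the batch 1850 tot_loss entry above.
    simple_loss = 0.1007
    pruned_loss = 0.01966
    audio_tagging_loss = 0.01006
    total = 0.5 * simple_loss + pruned_loss + audio_tagging_loss
    print(round(total, 5))  # 0.08007 vs. 0.08005 logged (rounding of the averages)
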
2023-11-20 10:43:41,889 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1054380.0, ans=0.125
2023-11-20 10:43:41,910 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1054380.0, ans=0.125
2023-11-20 10:43:55,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1054446.6666666667, ans=0.1
2023-11-20 10:44:01,439 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1054513.3333333333, ans=0.0
2023-11-20 10:44:18,456 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=1054580.0, ans=10.0
2023-11-20 10:44:25,142 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 158200
2023-11-20 10:44:37,036 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 1900, loss[loss=0.07296, simple_loss=0.09567, pruned_loss=0.01545, audio_tagging_loss=0.009673, over 16029.00 frames. ], tot_loss[loss=0.07974, simple_loss=0.1004, pruned_loss=0.01953, audio_tagging_loss=0.01001, over 3053472.83 frames. ], batch size: 62, lr: 4.99e-03, grad_scale: 16.0
2023-11-20 10:44:47,502 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.693e+01 8.196e+01 8.901e+01 9.660e+01 1.214e+02, threshold=1.780e+02, percent-clipped=0.0
2023-11-20 10:45:00,794 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1054780.0, ans=0.125
2023-11-20 10:45:12,731 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1054846.6666666667, ans=0.0
2023-11-20 10:45:17,539 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1054913.3333333333, ans=0.0
2023-11-20 10:45:27,199 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.16 vs. limit=15.0
2023-11-20 10:45:30,921 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 158250
2023-11-20 10:45:41,760 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1055046.6666666667, ans=0.1
2023-11-20 10:45:42,704 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 1950, loss[loss=0.08746, simple_loss=0.109, pruned_loss=0.0222, audio_tagging_loss=0.01074, over 14714.00 frames. ], tot_loss[loss=0.08002, simple_loss=0.1007, pruned_loss=0.01972, audio_tagging_loss=0.009936, over 3056930.10 frames. ], batch size: 57, lr: 4.99e-03, grad_scale: 16.0
2023-11-20 10:45:50,346 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1055046.6666666667, ans=0.125
2023-11-20 10:46:01,484 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1055113.3333333333, ans=0.0
2023-11-20 10:46:18,062 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1055180.0, ans=0.0
2023-11-20 10:46:28,702 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1055246.6666666667, ans=0.05
2023-11-20 10:46:34,209 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1055313.3333333333, ans=0.125
2023-11-20 10:46:36,403 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 158300
2023-11-20 10:46:47,911 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 2000, loss[loss=0.07922, simple_loss=0.1008, pruned_loss=0.02037, audio_tagging_loss=0.008454, over 15312.00 frames. ], tot_loss[loss=0.07979, simple_loss=0.1001, pruned_loss=0.01967, audio_tagging_loss=0.01005, over 3046342.51 frames. ], batch size: 56, lr: 4.99e-03, grad_scale: 32.0
2023-11-20 10:46:57,729 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.488e+01 8.029e+01 8.442e+01 9.197e+01 1.092e+02, threshold=1.688e+02, percent-clipped=0.0
2023-11-20 10:47:03,084 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1055446.6666666667, ans=0.1
2023-11-20 10:47:22,240 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1055513.3333333333, ans=0.0
2023-11-20 10:47:30,630 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1055580.0, ans=0.125
2023-11-20 10:47:41,025 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 158350
2023-11-20 10:47:52,031 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 2050, loss[loss=0.07235, simple_loss=0.0936, pruned_loss=0.01432, audio_tagging_loss=0.01123, over 14907.00 frames. ], tot_loss[loss=0.07995, simple_loss=0.1006, pruned_loss=0.01968, audio_tagging_loss=0.009987, over 3033708.79 frames. ], batch size: 57, lr: 4.99e-03, grad_scale: 32.0
2023-11-20 10:48:15,770 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.52 vs. limit=15.0
2023-11-20 10:48:43,659 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.21 vs. limit=22.5
2023-11-20 10:48:45,941 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 158400
2023-11-20 10:48:49,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1055980.0, ans=0.125
2023-11-20 10:48:58,055 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 2100, loss[loss=0.07287, simple_loss=0.09426, pruned_loss=0.01392, audio_tagging_loss=0.01183, over 15385.00 frames. ], tot_loss[loss=0.08065, simple_loss=0.1015, pruned_loss=0.01996, audio_tagging_loss=0.009909, over 3033201.20 frames. ], batch size: 58, lr: 4.99e-03, grad_scale: 32.0
2023-11-20 10:49:08,623 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.102e+01 8.360e+01 8.882e+01 9.714e+01 1.219e+02, threshold=1.776e+02, percent-clipped=0.0
2023-11-20 10:49:11,789 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.52 vs. limit=10.0
2023-11-20 10:49:38,499 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1056246.6666666667, ans=0.0
2023-11-20 10:49:45,949 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1056246.6666666667, ans=0.125
2023-11-20 10:49:51,994 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 158450
2023-11-20 10:50:02,842 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 2150, loss[loss=0.08562, simple_loss=0.1096, pruned_loss=0.02264, audio_tagging_loss=0.008195, over 14691.00 frames. ], tot_loss[loss=0.07995, simple_loss=0.1005, pruned_loss=0.01978, audio_tagging_loss=0.009908, over 3031051.37 frames. ], batch size: 53, lr: 4.99e-03, grad_scale: 32.0
2023-11-20 10:50:12,973 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.88 vs. limit=15.0
2023-11-20 10:50:42,489 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 10:50:42,806 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1056580.0, ans=0.1
2023-11-20 10:50:53,043 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1056580.0, ans=0.0
2023-11-20 10:50:56,676 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 158500
2023-11-20 10:50:56,943 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1056646.6666666667, ans=0.09899494936611666
2023-11-20 10:51:07,661 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 2200, loss[loss=0.09425, simple_loss=0.1182, pruned_loss=0.02698, audio_tagging_loss=0.008149, over 14768.00 frames. ], tot_loss[loss=0.07932, simple_loss=0.0996, pruned_loss=0.01954, audio_tagging_loss=0.009984, over 3035927.04 frames. ], batch size: 54, lr: 4.99e-03, grad_scale: 32.0
2023-11-20 10:51:19,262 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.870e+01 8.263e+01 8.832e+01 9.449e+01 1.423e+02, threshold=1.766e+02, percent-clipped=0.0
2023-11-20 10:51:22,065 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1056780.0, ans=0.0
2023-11-20 10:51:27,139 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1056780.0, ans=0.0
2023-11-20 10:51:43,662 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.92 vs. limit=15.0
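
The WARNING above drops an AudioSet placeholder cut: 100 input frames shrink to 23 after subsampling, which is fewer than the cut's 24 BPE tokens, and a transducer loss cannot emit more tokens than it has frames. A sketch of that rule; the subsampling formula is an assumption that is merely consistent with 100 -> 23:

    # Sketch of the exclusion rule suggested by the WARNING above.
    def frames_after_subsampling(num_frames: int) -> int:
        # Assumed frontend arithmetic, consistent with (100 - 7) // 4 == 23.
        return (num_frames - 7) // 4

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # Drop cuts whose subsampled length cannot cover the token sequence.
        return frames_after_subsampling(num_frames) >= num_tokens

    print(keep_cut(100, 24))  # False -> excluded, matching the log
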
2023-11-20 10:51:44,569 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1056846.6666666667, ans=0.125
2023-11-20 10:52:00,258 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 158550
2023-11-20 10:52:12,223 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 2250, loss[loss=0.06639, simple_loss=0.08247, pruned_loss=0.01459, audio_tagging_loss=0.01056, over 15125.00 frames. ], tot_loss[loss=0.08019, simple_loss=0.1006, pruned_loss=0.01989, audio_tagging_loss=0.01001, over 3038535.19 frames. ], batch size: 59, lr: 4.99e-03, grad_scale: 8.0
2023-11-20 10:52:13,722 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1057046.6666666667, ans=0.125
2023-11-20 10:52:15,041 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1057046.6666666667, ans=0.125
2023-11-20 10:52:16,123 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1057046.6666666667, ans=0.0
2023-11-20 10:52:48,309 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1057180.0, ans=0.1
2023-11-20 10:52:51,927 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1057246.6666666667, ans=0.0
2023-11-20 10:52:53,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1057246.6666666667, ans=0.0
2023-11-20 10:52:59,910 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1057246.6666666667, ans=0.0
2023-11-20 10:53:05,805 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 158600
2023-11-20 10:53:18,560 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 2300, loss[loss=0.06154, simple_loss=0.07191, pruned_loss=0.01279, audio_tagging_loss=0.0128, over 14996.00 frames. ], tot_loss[loss=0.08054, simple_loss=0.101, pruned_loss=0.02001, audio_tagging_loss=0.01006, over 3038272.54 frames. ], batch size: 56, lr: 4.99e-03, grad_scale: 8.0
2023-11-20 10:53:25,858 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1057380.0, ans=0.125
2023-11-20 10:53:30,085 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.80 vs. limit=15.0
2023-11-20 10:53:31,783 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.567e+01 8.157e+01 8.586e+01 9.219e+01 1.150e+02, threshold=1.717e+02, percent-clipped=0.0
2023-11-20 10:53:40,819 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1057446.6666666667, ans=0.125
2023-11-20 10:53:49,374 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.85 vs. limit=22.5
2023-11-20 10:53:50,775 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.66 vs. limit=15.0
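
The Whitening lines compare a whiteness metric of a module's activations against a per-module limit (the whitening_limit values also appear above as ScheduledFloat entries). One plausible formulation of such a metric, illustrative rather than necessarily the exact one in scaling.py, is the eigenvalue dispersion of the channel covariance, which equals 1.0 for perfectly white features:

    # Hedged sketch of a whiteness metric like the "metric=... vs. limit=..." lines.
    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        # x: (num_frames, num_channels)
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.T @ x) / x.shape[0]            # channel covariance (C, C)
        eigs = torch.linalg.eigvalsh(cov)
        # mean squared eigenvalue over squared mean eigenvalue: 1.0 when white
        return float((eigs ** 2).mean() / eigs.mean() ** 2)

    x = torch.randn(1000, 256) @ torch.randn(256, 256)  # correlated features
    print(whitening_metric(x))  # well above 1.0; compare against a limit like 15.0
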
2023-11-20 10:53:54,640 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.67 vs. limit=15.0
2023-11-20 10:54:13,232 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 158650
2023-11-20 10:54:16,827 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 10:54:24,169 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 2350, loss[loss=0.09156, simple_loss=0.1143, pruned_loss=0.02354, audio_tagging_loss=0.01085, over 15146.00 frames. ], tot_loss[loss=0.08072, simple_loss=0.1016, pruned_loss=0.0199, audio_tagging_loss=0.01004, over 3037604.58 frames. ], batch size: 56, lr: 4.99e-03, grad_scale: 8.0
2023-11-20 10:54:27,759 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.82 vs. limit=10.0
2023-11-20 10:54:31,841 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1057713.3333333333, ans=0.125
2023-11-20 10:54:57,521 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1057846.6666666667, ans=0.125
2023-11-20 10:55:05,699 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1057913.3333333333, ans=0.1
2023-11-20 10:55:05,953 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1057913.3333333333, ans=0.1
2023-11-20 10:55:17,115 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1057980.0, ans=0.125
2023-11-20 10:55:18,112 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 158700
2023-11-20 10:55:29,615 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 2400, loss[loss=0.08828, simple_loss=0.1115, pruned_loss=0.0195, audio_tagging_loss=0.01305, over 15265.00 frames. ], tot_loss[loss=0.08055, simple_loss=0.1012, pruned_loss=0.01974, audio_tagging_loss=0.0102, over 3040182.23 frames. ], batch size: 60, lr: 4.99e-03, grad_scale: 16.0
2023-11-20 10:55:42,634 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.833e+01 8.027e+01 8.593e+01 9.391e+01 2.644e+02, threshold=1.719e+02, percent-clipped=1.0
2023-11-20 10:55:54,724 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.54 vs. limit=12.0
2023-11-20 10:56:22,133 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1058313.3333333333, ans=0.125
2023-11-20 10:56:22,363 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1058313.3333333333, ans=0.2
2023-11-20 10:56:23,332 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 158750
2023-11-20 10:56:31,706 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1058313.3333333333, ans=10.0
2023-11-20 10:56:35,176 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 2450, loss[loss=0.07248, simple_loss=0.09754, pruned_loss=0.01312, audio_tagging_loss=0.01059, over 15778.00 frames. ], tot_loss[loss=0.08037, simple_loss=0.101, pruned_loss=0.01971, audio_tagging_loss=0.01019, over 3033485.00 frames. ], batch size: 59, lr: 4.98e-03, grad_scale: 16.0
2023-11-20 10:57:00,620 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.82 vs. limit=15.0
2023-11-20 10:57:22,579 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1058580.0, ans=0.1
2023-11-20 10:57:26,271 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=1058646.6666666667, ans=0.95
2023-11-20 10:57:29,306 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 158800
2023-11-20 10:57:29,542 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1058646.6666666667, ans=0.1
2023-11-20 10:57:41,063 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 2500, loss[loss=0.08461, simple_loss=0.1034, pruned_loss=0.02475, audio_tagging_loss=0.008146, over 14731.00 frames. ], tot_loss[loss=0.0801, simple_loss=0.1007, pruned_loss=0.01955, audio_tagging_loss=0.0102, over 3032074.87 frames. ], batch size: 56, lr: 4.98e-03, grad_scale: 16.0
2023-11-20 10:57:54,005 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.038e+01 8.094e+01 8.721e+01 9.744e+01 1.305e+02, threshold=1.744e+02, percent-clipped=0.0
2023-11-20 10:57:55,452 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1058780.0, ans=0.2
2023-11-20 10:58:34,510 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 158850
2023-11-20 10:58:46,281 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 2550, loss[loss=0.0925, simple_loss=0.1201, pruned_loss=0.02565, audio_tagging_loss=0.006826, over 15824.00 frames. ], tot_loss[loss=0.08012, simple_loss=0.1006, pruned_loss=0.01973, audio_tagging_loss=0.0101, over 3035628.77 frames. ], batch size: 58, lr: 4.98e-03, grad_scale: 16.0
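
The grad_scale field in the batch summaries has been moving between 8.0, 16.0 and 32.0, which reads like a dynamic fp16 loss scale that is halved on overflow and periodically doubled. A minimal sketch with PyTorch's stock GradScaler; the training script may manage its scale differently:

    # Dynamic loss scaling in the spirit of the fluctuating grad_scale above.
    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=16.0, growth_interval=2000)
    # Per step, assuming model / optimizer / batch exist:
    #   with torch.cuda.amp.autocast():
    #       loss = model(batch)
    #   scaler.scale(loss).backward()
    #   scaler.step(optimizer)
    #   scaler.update()   # halves scale on inf/nan grads, doubles it periodically
    #   print(scaler.get_scale())
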
2023-11-20 10:58:46,649 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1059046.6666666667, ans=0.07
2023-11-20 10:58:58,351 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1059113.3333333333, ans=0.05
2023-11-20 10:59:00,791 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1059113.3333333333, ans=0.0
2023-11-20 10:59:04,851 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.40 vs. limit=15.0
2023-11-20 10:59:18,459 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1059180.0, ans=0.1
2023-11-20 10:59:23,291 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1059180.0, ans=0.125
2023-11-20 10:59:39,885 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 158900
2023-11-20 10:59:40,147 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1059313.3333333333, ans=0.125
2023-11-20 10:59:41,305 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1059313.3333333333, ans=0.0
2023-11-20 10:59:46,657 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1059313.3333333333, ans=0.0
2023-11-20 10:59:51,261 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 2600, loss[loss=0.07968, simple_loss=0.1135, pruned_loss=0.01762, audio_tagging_loss=0.005303, over 15323.00 frames. ], tot_loss[loss=0.07982, simple_loss=0.1005, pruned_loss=0.01952, audio_tagging_loss=0.01007, over 3041611.34 frames. ], batch size: 59, lr: 4.98e-03, grad_scale: 16.0
2023-11-20 11:00:00,962 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1059380.0, ans=0.0
2023-11-20 11:00:04,930 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.842e+01 8.545e+01 8.985e+01 9.606e+01 1.560e+02, threshold=1.797e+02, percent-clipped=0.0
2023-11-20 11:00:15,622 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.66 vs. limit=15.0
2023-11-20 11:00:26,977 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1059513.3333333333, ans=0.125
2023-11-20 11:00:39,990 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1059580.0, ans=0.0
2023-11-20 11:00:44,950 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 158950
2023-11-20 11:00:49,289 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1059646.6666666667, ans=0.1
2023-11-20 11:00:49,865 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.17 vs. limit=6.0
2023-11-20 11:00:54,295 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1059646.6666666667, ans=0.05
2023-11-20 11:00:57,137 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 2650, loss[loss=0.08378, simple_loss=0.1109, pruned_loss=0.01747, audio_tagging_loss=0.01084, over 15405.00 frames. ], tot_loss[loss=0.0797, simple_loss=0.1007, pruned_loss=0.01933, audio_tagging_loss=0.01004, over 3041870.44 frames. ], batch size: 56, lr: 4.98e-03, grad_scale: 16.0
2023-11-20 11:01:00,966 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1059713.3333333333, ans=0.1
2023-11-20 11:01:13,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1059780.0, ans=0.125
2023-11-20 11:01:18,094 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-20 11:01:19,325 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1059780.0, ans=0.2
2023-11-20 11:01:20,617 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1059780.0, ans=0.0
2023-11-20 11:01:27,228 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1059846.6666666667, ans=0.125
2023-11-20 11:01:30,182 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.79 vs. limit=12.0
2023-11-20 11:01:50,658 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 159000
2023-11-20 11:01:58,847 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1059980.0, ans=0.2
2023-11-20 11:02:02,644 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 2700, loss[loss=0.06773, simple_loss=0.07425, pruned_loss=0.0161, audio_tagging_loss=0.01451, over 14979.00 frames. ], tot_loss[loss=0.07957, simple_loss=0.1003, pruned_loss=0.01943, audio_tagging_loss=0.009984, over 3037725.04 frames. ], batch size: 58, lr: 4.98e-03, grad_scale: 16.0
2023-11-20 11:02:04,194 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-20 11:02:08,168 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1060046.6666666667, ans=0.125
2023-11-20 11:02:15,843 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.786e+01 7.991e+01 8.664e+01 9.430e+01 1.129e+02, threshold=1.733e+02, percent-clipped=0.0
2023-11-20 11:02:28,306 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1060180.0, ans=0.2
2023-11-20 11:02:30,760 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1060180.0, ans=0.0
2023-11-20 11:02:56,836 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 159050
2023-11-20 11:03:00,145 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.83 vs. limit=15.0
2023-11-20 11:03:08,547 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 2750, loss[loss=0.06601, simple_loss=0.08036, pruned_loss=0.0163, audio_tagging_loss=0.009534, over 14825.00 frames. ], tot_loss[loss=0.07915, simple_loss=0.0995, pruned_loss=0.01938, audio_tagging_loss=0.01001, over 3039992.98 frames. ], batch size: 56, lr: 4.98e-03, grad_scale: 16.0
2023-11-20 11:03:16,245 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1060380.0, ans=0.125
2023-11-20 11:03:18,485 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.33 vs. limit=15.0
2023-11-20 11:03:32,533 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1060446.6666666667, ans=0.09899494936611666
2023-11-20 11:04:01,992 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 159100
2023-11-20 11:04:04,404 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 11:04:06,801 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1060646.6666666667, ans=0.0
2023-11-20 11:04:12,154 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.52 vs. limit=10.0
2023-11-20 11:04:13,754 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 2800, loss[loss=0.07509, simple_loss=0.09183, pruned_loss=0.01942, audio_tagging_loss=0.009751, over 14712.00 frames. ], tot_loss[loss=0.07921, simple_loss=0.09952, pruned_loss=0.01946, audio_tagging_loss=0.00999, over 3045057.34 frames. ], batch size: 56, lr: 4.98e-03, grad_scale: 32.0
2023-11-20 11:04:26,644 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.546e+01 8.183e+01 8.895e+01 9.590e+01 1.282e+02, threshold=1.779e+02, percent-clipped=0.0
2023-11-20 11:04:28,278 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-20 11:04:53,048 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1060913.3333333333, ans=0.125
2023-11-20 11:04:54,471 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1060913.3333333333, ans=0.125
2023-11-20 11:05:07,154 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 159150
2023-11-20 11:05:18,760 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 2850, loss[loss=0.07739, simple_loss=0.109, pruned_loss=0.01631, audio_tagging_loss=0.006583, over 14869.00 frames. ], tot_loss[loss=0.07898, simple_loss=0.09956, pruned_loss=0.0193, audio_tagging_loss=0.009907, over 3040113.66 frames. ], batch size: 54, lr: 4.98e-03, grad_scale: 32.0
2023-11-20 11:05:19,072 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1061046.6666666667, ans=0.0
2023-11-20 11:05:28,202 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1061046.6666666667, ans=0.0
2023-11-20 11:06:08,530 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.43 vs. limit=22.5
2023-11-20 11:06:12,344 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 159200
2023-11-20 11:06:17,967 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1061313.3333333333, ans=0.0
2023-11-20 11:06:24,450 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 2900, loss[loss=0.09282, simple_loss=0.1085, pruned_loss=0.0271, audio_tagging_loss=0.01146, over 14926.00 frames. ], tot_loss[loss=0.07984, simple_loss=0.1006, pruned_loss=0.01964, audio_tagging_loss=0.009884, over 3042191.34 frames. ], batch size: 56, lr: 4.98e-03, grad_scale: 32.0
2023-11-20 11:06:37,436 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.449e+01 8.069e+01 8.700e+01 9.440e+01 1.245e+02, threshold=1.740e+02, percent-clipped=0.0
2023-11-20 11:07:03,752 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1061580.0, ans=0.125
2023-11-20 11:07:07,963 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.05 vs. limit=10.0
2023-11-20 11:07:13,109 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1061580.0, ans=0.125
2023-11-20 11:07:16,987 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1061646.6666666667, ans=0.1
2023-11-20 11:07:17,133 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1061646.6666666667, ans=0.125
2023-11-20 11:07:18,176 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 159250
2023-11-20 11:07:29,981 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 2950, loss[loss=0.07108, simple_loss=0.07515, pruned_loss=0.01927, audio_tagging_loss=0.01424, over 15096.00 frames. ], tot_loss[loss=0.07934, simple_loss=0.0999, pruned_loss=0.0195, audio_tagging_loss=0.009892, over 3051787.25 frames. ], batch size: 59, lr: 4.98e-03, grad_scale: 32.0
2023-11-20 11:07:36,659 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1061713.3333333333, ans=0.125
2023-11-20 11:08:15,076 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1061913.3333333333, ans=0.2
2023-11-20 11:08:17,378 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1061913.3333333333, ans=0.1
2023-11-20 11:08:22,477 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1061980.0, ans=0.125
2023-11-20 11:08:23,505 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 159300
2023-11-20 11:08:34,001 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.47 vs. limit=22.5
2023-11-20 11:08:34,577 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 3000, loss[loss=0.09442, simple_loss=0.1261, pruned_loss=0.02253, audio_tagging_loss=0.008845, over 16076.00 frames. ], tot_loss[loss=0.08022, simple_loss=0.1009, pruned_loss=0.01978, audio_tagging_loss=0.00999, over 3052054.77 frames. ], batch size: 58, lr: 4.98e-03, grad_scale: 32.0
2023-11-20 11:08:34,578 INFO [train_asr.py:1285] (3/4) Computing validation loss
2023-11-20 11:09:14,357 INFO [train_asr.py:1294] (3/4) Epoch 14, validation: loss=0.06185, simple_loss=0.05368, pruned_loss=0.005702, audio_tagging_loss=0.02931, over 4681554.00 frames.
2023-11-20 11:09:14,358 INFO [train_asr.py:1295] (3/4) Maximum memory allocated so far is 25886MB
2023-11-20 11:09:15,194 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.91 vs. limit=6.0
2023-11-20 11:09:27,210 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.003e+01 8.131e+01 8.854e+01 9.762e+01 1.260e+02, threshold=1.771e+02, percent-clipped=0.0
2023-11-20 11:09:28,859 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1062113.3333333333, ans=0.125
2023-11-20 11:09:59,503 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1062246.6666666667, ans=0.125
2023-11-20 11:10:05,217 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1062313.3333333333, ans=0.1
2023-11-20 11:10:05,267 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1062313.3333333333, ans=0.1
2023-11-20 11:10:07,393 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 159350
2023-11-20 11:10:08,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1062313.3333333333, ans=0.125
2023-11-20 11:10:19,214 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 3050, loss[loss=0.08876, simple_loss=0.1066, pruned_loss=0.02424, audio_tagging_loss=0.01124, over 15406.00 frames. ], tot_loss[loss=0.08088, simple_loss=0.1018, pruned_loss=0.01996, audio_tagging_loss=0.01002, over 3046716.73 frames. ], batch size: 57, lr: 4.97e-03, grad_scale: 32.0
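
The validation block above fires at a round batch number and reports the same loss decomposition as training, with audio_tagging_loss the largest component of the validation total. The arithmetic checks out under the same half weight on simple_loss; the interval and helper below are illustrative assumptions, not values read from the script:

    # Validation total is consistent with the same weighting as the training loss:
    print(0.5 * 0.05368 + 0.005702 + 0.02931)  # 0.061852 -> 0.06185 as logged

    # A typical gate for periodic validation (names and interval hypothetical):
    valid_interval = 3000
    def should_validate(batch_idx: int) -> bool:
        return batch_idx > 0 and batch_idx % valid_interval == 0
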
2023-11-20 11:10:32,987 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1062446.6666666667, ans=0.125
2023-11-20 11:10:37,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1062446.6666666667, ans=0.1
2023-11-20 11:10:56,768 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 11:11:02,755 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-20 11:11:12,511 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 159400
2023-11-20 11:11:18,029 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-20 11:11:24,017 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 3100, loss[loss=0.09954, simple_loss=0.1268, pruned_loss=0.02849, audio_tagging_loss=0.007626, over 16003.00 frames. ], tot_loss[loss=0.08091, simple_loss=0.1016, pruned_loss=0.02001, audio_tagging_loss=0.01011, over 3042978.62 frames. ], batch size: 60, lr: 4.97e-03, grad_scale: 32.0
2023-11-20 11:11:37,697 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.763e+01 7.933e+01 8.636e+01 9.301e+01 1.154e+02, threshold=1.727e+02, percent-clipped=0.0
2023-11-20 11:11:44,809 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1062780.0, ans=0.125
2023-11-20 11:12:18,298 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 159450
2023-11-20 11:12:29,857 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 3150, loss[loss=0.08747, simple_loss=0.09583, pruned_loss=0.02706, audio_tagging_loss=0.0125, over 15351.00 frames. ], tot_loss[loss=0.08083, simple_loss=0.1013, pruned_loss=0.01995, audio_tagging_loss=0.01024, over 3045217.20 frames. ], batch size: 58, lr: 4.97e-03, grad_scale: 32.0
2023-11-20 11:13:12,153 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1063246.6666666667, ans=0.125
2023-11-20 11:13:16,773 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.96 vs. limit=10.0
2023-11-20 11:13:22,284 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.84 vs. limit=22.5
2023-11-20 11:13:24,108 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 159500
2023-11-20 11:13:35,649 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 3200, loss[loss=0.05804, simple_loss=0.0686, pruned_loss=0.01484, audio_tagging_loss=0.008896, over 13808.00 frames. ], tot_loss[loss=0.08059, simple_loss=0.1009, pruned_loss=0.01987, audio_tagging_loss=0.01026, over 3052564.41 frames. ], batch size: 56, lr: 4.97e-03, grad_scale: 32.0
2023-11-20 11:13:47,901 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.818e+01 8.344e+01 9.152e+01 9.986e+01 1.362e+02, threshold=1.830e+02, percent-clipped=0.0
2023-11-20 11:13:48,284 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1063446.6666666667, ans=0.125
2023-11-20 11:13:55,287 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1063446.6666666667, ans=0.0
2023-11-20 11:13:57,692 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1063446.6666666667, ans=0.2
2023-11-20 11:14:04,235 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.43 vs. limit=15.0
2023-11-20 11:14:07,164 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1063513.3333333333, ans=0.0
2023-11-20 11:14:07,443 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.43 vs. limit=12.0
2023-11-20 11:14:08,619 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.65 vs. limit=22.5
2023-11-20 11:14:14,094 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1063580.0, ans=0.1
2023-11-20 11:14:15,688 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.97 vs. limit=15.0
2023-11-20 11:14:16,637 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1063580.0, ans=0.125
2023-11-20 11:14:28,206 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1063646.6666666667, ans=0.0
2023-11-20 11:14:29,398 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 159550
2023-11-20 11:14:40,185 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 3250, loss[loss=0.06749, simple_loss=0.07384, pruned_loss=0.01967, audio_tagging_loss=0.0109, over 14668.00 frames. ], tot_loss[loss=0.07982, simple_loss=0.09981, pruned_loss=0.01954, audio_tagging_loss=0.01037, over 3043487.08 frames. ], batch size: 58, lr: 4.97e-03, grad_scale: 32.0
2023-11-20 11:15:02,495 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-20 11:15:03,991 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.46 vs. limit=15.0
2023-11-20 11:15:08,343 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1063846.6666666667, ans=0.125
2023-11-20 11:15:34,389 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 159600
2023-11-20 11:15:42,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1063980.0, ans=0.125
2023-11-20 11:15:45,710 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 3300, loss[loss=0.08979, simple_loss=0.1162, pruned_loss=0.0239, audio_tagging_loss=0.007797, over 15224.00 frames. ], tot_loss[loss=0.08014, simple_loss=0.1001, pruned_loss=0.01966, audio_tagging_loss=0.01041, over 3048218.03 frames. ], batch size: 54, lr: 4.97e-03, grad_scale: 32.0
2023-11-20 11:15:55,866 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.05 vs. limit=15.0
2023-11-20 11:15:58,827 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.658e+01 7.969e+01 8.807e+01 9.518e+01 1.189e+02, threshold=1.761e+02, percent-clipped=0.0
2023-11-20 11:16:01,211 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1064113.3333333333, ans=0.125
2023-11-20 11:16:08,914 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.50 vs. limit=10.0
2023-11-20 11:16:19,655 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-20 11:16:19,837 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-20 11:16:30,433 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1064246.6666666667, ans=0.125
2023-11-20 11:16:39,588 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 159650
2023-11-20 11:16:45,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1064313.3333333333, ans=0.1
2023-11-20 11:16:50,923 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1064380.0, ans=0.125
2023-11-20 11:16:51,802 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 3350, loss[loss=0.08275, simple_loss=0.1115, pruned_loss=0.02022, audio_tagging_loss=0.006753, over 15390.00 frames. ], tot_loss[loss=0.08034, simple_loss=0.1008, pruned_loss=0.01968, audio_tagging_loss=0.01024, over 3047778.58 frames. ], batch size: 58, lr: 4.97e-03, grad_scale: 32.0
2023-11-20 11:16:53,322 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1064380.0, ans=0.2
2023-11-20 11:16:57,597 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.48 vs. limit=6.0
2023-11-20 11:17:33,611 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1064580.0, ans=0.125
2023-11-20 11:17:40,287 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1064580.0, ans=0.1
2023-11-20 11:17:45,150 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 159700
2023-11-20 11:17:54,008 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1064646.6666666667, ans=0.025
2023-11-20 11:17:56,240 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 3400, loss[loss=0.04499, simple_loss=0.04965, pruned_loss=0.008847, audio_tagging_loss=0.01132, over 16336.00 frames. ], tot_loss[loss=0.08106, simple_loss=0.1022, pruned_loss=0.01993, audio_tagging_loss=0.01006, over 3048900.44 frames. ], batch size: 62, lr: 4.97e-03, grad_scale: 16.0
2023-11-20 11:17:56,512 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1064713.3333333333, ans=0.1
2023-11-20 11:18:10,583 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.407e+01 8.209e+01 8.921e+01 9.607e+01 2.745e+02, threshold=1.784e+02, percent-clipped=1.0
2023-11-20 11:18:33,767 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1064846.6666666667, ans=0.1
2023-11-20 11:18:38,779 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1064913.3333333333, ans=0.09899494936611666
2023-11-20 11:18:46,198 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1064913.3333333333, ans=0.0
2023-11-20 11:18:49,748 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 159750
2023-11-20 11:19:01,475 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 3450, loss[loss=0.08869, simple_loss=0.1071, pruned_loss=0.02423, audio_tagging_loss=0.0109, over 15131.00 frames. ], tot_loss[loss=0.08107, simple_loss=0.1022, pruned_loss=0.01998, audio_tagging_loss=0.009994, over 3045855.50 frames. ], batch size: 56, lr: 4.97e-03, grad_scale: 8.0
2023-11-20 11:19:14,130 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1065113.3333333333, ans=0.1
2023-11-20 11:19:22,183 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1065113.3333333333, ans=0.07
2023-11-20 11:19:29,954 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.85 vs. limit=22.5
2023-11-20 11:19:35,946 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.94 vs. limit=6.0
2023-11-20 11:19:38,252 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1065180.0, ans=0.125
2023-11-20 11:19:51,608 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.34 vs. limit=22.5
2023-11-20 11:19:54,736 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 159800
2023-11-20 11:19:57,977 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1065313.3333333333, ans=0.0
2023-11-20 11:19:59,256 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1065313.3333333333, ans=0.125
2023-11-20 11:20:07,004 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 3500, loss[loss=0.07221, simple_loss=0.0846, pruned_loss=0.0199, audio_tagging_loss=0.01001, over 14849.00 frames. ], tot_loss[loss=0.08135, simple_loss=0.1026, pruned_loss=0.02014, audio_tagging_loss=0.009913, over 3046334.27 frames. ], batch size: 57, lr: 4.97e-03, grad_scale: 8.0
2023-11-20 11:20:22,599 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.721e+01 8.437e+01 9.163e+01 1.016e+02 1.154e+02, threshold=1.833e+02, percent-clipped=0.0
2023-11-20 11:20:40,562 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 11:21:00,603 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 159850
2023-11-20 11:21:08,559 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.32 vs. limit=12.0
2023-11-20 11:21:11,678 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 3550, loss[loss=0.1053, simple_loss=0.1263, pruned_loss=0.02915, audio_tagging_loss=0.01297, over 15541.00 frames. ], tot_loss[loss=0.08059, simple_loss=0.1014, pruned_loss=0.01993, audio_tagging_loss=0.009979, over 3040901.59 frames. ], batch size: 59, lr: 4.97e-03, grad_scale: 8.0
], batch size: 59, lr: 4.97e-03, grad_scale: 8.0 2023-11-20 11:21:19,941 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1065713.3333333333, ans=0.025 2023-11-20 11:21:27,718 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=1065780.0, ans=15.0 2023-11-20 11:21:33,582 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1065780.0, ans=0.125 2023-11-20 11:21:36,782 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1065846.6666666667, ans=0.125 2023-11-20 11:21:47,083 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1065846.6666666667, ans=0.125 2023-11-20 11:21:48,487 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1065846.6666666667, ans=0.2 2023-11-20 11:22:04,636 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 159900 2023-11-20 11:22:09,341 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1065980.0, ans=0.125 2023-11-20 11:22:16,488 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 3600, loss[loss=0.05345, simple_loss=0.06479, pruned_loss=0.01202, audio_tagging_loss=0.009039, over 14770.00 frames. ], tot_loss[loss=0.08044, simple_loss=0.1012, pruned_loss=0.01993, audio_tagging_loss=0.00992, over 3039999.53 frames. ], batch size: 57, lr: 4.97e-03, grad_scale: 16.0 2023-11-20 11:22:31,053 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1066113.3333333333, ans=0.125 2023-11-20 11:22:31,765 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.755e+01 8.370e+01 9.104e+01 9.925e+01 1.510e+02, threshold=1.821e+02, percent-clipped=0.0 2023-11-20 11:22:36,551 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1066113.3333333333, ans=0.0 2023-11-20 11:22:47,327 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1066180.0, ans=0.125 2023-11-20 11:22:49,909 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1066180.0, ans=0.0 2023-11-20 11:23:09,946 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 159950 2023-11-20 11:23:11,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1066313.3333333333, ans=0.1 2023-11-20 11:23:21,959 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 3650, loss[loss=0.07242, simple_loss=0.09469, pruned_loss=0.01608, audio_tagging_loss=0.008991, over 15386.00 frames. ], tot_loss[loss=0.0804, simple_loss=0.101, pruned_loss=0.01999, audio_tagging_loss=0.009905, over 3049094.15 frames. 
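Each train_asr.py:1262 entry pairs a single-batch loss with the running tot_loss, both decomposed into simple, pruned, and audio-tagging terms. The totals are consistent with a fixed weighted sum; for batch 3600 above, 0.5 * 0.1012 + 0.01993 + 0.00992 = 0.08045, matching the logged tot_loss of 0.08044. A hedged sketch of that combination (the 0.5 and 1.0 weights are inferred from the logged numbers rather than quoted from the code):

    def total_loss(simple_loss: float, pruned_loss: float,
                   audio_tagging_loss: float,
                   simple_loss_scale: float = 0.5,
                   audio_tagging_loss_scale: float = 1.0) -> float:
        # Weighted sum that reproduces the tot_loss column in these logs.
        return (simple_loss_scale * simple_loss
                + pruned_loss
                + audio_tagging_loss_scale * audio_tagging_loss)

    total_loss(0.1012, 0.01993, 0.00992)  # ~0.08045 vs. logged 0.08044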
], batch size: 58, lr: 4.97e-03, grad_scale: 16.0 2023-11-20 11:23:36,276 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1066446.6666666667, ans=0.125 2023-11-20 11:23:37,720 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.66 vs. limit=22.5 2023-11-20 11:23:41,131 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1066446.6666666667, ans=0.125 2023-11-20 11:23:54,534 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.58 vs. limit=15.0 2023-11-20 11:23:58,649 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.34 vs. limit=6.0 2023-11-20 11:24:12,403 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1066580.0, ans=0.125 2023-11-20 11:24:16,579 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 160000 2023-11-20 11:24:16,887 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1066646.6666666667, ans=0.025 2023-11-20 11:24:17,936 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1066646.6666666667, ans=0.125 2023-11-20 11:24:25,186 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.40 vs. limit=15.0 2023-11-20 11:24:28,642 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1066646.6666666667, ans=0.1 2023-11-20 11:24:31,862 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 3700, loss[loss=0.0745, simple_loss=0.09096, pruned_loss=0.01974, audio_tagging_loss=0.009282, over 15542.00 frames. ], tot_loss[loss=0.08091, simple_loss=0.102, pruned_loss=0.02008, audio_tagging_loss=0.009838, over 3054146.90 frames. ], batch size: 62, lr: 4.96e-03, grad_scale: 16.0 2023-11-20 11:24:44,953 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 11:24:47,119 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.803e+01 8.291e+01 8.874e+01 9.793e+01 1.503e+02, threshold=1.775e+02, percent-clipped=0.0 2023-11-20 11:24:52,445 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1066780.0, ans=0.125 2023-11-20 11:25:25,350 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 160050 2023-11-20 11:25:36,931 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 3750, loss[loss=0.07078, simple_loss=0.08517, pruned_loss=0.01374, audio_tagging_loss=0.01445, over 13979.00 frames. ], tot_loss[loss=0.08117, simple_loss=0.1022, pruned_loss=0.02017, audio_tagging_loss=0.009895, over 3061579.98 frames. ], batch size: 53, lr: 4.96e-03, grad_scale: 16.0 2023-11-20 11:26:09,067 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.05 vs. 
limit=6.0 2023-11-20 11:26:22,705 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 11:26:29,126 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1067313.3333333333, ans=0.05 2023-11-20 11:26:30,172 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 160100 2023-11-20 11:26:30,486 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1067313.3333333333, ans=0.125 2023-11-20 11:26:35,672 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.68 vs. limit=15.0 2023-11-20 11:26:41,894 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 3800, loss[loss=0.06381, simple_loss=0.07329, pruned_loss=0.01694, audio_tagging_loss=0.01023, over 15088.00 frames. ], tot_loss[loss=0.08086, simple_loss=0.1015, pruned_loss=0.02002, audio_tagging_loss=0.01006, over 3054937.91 frames. ], batch size: 60, lr: 4.96e-03, grad_scale: 16.0 2023-11-20 11:26:42,403 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.62 vs. limit=15.0 2023-11-20 11:26:44,044 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.59 vs. limit=6.0 2023-11-20 11:26:52,489 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1067380.0, ans=0.0 2023-11-20 11:26:55,694 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1067446.6666666667, ans=0.125 2023-11-20 11:26:57,809 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.898e+01 8.245e+01 9.019e+01 9.669e+01 1.480e+02, threshold=1.804e+02, percent-clipped=0.0 2023-11-20 11:27:03,021 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1067446.6666666667, ans=0.125 2023-11-20 11:27:05,884 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.34 vs. limit=22.5 2023-11-20 11:27:36,408 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 160150 2023-11-20 11:27:39,199 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1067646.6666666667, ans=0.2 2023-11-20 11:27:42,812 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1067646.6666666667, ans=0.1 2023-11-20 11:27:48,123 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 3850, loss[loss=0.0557, simple_loss=0.06893, pruned_loss=0.01003, audio_tagging_loss=0.0112, over 16578.00 frames. ], tot_loss[loss=0.0806, simple_loss=0.101, pruned_loss=0.01991, audio_tagging_loss=0.01018, over 3058841.17 frames. 
], batch size: 63, lr: 4.96e-03, grad_scale: 16.0 2023-11-20 11:27:50,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1067713.3333333333, ans=0.2 2023-11-20 11:28:03,847 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1067780.0, ans=0.125 2023-11-20 11:28:32,310 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1067913.3333333333, ans=0.125 2023-11-20 11:28:40,519 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1067980.0, ans=0.0 2023-11-20 11:28:41,424 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 160200 2023-11-20 11:28:53,407 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 3900, loss[loss=0.1018, simple_loss=0.1212, pruned_loss=0.03045, audio_tagging_loss=0.01077, over 15643.00 frames. ], tot_loss[loss=0.08048, simple_loss=0.1006, pruned_loss=0.01994, audio_tagging_loss=0.01021, over 3052055.38 frames. ], batch size: 56, lr: 4.96e-03, grad_scale: 16.0 2023-11-20 11:28:54,873 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1068046.6666666667, ans=0.125 2023-11-20 11:28:58,680 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1068046.6666666667, ans=0.125 2023-11-20 11:29:02,348 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1068046.6666666667, ans=0.125 2023-11-20 11:29:02,766 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.43 vs. limit=22.5 2023-11-20 11:29:08,876 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.848e+01 8.101e+01 8.668e+01 9.712e+01 1.300e+02, threshold=1.734e+02, percent-clipped=0.0 2023-11-20 11:29:24,678 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1068180.0, ans=0.2 2023-11-20 11:29:46,772 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 160250 2023-11-20 11:29:47,499 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.58 vs. limit=6.0 2023-11-20 11:29:51,261 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=1068313.3333333333, ans=15.0 2023-11-20 11:29:58,757 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 3950, loss[loss=0.09071, simple_loss=0.1088, pruned_loss=0.02622, audio_tagging_loss=0.01009, over 14667.00 frames. ], tot_loss[loss=0.08008, simple_loss=0.1002, pruned_loss=0.0198, audio_tagging_loss=0.01019, over 3045727.29 frames. 
], batch size: 56, lr: 4.96e-03, grad_scale: 16.0 2023-11-20 11:30:42,111 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1068580.0, ans=0.09899494936611666 2023-11-20 11:30:52,491 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 160300 2023-11-20 11:31:04,115 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 4000, loss[loss=0.08693, simple_loss=0.1125, pruned_loss=0.02303, audio_tagging_loss=0.007652, over 15633.00 frames. ], tot_loss[loss=0.08045, simple_loss=0.1007, pruned_loss=0.01977, audio_tagging_loss=0.01032, over 3045356.41 frames. ], batch size: 55, lr: 4.96e-03, grad_scale: 32.0 2023-11-20 11:31:16,275 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1068780.0, ans=0.1 2023-11-20 11:31:20,246 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.967e+01 8.155e+01 8.816e+01 9.659e+01 1.219e+02, threshold=1.763e+02, percent-clipped=0.0 2023-11-20 11:31:22,224 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.01 vs. limit=15.0 2023-11-20 11:31:30,508 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1068846.6666666667, ans=0.125 2023-11-20 11:31:34,342 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.61 vs. limit=15.0 2023-11-20 11:31:57,411 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 160350 2023-11-20 11:32:09,818 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 4050, loss[loss=0.09074, simple_loss=0.1127, pruned_loss=0.02698, audio_tagging_loss=0.007421, over 14916.00 frames. ], tot_loss[loss=0.08086, simple_loss=0.1013, pruned_loss=0.02, audio_tagging_loss=0.01024, over 3047892.98 frames. ], batch size: 55, lr: 4.96e-03, grad_scale: 16.0 2023-11-20 11:32:10,440 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.04 vs. limit=15.0 2023-11-20 11:32:13,598 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 11:32:16,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1069046.6666666667, ans=0.125 2023-11-20 11:32:26,283 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1069113.3333333333, ans=0.0 2023-11-20 11:32:40,933 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 11:32:53,489 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.93 vs. 
limit=22.5 2023-11-20 11:32:59,317 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1069246.6666666667, ans=0.0 2023-11-20 11:33:02,900 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 160400 2023-11-20 11:33:04,380 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 11:33:10,614 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1069313.3333333333, ans=0.125 2023-11-20 11:33:14,088 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 4100, loss[loss=0.06765, simple_loss=0.08384, pruned_loss=0.01468, audio_tagging_loss=0.01105, over 15180.00 frames. ], tot_loss[loss=0.08094, simple_loss=0.1015, pruned_loss=0.01994, audio_tagging_loss=0.01027, over 3044839.76 frames. ], batch size: 56, lr: 4.96e-03, grad_scale: 16.0 2023-11-20 11:33:23,101 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1069380.0, ans=0.125 2023-11-20 11:33:29,139 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1069446.6666666667, ans=0.125 2023-11-20 11:33:31,209 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.500e+01 8.137e+01 8.868e+01 9.537e+01 1.552e+02, threshold=1.774e+02, percent-clipped=0.0 2023-11-20 11:33:49,350 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1069513.3333333333, ans=0.2 2023-11-20 11:34:00,504 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1069580.0, ans=0.125 2023-11-20 11:34:04,165 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1069646.6666666667, ans=0.2 2023-11-20 11:34:07,300 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 160450 2023-11-20 11:34:19,038 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 4150, loss[loss=0.09195, simple_loss=0.1175, pruned_loss=0.02609, audio_tagging_loss=0.00711, over 14091.00 frames. ], tot_loss[loss=0.0805, simple_loss=0.1013, pruned_loss=0.01982, audio_tagging_loss=0.01002, over 3043380.57 frames. ], batch size: 54, lr: 4.96e-03, grad_scale: 16.0 2023-11-20 11:34:27,883 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.202e-01 2023-11-20 11:34:32,729 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1069780.0, ans=0.2 2023-11-20 11:34:43,336 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1069846.6666666667, ans=0.125 2023-11-20 11:34:47,118 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1069846.6666666667, ans=0.2 2023-11-20 11:35:06,475 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-20 11:35:11,514 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 160500 2023-11-20 11:35:20,309 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1069980.0, ans=0.125 2023-11-20 11:35:22,605 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 4200, loss[loss=0.064, simple_loss=0.08075, pruned_loss=0.01399, audio_tagging_loss=0.009638, over 14772.00 frames. ], tot_loss[loss=0.07927, simple_loss=0.09999, pruned_loss=0.01935, audio_tagging_loss=0.009927, over 3046396.41 frames. ], batch size: 57, lr: 4.96e-03, grad_scale: 16.0 2023-11-20 11:35:34,235 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1070046.6666666667, ans=0.05 2023-11-20 11:35:38,155 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 11:35:40,160 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.812e+01 8.053e+01 8.867e+01 9.480e+01 1.332e+02, threshold=1.773e+02, percent-clipped=0.0 2023-11-20 11:36:02,860 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1070246.6666666667, ans=0.125 2023-11-20 11:36:07,636 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1070246.6666666667, ans=0.125 2023-11-20 11:36:16,930 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 160550 2023-11-20 11:36:18,679 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.54 vs. limit=12.0 2023-11-20 11:36:28,435 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 4250, loss[loss=0.08338, simple_loss=0.1133, pruned_loss=0.0193, audio_tagging_loss=0.007414, over 15461.00 frames. ], tot_loss[loss=0.07953, simple_loss=0.1006, pruned_loss=0.01946, audio_tagging_loss=0.009779, over 3047562.46 frames. ], batch size: 56, lr: 4.96e-03, grad_scale: 16.0 2023-11-20 11:36:30,440 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.69 vs. limit=22.5 2023-11-20 11:37:00,256 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.42 vs. limit=15.0 2023-11-20 11:37:01,155 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1070513.3333333333, ans=0.2 2023-11-20 11:37:22,874 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 160600 2023-11-20 11:37:34,824 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 4300, loss[loss=0.06615, simple_loss=0.07936, pruned_loss=0.0156, audio_tagging_loss=0.01087, over 16362.00 frames. ], tot_loss[loss=0.07903, simple_loss=0.0998, pruned_loss=0.0193, audio_tagging_loss=0.009825, over 3045962.91 frames. 
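The scaling.py:213 lines print ScheduledFloat values: regularization hyperparameters such as dropout probabilities, balancer probs, and skip rates that are functions of batch_count rather than constants, which is why every entry carries both a batch_count and an ans. A minimal re-implementation of the idea (not the actual icefall class):

    class PiecewiseSchedule:
        """Piecewise-linear schedule over batch count; for example
        PiecewiseSchedule((0.0, 0.3), (20000.0, 0.1)) decays from 0.3
        to 0.1 and then stays there."""

        def __init__(self, *points):
            self.points = sorted(points)

        def __call__(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if batch_count <= x1:
                    t = (batch_count - x0) / (x1 - x0)
                    return y0 + t * (y1 - y0)
            return pts[-1][1]

At batch_count around 1.07e6, most schedules reached their final values long ago, hence the steady ans=0.125, ans=0.1, and ans=0.0 readings.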
], batch size: 65, lr: 4.96e-03, grad_scale: 16.0 2023-11-20 11:37:44,795 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1070713.3333333333, ans=0.125 2023-11-20 11:37:46,146 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1070780.0, ans=0.2 2023-11-20 11:37:50,508 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.072e+01 8.383e+01 9.324e+01 1.005e+02 1.336e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-20 11:37:52,440 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.24 vs. limit=22.5 2023-11-20 11:37:58,751 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1070846.6666666667, ans=0.125 2023-11-20 11:38:12,692 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1070913.3333333333, ans=10.0 2023-11-20 11:38:16,576 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1070913.3333333333, ans=0.125 2023-11-20 11:38:28,442 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 160650 2023-11-20 11:38:39,426 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 4350, loss[loss=0.09516, simple_loss=0.1213, pruned_loss=0.02457, audio_tagging_loss=0.009928, over 15982.00 frames. ], tot_loss[loss=0.07958, simple_loss=0.1004, pruned_loss=0.01958, audio_tagging_loss=0.009783, over 3042291.14 frames. ], batch size: 57, lr: 4.95e-03, grad_scale: 16.0 2023-11-20 11:38:59,487 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1071113.3333333333, ans=0.05 2023-11-20 11:39:32,574 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 160700 2023-11-20 11:39:39,836 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_na.min_abs, batch_count=1071313.3333333333, ans=0.02 2023-11-20 11:39:45,068 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 4400, loss[loss=0.09178, simple_loss=0.1193, pruned_loss=0.02263, audio_tagging_loss=0.009481, over 15898.00 frames. ], tot_loss[loss=0.07987, simple_loss=0.1009, pruned_loss=0.01965, audio_tagging_loss=0.009772, over 3046725.53 frames. ], batch size: 58, lr: 4.95e-03, grad_scale: 32.0 2023-11-20 11:40:02,272 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.072e+01 8.159e+01 8.593e+01 9.338e+01 1.252e+02, threshold=1.719e+02, percent-clipped=0.0 2023-11-20 11:40:13,806 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1071513.3333333333, ans=0.125 2023-11-20 11:40:38,696 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 160750 2023-11-20 11:40:47,744 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1071646.6666666667, ans=0.125 2023-11-20 11:40:50,838 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 4450, loss[loss=0.07159, simple_loss=0.08577, pruned_loss=0.01515, audio_tagging_loss=0.01356, over 14500.00 frames. ], tot_loss[loss=0.0801, simple_loss=0.1012, pruned_loss=0.01978, audio_tagging_loss=0.009717, over 3049944.96 frames. 
], batch size: 55, lr: 4.95e-03, grad_scale: 32.0 2023-11-20 11:41:08,354 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1071780.0, ans=0.2 2023-11-20 11:41:20,448 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.23 vs. limit=15.0 2023-11-20 11:41:33,037 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.93 vs. limit=15.0 2023-11-20 11:41:36,434 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1071913.3333333333, ans=0.0 2023-11-20 11:41:40,868 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1071913.3333333333, ans=0.1 2023-11-20 11:41:44,366 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 160800 2023-11-20 11:41:54,060 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1071980.0, ans=0.125 2023-11-20 11:41:56,268 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 4500, loss[loss=0.08241, simple_loss=0.1006, pruned_loss=0.02126, audio_tagging_loss=0.01084, over 14591.00 frames. ], tot_loss[loss=0.08086, simple_loss=0.1025, pruned_loss=0.01991, audio_tagging_loss=0.009685, over 3059630.30 frames. ], batch size: 56, lr: 4.95e-03, grad_scale: 32.0 2023-11-20 11:41:59,214 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.60 vs. limit=10.0 2023-11-20 11:42:12,844 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.114e+01 8.604e+01 9.143e+01 9.908e+01 1.250e+02, threshold=1.829e+02, percent-clipped=0.0 2023-11-20 11:42:40,145 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1072246.6666666667, ans=0.125 2023-11-20 11:42:50,272 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 160850 2023-11-20 11:42:50,600 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1072313.3333333333, ans=0.125 2023-11-20 11:43:01,955 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 4550, loss[loss=0.08027, simple_loss=0.09856, pruned_loss=0.02027, audio_tagging_loss=0.01073, over 14403.00 frames. ], tot_loss[loss=0.08075, simple_loss=0.1022, pruned_loss=0.01989, audio_tagging_loss=0.00974, over 3056449.34 frames. 
], batch size: 54, lr: 4.95e-03, grad_scale: 32.0 2023-11-20 11:43:30,397 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1072513.3333333333, ans=0.125 2023-11-20 11:43:40,300 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1072580.0, ans=0.125 2023-11-20 11:43:42,836 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1072580.0, ans=0.1 2023-11-20 11:43:42,889 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1072580.0, ans=0.125 2023-11-20 11:43:48,406 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1072580.0, ans=0.1 2023-11-20 11:43:53,073 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 11:43:55,074 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.81 vs. limit=15.0 2023-11-20 11:43:55,628 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 160900 2023-11-20 11:43:58,215 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1072646.6666666667, ans=0.125 2023-11-20 11:44:08,153 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 4600, loss[loss=0.08819, simple_loss=0.1235, pruned_loss=0.02001, audio_tagging_loss=0.00643, over 15545.00 frames. ], tot_loss[loss=0.08013, simple_loss=0.1012, pruned_loss=0.01968, audio_tagging_loss=0.009827, over 3053786.33 frames. ], batch size: 55, lr: 4.95e-03, grad_scale: 32.0 2023-11-20 11:44:24,875 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.602e+01 8.263e+01 8.597e+01 9.248e+01 1.165e+02, threshold=1.719e+02, percent-clipped=0.0 2023-11-20 11:44:28,785 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1072780.0, ans=0.125 2023-11-20 11:44:38,146 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1072846.6666666667, ans=0.125 2023-11-20 11:44:49,958 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.57 vs. 
limit=10.0 2023-11-20 11:44:56,061 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1072913.3333333333, ans=0.125 2023-11-20 11:45:00,561 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1072980.0, ans=0.1 2023-11-20 11:45:01,644 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 160950 2023-11-20 11:45:12,504 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 4650, loss[loss=0.08137, simple_loss=0.09513, pruned_loss=0.02081, audio_tagging_loss=0.013, over 15119.00 frames. ], tot_loss[loss=0.08022, simple_loss=0.1009, pruned_loss=0.01972, audio_tagging_loss=0.01004, over 3049295.38 frames. ], batch size: 57, lr: 4.95e-03, grad_scale: 32.0 2023-11-20 11:45:13,376 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.67 vs. limit=15.0 2023-11-20 11:45:24,398 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1073113.3333333333, ans=0.125 2023-11-20 11:45:34,730 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.44 vs. limit=15.0 2023-11-20 11:45:36,693 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1073180.0, ans=0.0 2023-11-20 11:46:05,353 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 161000 2023-11-20 11:46:17,329 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 4700, loss[loss=0.06477, simple_loss=0.08253, pruned_loss=0.01443, audio_tagging_loss=0.009072, over 15163.00 frames. ], tot_loss[loss=0.08052, simple_loss=0.1014, pruned_loss=0.01983, audio_tagging_loss=0.01, over 3049881.91 frames. ], batch size: 57, lr: 4.95e-03, grad_scale: 32.0 2023-11-20 11:46:34,691 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.595e+01 8.445e+01 9.172e+01 1.004e+02 1.426e+02, threshold=1.834e+02, percent-clipped=0.0 2023-11-20 11:46:50,289 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1073513.3333333333, ans=0.0 2023-11-20 11:46:53,796 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 11:47:01,618 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.80 vs. 
limit=22.5 2023-11-20 11:47:03,111 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1073580.0, ans=0.125 2023-11-20 11:47:09,537 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1073646.6666666667, ans=0.0 2023-11-20 11:47:10,722 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 161050 2023-11-20 11:47:10,904 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 11:47:20,366 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1073646.6666666667, ans=0.04949747468305833 2023-11-20 11:47:22,404 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 4750, loss[loss=0.1139, simple_loss=0.1504, pruned_loss=0.0299, audio_tagging_loss=0.008753, over 15617.00 frames. ], tot_loss[loss=0.08047, simple_loss=0.101, pruned_loss=0.01983, audio_tagging_loss=0.01014, over 3044767.96 frames. ], batch size: 55, lr: 4.95e-03, grad_scale: 16.0 2023-11-20 11:47:23,933 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1073713.3333333333, ans=0.125 2023-11-20 11:47:46,532 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.02 vs. limit=15.0 2023-11-20 11:48:00,109 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 11:48:14,658 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.48 vs. limit=15.0 2023-11-20 11:48:15,301 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 161100 2023-11-20 11:48:27,013 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 4800, loss[loss=0.08131, simple_loss=0.1014, pruned_loss=0.01934, audio_tagging_loss=0.01127, over 15063.00 frames. ], tot_loss[loss=0.08016, simple_loss=0.1005, pruned_loss=0.01968, audio_tagging_loss=0.01021, over 3050091.77 frames. ], batch size: 57, lr: 4.95e-03, grad_scale: 16.0 2023-11-20 11:48:36,245 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.76 vs. limit=15.0 2023-11-20 11:48:45,155 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1074113.3333333333, ans=0.0 2023-11-20 11:48:45,974 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.153e+01 8.077e+01 9.114e+01 9.869e+01 1.463e+02, threshold=1.823e+02, percent-clipped=0.0 2023-11-20 11:48:51,291 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1074180.0, ans=0.0 2023-11-20 11:49:19,958 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 161150 2023-11-20 11:49:31,617 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 4850, loss[loss=0.08137, simple_loss=0.1051, pruned_loss=0.02182, audio_tagging_loss=0.007001, over 15731.00 frames. ], tot_loss[loss=0.08029, simple_loss=0.1004, pruned_loss=0.01971, audio_tagging_loss=0.0104, over 3055563.09 frames. 
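The scaling.py:1022 Whitening entries compare a measured statistic of a module's activations against a limit; the auxiliary whitening penalty is applied only when the metric exceeds the limit, so most entries simply record "metric=X vs. limit=Y" with X below Y. One plausible form of the metric, stated as an assumption rather than the exact zipformer code, is the ratio of the mean squared eigenvalue of the per-group feature covariance to the squared mean eigenvalue: 1.0 for perfectly white features, larger when variance concentrates in a few directions:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int) -> torch.Tensor:
        # x: (num_frames, num_channels); channels split into groups as in
        # the logged num_groups/num_channels pairs.
        n, c = x.shape
        g = c // num_groups
        xg = x.reshape(n, num_groups, g).transpose(0, 1)  # (groups, n, g)
        cov = xg.transpose(1, 2) @ xg / n                 # (groups, g, g)
        mean_eig = cov.diagonal(dim1=1, dim2=2).sum(-1) / g
        # For a symmetric matrix, trace(cov @ cov) equals the sum of its
        # squared entries, i.e. the sum of squared eigenvalues.
        mean_eig_sq = (cov * cov).sum(dim=(1, 2)) / g
        return (mean_eig_sq / mean_eig.pow(2)).mean()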
], batch size: 58, lr: 4.95e-03, grad_scale: 16.0 2023-11-20 11:49:31,995 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1074380.0, ans=0.125 2023-11-20 11:49:37,829 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1074380.0, ans=0.035 2023-11-20 11:49:37,981 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1074380.0, ans=0.125 2023-11-20 11:49:52,487 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.39 vs. limit=15.0 2023-11-20 11:50:03,930 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.70 vs. limit=6.0 2023-11-20 11:50:13,928 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.99 vs. limit=15.0 2023-11-20 11:50:25,131 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 161200 2023-11-20 11:50:36,476 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 4900, loss[loss=0.1124, simple_loss=0.1418, pruned_loss=0.03361, audio_tagging_loss=0.007853, over 15398.00 frames. ], tot_loss[loss=0.07988, simple_loss=0.0998, pruned_loss=0.01963, audio_tagging_loss=0.01035, over 3045337.88 frames. ], batch size: 57, lr: 4.95e-03, grad_scale: 16.0 2023-11-20 11:50:45,599 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1074713.3333333333, ans=0.2 2023-11-20 11:50:54,915 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1074780.0, ans=0.0 2023-11-20 11:50:56,973 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.282e+01 8.231e+01 8.987e+01 9.531e+01 1.955e+02, threshold=1.797e+02, percent-clipped=1.0 2023-11-20 11:51:05,930 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1074846.6666666667, ans=0.2 2023-11-20 11:51:07,622 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.61 vs. limit=10.0 2023-11-20 11:51:14,172 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1074846.6666666667, ans=0.0 2023-11-20 11:51:29,768 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1074980.0, ans=0.0 2023-11-20 11:51:30,876 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 161250 2023-11-20 11:51:43,151 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 4950, loss[loss=0.08773, simple_loss=0.1093, pruned_loss=0.02398, audio_tagging_loss=0.009083, over 14911.00 frames. ], tot_loss[loss=0.07974, simple_loss=0.09973, pruned_loss=0.01973, audio_tagging_loss=0.01014, over 3039918.63 frames. 
], batch size: 56, lr: 4.95e-03, grad_scale: 16.0 2023-11-20 11:51:51,012 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1075046.6666666667, ans=0.125 2023-11-20 11:51:52,099 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1075046.6666666667, ans=0.0 2023-11-20 11:51:53,331 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1075046.6666666667, ans=0.1 2023-11-20 11:51:59,824 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1075113.3333333333, ans=0.0 2023-11-20 11:52:06,341 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1075113.3333333333, ans=0.125 2023-11-20 11:52:10,020 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1075180.0, ans=0.0 2023-11-20 11:52:29,608 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1075246.6666666667, ans=0.0 2023-11-20 11:52:29,822 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 11:52:36,381 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 161300 2023-11-20 11:52:47,456 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 5000, loss[loss=0.05888, simple_loss=0.06578, pruned_loss=0.01274, audio_tagging_loss=0.01325, over 14597.00 frames. ], tot_loss[loss=0.07938, simple_loss=0.09949, pruned_loss=0.01958, audio_tagging_loss=0.01006, over 3037128.31 frames. ], batch size: 57, lr: 4.94e-03, grad_scale: 16.0 2023-11-20 11:53:07,312 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.854e+01 7.908e+01 8.687e+01 9.464e+01 1.260e+02, threshold=1.737e+02, percent-clipped=0.0 2023-11-20 11:53:08,895 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1075446.6666666667, ans=0.125 2023-11-20 11:53:19,401 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1075513.3333333333, ans=0.125 2023-11-20 11:53:20,576 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1075513.3333333333, ans=0.125 2023-11-20 11:53:29,895 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1075580.0, ans=0.125 2023-11-20 11:53:41,383 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 161350 2023-11-20 11:53:41,707 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1075646.6666666667, ans=0.0 2023-11-20 11:53:45,338 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1075646.6666666667, ans=0.1 2023-11-20 11:53:52,363 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 5050, loss[loss=0.08899, simple_loss=0.1068, pruned_loss=0.02755, audio_tagging_loss=0.008026, over 14914.00 frames. ], tot_loss[loss=0.07892, simple_loss=0.09921, pruned_loss=0.01933, audio_tagging_loss=0.009989, over 3041815.29 frames. 
], batch size: 57, lr: 4.94e-03, grad_scale: 16.0 2023-11-20 11:54:01,785 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1075713.3333333333, ans=0.0 2023-11-20 11:54:11,052 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1075780.0, ans=0.125 2023-11-20 11:54:46,736 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 161400 2023-11-20 11:54:58,806 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 5100, loss[loss=0.0882, simple_loss=0.1082, pruned_loss=0.0217, audio_tagging_loss=0.0124, over 15161.00 frames. ], tot_loss[loss=0.07855, simple_loss=0.09885, pruned_loss=0.01916, audio_tagging_loss=0.009965, over 3043654.92 frames. ], batch size: 54, lr: 4.94e-03, grad_scale: 16.0 2023-11-20 11:55:02,770 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1076046.6666666667, ans=0.125 2023-11-20 11:55:17,830 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.894e+01 8.531e+01 9.239e+01 1.028e+02 1.398e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-20 11:55:23,604 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1076180.0, ans=0.07 2023-11-20 11:55:28,940 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1076180.0, ans=0.0 2023-11-20 11:55:51,510 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1076313.3333333333, ans=0.0 2023-11-20 11:55:52,464 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 161450 2023-11-20 11:55:52,659 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1076313.3333333333, ans=0.125 2023-11-20 11:55:54,968 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1076313.3333333333, ans=0.015 2023-11-20 11:55:59,524 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1076313.3333333333, ans=0.2 2023-11-20 11:56:01,866 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1076313.3333333333, ans=0.035 2023-11-20 11:56:04,131 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 5150, loss[loss=0.1076, simple_loss=0.1352, pruned_loss=0.03354, audio_tagging_loss=0.006516, over 15062.00 frames. ], tot_loss[loss=0.07872, simple_loss=0.09933, pruned_loss=0.01918, audio_tagging_loss=0.009877, over 3036794.49 frames. 
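The grad_scale field in the training lines is the dynamic fp16 loss scale: it halves when a step overflows (16.0 to 8.0 around batch 3450, 32.0 to 16.0 around batch 4050) and doubles back after a sustained overflow-free stretch (8.0 to 16.0 by batch 3600, 16.0 to 32.0 by batch 4000). PyTorch's stock GradScaler implements the same grow/shrink policy; a sketch of the usual AMP step, with hypothetical model and optimizer objects:

    import torch
    from torch.cuda.amp import GradScaler, autocast

    scaler = GradScaler(init_scale=16.0)  # grows and shrinks like grad_scale

    def train_step(model, optimizer, batch):
        optimizer.zero_grad()
        with autocast():
            loss = model(batch)        # hypothetical: forward returns a scalar
        scaler.scale(loss).backward()  # backprop on the scaled loss
        scaler.step(optimizer)         # unscales grads; skips step on inf/nan
        scaler.update()                # raise or lower the scale as needed
        return loss.detach()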
], batch size: 52, lr: 4.94e-03, grad_scale: 16.0 2023-11-20 11:56:11,126 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1076380.0, ans=0.1 2023-11-20 11:56:52,473 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1076580.0, ans=0.0 2023-11-20 11:56:57,398 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 161500 2023-11-20 11:57:01,829 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=1076646.6666666667, ans=0.5 2023-11-20 11:57:05,602 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=1076646.6666666667, ans=0.025 2023-11-20 11:57:09,010 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 5200, loss[loss=0.0876, simple_loss=0.1088, pruned_loss=0.0231, audio_tagging_loss=0.01011, over 16088.00 frames. ], tot_loss[loss=0.07879, simple_loss=0.09959, pruned_loss=0.01915, audio_tagging_loss=0.009841, over 3038186.76 frames. ], batch size: 61, lr: 4.94e-03, grad_scale: 32.0 2023-11-20 11:57:09,276 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1076713.3333333333, ans=0.125 2023-11-20 11:57:17,606 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.75 vs. limit=15.0 2023-11-20 11:57:20,871 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1076780.0, ans=0.125 2023-11-20 11:57:28,479 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.211e+01 8.173e+01 8.712e+01 9.541e+01 1.479e+02, threshold=1.742e+02, percent-clipped=0.0 2023-11-20 11:57:41,624 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1076846.6666666667, ans=0.1 2023-11-20 11:58:01,837 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 161550 2023-11-20 11:58:14,021 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 5250, loss[loss=0.07942, simple_loss=0.1086, pruned_loss=0.0182, audio_tagging_loss=0.006909, over 15610.00 frames. ], tot_loss[loss=0.07857, simple_loss=0.09927, pruned_loss=0.01911, audio_tagging_loss=0.009825, over 3041716.33 frames. ], batch size: 57, lr: 4.94e-03, grad_scale: 32.0 2023-11-20 11:58:33,283 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1077113.3333333333, ans=0.1 2023-11-20 11:59:06,746 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 161600 2023-11-20 11:59:07,029 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1077313.3333333333, ans=0.2 2023-11-20 11:59:07,336 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.34 vs. 
limit=15.0 2023-11-20 11:59:09,787 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1077313.3333333333, ans=0.125 2023-11-20 11:59:18,099 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 5300, loss[loss=0.07202, simple_loss=0.09387, pruned_loss=0.01716, audio_tagging_loss=0.007933, over 14548.00 frames. ], tot_loss[loss=0.07839, simple_loss=0.09925, pruned_loss=0.01905, audio_tagging_loss=0.009716, over 3041125.59 frames. ], batch size: 53, lr: 4.94e-03, grad_scale: 32.0 2023-11-20 11:59:26,465 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 11:59:38,923 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.714e+01 8.511e+01 9.160e+01 9.855e+01 1.370e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-20 11:59:49,907 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1077513.3333333333, ans=0.125 2023-11-20 11:59:55,478 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1077513.3333333333, ans=10.0 2023-11-20 12:00:01,504 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1077580.0, ans=0.0 2023-11-20 12:00:01,551 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1077580.0, ans=0.1 2023-11-20 12:00:11,316 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 161650 2023-11-20 12:00:11,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1077646.6666666667, ans=0.0 2023-11-20 12:00:22,612 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1077713.3333333333, ans=0.125 2023-11-20 12:00:23,459 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 5350, loss[loss=0.06746, simple_loss=0.08624, pruned_loss=0.01606, audio_tagging_loss=0.008274, over 15110.00 frames. ], tot_loss[loss=0.07895, simple_loss=0.09989, pruned_loss=0.01923, audio_tagging_loss=0.009772, over 3038100.38 frames. 
], batch size: 56, lr: 4.94e-03, grad_scale: 16.0 2023-11-20 12:00:33,667 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1077713.3333333333, ans=0.125 2023-11-20 12:00:42,287 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1077780.0, ans=0.1 2023-11-20 12:00:48,273 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1077846.6666666667, ans=0.2 2023-11-20 12:01:06,936 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1077913.3333333333, ans=0.125 2023-11-20 12:01:11,950 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=1077913.3333333333, ans=10.0 2023-11-20 12:01:16,606 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 161700 2023-11-20 12:01:16,702 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1077980.0, ans=0.1 2023-11-20 12:01:26,177 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1077980.0, ans=0.0 2023-11-20 12:01:28,387 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 5400, loss[loss=0.08225, simple_loss=0.1064, pruned_loss=0.02173, audio_tagging_loss=0.007342, over 14627.00 frames. ], tot_loss[loss=0.07881, simple_loss=0.09945, pruned_loss=0.01916, audio_tagging_loss=0.009925, over 3040980.14 frames. ], batch size: 55, lr: 4.94e-03, grad_scale: 16.0 2023-11-20 12:01:44,154 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1078113.3333333333, ans=0.125 2023-11-20 12:01:48,639 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.745e+01 8.017e+01 8.531e+01 9.206e+01 1.102e+02, threshold=1.706e+02, percent-clipped=0.0 2023-11-20 12:02:11,442 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1078246.6666666667, ans=0.0 2023-11-20 12:02:15,738 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=1078246.6666666667, ans=0.025 2023-11-20 12:02:21,404 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 161750 2023-11-20 12:02:25,194 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 12:02:30,747 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. limit=6.0 2023-11-20 12:02:32,334 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 5450, loss[loss=0.08564, simple_loss=0.1236, pruned_loss=0.0172, audio_tagging_loss=0.006651, over 14503.00 frames. ], tot_loss[loss=0.07877, simple_loss=0.09938, pruned_loss=0.01917, audio_tagging_loss=0.009913, over 3035466.05 frames. 
], batch size: 52, lr: 4.94e-03, grad_scale: 16.0 2023-11-20 12:02:36,523 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1078380.0, ans=0.125 2023-11-20 12:02:42,732 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1078380.0, ans=0.125 2023-11-20 12:02:53,233 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1078446.6666666667, ans=0.0 2023-11-20 12:02:58,315 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1078513.3333333333, ans=0.125 2023-11-20 12:03:00,614 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1078513.3333333333, ans=0.0 2023-11-20 12:03:18,260 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.00 vs. limit=10.0 2023-11-20 12:03:25,034 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 161800 2023-11-20 12:03:31,677 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1078646.6666666667, ans=0.125 2023-11-20 12:03:34,581 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1078646.6666666667, ans=0.125 2023-11-20 12:03:34,811 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1078646.6666666667, ans=0.1 2023-11-20 12:03:37,503 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 5500, loss[loss=0.05721, simple_loss=0.0737, pruned_loss=0.008537, audio_tagging_loss=0.01183, over 14905.00 frames. ], tot_loss[loss=0.07898, simple_loss=0.0996, pruned_loss=0.01923, audio_tagging_loss=0.00995, over 3043364.52 frames. ], batch size: 56, lr: 4.94e-03, grad_scale: 8.0 2023-11-20 12:03:41,601 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1078713.3333333333, ans=0.125 2023-11-20 12:03:48,874 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1078780.0, ans=0.125 2023-11-20 12:03:49,048 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1078780.0, ans=0.2 2023-11-20 12:03:54,312 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1078780.0, ans=0.035 2023-11-20 12:03:59,054 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.875e+01 8.226e+01 8.739e+01 9.564e+01 1.235e+02, threshold=1.748e+02, percent-clipped=0.0 2023-11-20 12:04:12,273 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 12:04:16,534 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.41 vs. 
limit=15.0 2023-11-20 12:04:26,187 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1078913.3333333333, ans=0.125 2023-11-20 12:04:30,657 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 161850 2023-11-20 12:04:38,317 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1078980.0, ans=0.125 2023-11-20 12:04:41,442 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1079046.6666666667, ans=0.07 2023-11-20 12:04:42,363 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 5550, loss[loss=0.07438, simple_loss=0.09451, pruned_loss=0.0166, audio_tagging_loss=0.01052, over 15073.00 frames. ], tot_loss[loss=0.07911, simple_loss=0.09917, pruned_loss=0.01935, audio_tagging_loss=0.01017, over 3040029.48 frames. ], batch size: 57, lr: 4.94e-03, grad_scale: 8.0 2023-11-20 12:04:49,860 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.15 vs. limit=15.0 2023-11-20 12:04:54,540 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1079113.3333333333, ans=0.125 2023-11-20 12:04:56,822 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1079113.3333333333, ans=0.125 2023-11-20 12:05:04,220 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1079113.3333333333, ans=0.125 2023-11-20 12:05:05,569 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1079113.3333333333, ans=0.0 2023-11-20 12:05:05,679 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1079113.3333333333, ans=0.0 2023-11-20 12:05:09,506 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1079180.0, ans=0.0 2023-11-20 12:05:22,948 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1079246.6666666667, ans=0.125 2023-11-20 12:05:35,653 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 161900 2023-11-20 12:05:42,419 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1079313.3333333333, ans=0.125 2023-11-20 12:05:47,120 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 5600, loss[loss=0.08148, simple_loss=0.09751, pruned_loss=0.02054, audio_tagging_loss=0.01219, over 15271.00 frames. ], tot_loss[loss=0.07876, simple_loss=0.09864, pruned_loss=0.01922, audio_tagging_loss=0.01021, over 3039791.55 frames. 
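The scaling.py:213 entries that dominate this log are not errors: module hyperparameters such as dropout_p, skip rates, balancer probabilities, and scale_min are ScheduledFloat values that are re-evaluated against the running batch count, and each line reports the current value as ans=... . A minimal sketch of such a piecewise-linear schedule is below; the class name matches the log, but the implementation and breakpoints are illustrative, not icefall's exact code:

    class ScheduledFloat:
        # A float hyperparameter defined by (batch_count, value) breakpoints,
        # linearly interpolated in between and clamped at the ends.
        def __init__(self, *points):
            self.points = sorted(points)

        def value(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            if batch_count >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if x0 <= batch_count <= x1:
                    return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)

    # By batch_count ~1.08e6 most schedules have long since passed their
    # final breakpoint, which is why the logged ans values repeat.
    dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))  # illustrative breakpoints
    print(dropout_p.value(1077780.0))  # 0.1, as in the dropout_p lines above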
], batch size: 57, lr: 4.94e-03, grad_scale: 16.0 2023-11-20 12:05:51,243 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=1079380.0, ans=10.0 2023-11-20 12:05:58,569 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1079446.6666666667, ans=0.125 2023-11-20 12:05:59,837 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1079446.6666666667, ans=0.09899494936611666 2023-11-20 12:06:09,439 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.864e+01 8.052e+01 8.920e+01 9.702e+01 1.303e+02, threshold=1.784e+02, percent-clipped=0.0 2023-11-20 12:06:15,881 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1079513.3333333333, ans=0.125 2023-11-20 12:06:33,135 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1079580.0, ans=0.2 2023-11-20 12:06:34,605 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.90 vs. limit=10.0 2023-11-20 12:06:35,272 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 12:06:40,266 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 161950 2023-11-20 12:06:51,215 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 5650, loss[loss=0.06762, simple_loss=0.08444, pruned_loss=0.01286, audio_tagging_loss=0.01254, over 15091.00 frames. ], tot_loss[loss=0.08002, simple_loss=0.1005, pruned_loss=0.01962, audio_tagging_loss=0.01014, over 3045198.96 frames. ], batch size: 58, lr: 4.93e-03, grad_scale: 16.0 2023-11-20 12:06:54,506 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1079713.3333333333, ans=0.2 2023-11-20 12:07:06,501 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.35 vs. 
limit=12.0 2023-11-20 12:07:27,452 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1079846.6666666667, ans=0.125 2023-11-20 12:07:38,507 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1079913.3333333333, ans=0.0 2023-11-20 12:07:44,053 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1079980.0, ans=0.125 2023-11-20 12:07:45,021 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 162000 2023-11-20 12:07:51,707 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1079980.0, ans=0.0 2023-11-20 12:07:56,864 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 5700, loss[loss=0.06519, simple_loss=0.08679, pruned_loss=0.01172, audio_tagging_loss=0.01007, over 14885.00 frames. ], tot_loss[loss=0.0798, simple_loss=0.1003, pruned_loss=0.01946, audio_tagging_loss=0.01021, over 3048518.10 frames. ], batch size: 55, lr: 4.93e-03, grad_scale: 16.0 2023-11-20 12:08:04,217 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.90 vs. limit=15.0 2023-11-20 12:08:13,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1080113.3333333333, ans=0.125 2023-11-20 12:08:18,956 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 5.645e+01 7.906e+01 8.669e+01 9.570e+01 1.200e+02, threshold=1.734e+02, percent-clipped=0.0 2023-11-20 12:08:26,836 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.03 vs. limit=15.0 2023-11-20 12:08:29,009 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1080180.0, ans=0.0 2023-11-20 12:08:30,252 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1080180.0, ans=0.0 2023-11-20 12:08:50,300 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 162050 2023-11-20 12:08:55,574 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 12:09:02,091 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 5750, loss[loss=0.08639, simple_loss=0.102, pruned_loss=0.02651, audio_tagging_loss=0.008898, over 15104.00 frames. ], tot_loss[loss=0.07867, simple_loss=0.09893, pruned_loss=0.01903, audio_tagging_loss=0.01017, over 3046045.30 frames. ], batch size: 57, lr: 4.93e-03, grad_scale: 16.0 2023-11-20 12:09:24,035 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1080446.6666666667, ans=0.2 2023-11-20 12:09:55,131 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 162100 2023-11-20 12:10:06,186 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 5800, loss[loss=0.07308, simple_loss=0.08816, pruned_loss=0.01692, audio_tagging_loss=0.01208, over 14852.00 frames. ], tot_loss[loss=0.07833, simple_loss=0.09843, pruned_loss=0.01904, audio_tagging_loss=0.01007, over 3048471.40 frames. 
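The WARNING above (Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav) shows the training-time filter at work: the 1-second AudioSet clip has 100 feature frames, only 23 frames survive the encoder's roughly 4x subsampling, and that is fewer than the 24 BPE tokens of its placeholder transcript, so no transducer alignment exists and the cut is dropped. A sketch of the filter follows; the subsampling formula is an assumption chosen to reproduce the logged 100 -> 23:

    def frames_after_subsampling(t: int) -> int:
        # Two stride-2 stages (~4x overall); reproduces the logged 100 -> 23.
        return ((t - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # A transducer alignment needs at least one frame per output token.
        return frames_after_subsampling(num_frames) >= num_tokens

    print(frames_after_subsampling(100))  # 23
    print(keep_cut(100, 24))              # False -> the cut is excluded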
], batch size: 59, lr: 4.93e-03, grad_scale: 16.0 2023-11-20 12:10:06,587 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1080713.3333333333, ans=0.0 2023-11-20 12:10:23,529 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.28 vs. limit=22.5 2023-11-20 12:10:28,888 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.720e+01 8.138e+01 8.619e+01 9.335e+01 1.829e+02, threshold=1.724e+02, percent-clipped=1.0 2023-11-20 12:10:34,171 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1080846.6666666667, ans=0.2 2023-11-20 12:10:45,003 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=1080913.3333333333, ans=6.0 2023-11-20 12:10:47,234 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1080913.3333333333, ans=0.125 2023-11-20 12:10:59,975 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 162150 2023-11-20 12:11:11,072 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 5850, loss[loss=0.07744, simple_loss=0.1009, pruned_loss=0.0178, audio_tagging_loss=0.009203, over 16525.00 frames. ], tot_loss[loss=0.07864, simple_loss=0.09902, pruned_loss=0.01918, audio_tagging_loss=0.009945, over 3041264.19 frames. ], batch size: 59, lr: 4.93e-03, grad_scale: 16.0 2023-11-20 12:11:27,301 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.16 vs. limit=15.0 2023-11-20 12:11:29,233 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=1081113.3333333333, ans=0.025 2023-11-20 12:11:31,725 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1081113.3333333333, ans=0.125 2023-11-20 12:11:37,336 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.08 vs. limit=15.0 2023-11-20 12:11:45,203 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1081180.0, ans=0.0 2023-11-20 12:11:46,641 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.77 vs. limit=12.0 2023-11-20 12:12:04,347 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 162200 2023-11-20 12:12:16,963 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 5900, loss[loss=0.1022, simple_loss=0.132, pruned_loss=0.02609, audio_tagging_loss=0.01012, over 14675.00 frames. ], tot_loss[loss=0.07862, simple_loss=0.09938, pruned_loss=0.01907, audio_tagging_loss=0.009861, over 3050198.84 frames. 
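The optim.py:476 lines report the distribution (min, 25%, median, 75%, max) of recent gradient norms together with a clipping threshold; throughout this log the threshold sits at 2.0x the median, i.e. Clipping_scale times the median norm. In the 12:10:28 entry above, the max norm 1.829e+02 exceeded the 1.724e+02 threshold, hence percent-clipped=1.0. A sketch of that bookkeeping under the threshold = Clipping_scale * median assumption (the real ScaledAdam logic is more involved):

    # Quartiles as logged at 12:10:28: min, Q1, median, Q3, max.
    q = [67.20, 81.38, 86.19, 93.35, 182.9]

    clipping_scale = 2.0               # Clipping_scale in the log
    threshold = clipping_scale * q[2]  # 2.0 * median of recent grad norms
    print(f"{threshold:.2f}")          # 172.38 ~ the logged 1.724e+02

    def clip_factor(grad_norm: float) -> float:
        # Factor by which a batch's gradient is scaled down, if at all.
        return min(1.0, threshold / grad_norm)

    print(clip_factor(182.9))          # ~0.94: that batch was clipped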
], batch size: 52, lr: 4.93e-03, grad_scale: 16.0 2023-11-20 12:12:38,401 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.547e+01 7.908e+01 8.564e+01 9.521e+01 1.124e+02, threshold=1.713e+02, percent-clipped=0.0 2023-11-20 12:12:48,963 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_na.min_abs, batch_count=1081513.3333333333, ans=0.02 2023-11-20 12:12:51,128 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1081513.3333333333, ans=0.0 2023-11-20 12:12:54,723 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.70 vs. limit=10.0 2023-11-20 12:13:01,423 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.58 vs. limit=12.0 2023-11-20 12:13:10,406 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 162250 2023-11-20 12:13:21,387 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 5950, loss[loss=0.08417, simple_loss=0.1072, pruned_loss=0.02205, audio_tagging_loss=0.008508, over 15982.00 frames. ], tot_loss[loss=0.07857, simple_loss=0.09936, pruned_loss=0.01899, audio_tagging_loss=0.009905, over 3054569.10 frames. ], batch size: 59, lr: 4.93e-03, grad_scale: 16.0 2023-11-20 12:13:23,051 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1081713.3333333333, ans=0.125 2023-11-20 12:13:32,106 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1081713.3333333333, ans=0.2 2023-11-20 12:13:42,188 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1081780.0, ans=0.125 2023-11-20 12:13:52,900 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1081846.6666666667, ans=0.1 2023-11-20 12:13:52,904 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1081846.6666666667, ans=0.0 2023-11-20 12:13:52,914 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1081846.6666666667, ans=0.1 2023-11-20 12:14:05,346 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.81 vs. limit=6.0 2023-11-20 12:14:14,701 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 162300 2023-11-20 12:14:14,987 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1081980.0, ans=0.125 2023-11-20 12:14:16,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1081980.0, ans=0.125 2023-11-20 12:14:26,372 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 6000, loss[loss=0.1044, simple_loss=0.1339, pruned_loss=0.02852, audio_tagging_loss=0.008981, over 15422.00 frames. ], tot_loss[loss=0.07864, simple_loss=0.09961, pruned_loss=0.01897, audio_tagging_loss=0.009866, over 3056307.53 frames. 
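The learning rate in these summaries decays slowly (4.94e-03 down to 4.91e-03 over a few hundred batches) because the Eden scheduler discounts the base rate by both batch index and epoch. Assuming this run's configured base_lr=0.045, lr_batches=7500, and lr_epochs=3.5, with 13 epochs completed, the standard Eden formula reproduces the logged value:

    base_lr, lr_batches, lr_epochs = 0.045, 7500.0, 3.5
    batch, epoch = 162000.0, 13.0  # ~current batch idx; completed epochs (assumed)

    batch_factor = ((batch**2 + lr_batches**2) / lr_batches**2) ** -0.25
    epoch_factor = ((epoch**2 + lr_epochs**2) / lr_epochs**2) ** -0.25
    print(f"{base_lr * batch_factor * epoch_factor:.2e}")  # 4.93e-03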
], batch size: 57, lr: 4.93e-03, grad_scale: 32.0 2023-11-20 12:14:26,373 INFO [train_asr.py:1285] (3/4) Computing validation loss 2023-11-20 12:15:09,605 INFO [train_asr.py:1294] (3/4) Epoch 14, validation: loss=0.06225, simple_loss=0.05354, pruned_loss=0.005677, audio_tagging_loss=0.0298, over 4681554.00 frames. 2023-11-20 12:15:09,606 INFO [train_asr.py:1295] (3/4) Maximum memory allocated so far is 25886MB 2023-11-20 12:15:09,790 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1082046.6666666667, ans=0.125 2023-11-20 12:15:09,993 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1082046.6666666667, ans=0.1 2023-11-20 12:15:17,333 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1082046.6666666667, ans=0.0 2023-11-20 12:15:29,297 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1082113.3333333333, ans=10.0 2023-11-20 12:15:31,321 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.652e+01 8.215e+01 8.669e+01 9.712e+01 1.545e+02, threshold=1.734e+02, percent-clipped=0.0 2023-11-20 12:15:33,150 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.47 vs. limit=22.5 2023-11-20 12:15:36,509 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1082180.0, ans=0.0 2023-11-20 12:15:56,548 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1082246.6666666667, ans=0.0 2023-11-20 12:15:58,718 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 12:16:02,691 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 162350 2023-11-20 12:16:03,229 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.37 vs. limit=12.0 2023-11-20 12:16:13,767 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 6050, loss[loss=0.058, simple_loss=0.07012, pruned_loss=0.01116, audio_tagging_loss=0.01177, over 15018.00 frames. ], tot_loss[loss=0.07801, simple_loss=0.09859, pruned_loss=0.01872, audio_tagging_loss=0.009995, over 3053332.96 frames. 
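At batch 6000 training pauses for the periodic validation pass: the loop computes an averaged loss over the held-out set (validation: loss=0.06225 above, with audio_tagging_loss the dominant component there) and reports peak GPU memory. A minimal sketch of that step, assuming an icefall-style structure; compute_loss is a hypothetical helper standing in for the recipe's loss function:

    import torch

    def run_validation(model, valid_loader, device) -> float:
        model.eval()
        tot_loss, tot_frames = 0.0, 0
        with torch.no_grad():
            for batch in valid_loader:
                loss, num_frames = compute_loss(model, batch, device)  # hypothetical
                tot_loss += loss.item()
                tot_frames += num_frames
        model.train()
        mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
        print(f"Maximum memory allocated so far is {mb}MB")  # 25886MB in this run
        return tot_loss / max(tot_frames, 1)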
], batch size: 56, lr: 4.93e-03, grad_scale: 32.0 2023-11-20 12:16:36,301 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1082446.6666666667, ans=0.2 2023-11-20 12:16:44,501 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1082513.3333333333, ans=0.125 2023-11-20 12:16:51,106 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1082513.3333333333, ans=0.07 2023-11-20 12:16:54,743 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1082580.0, ans=0.125 2023-11-20 12:17:07,548 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 162400 2023-11-20 12:17:09,026 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1082646.6666666667, ans=0.0 2023-11-20 12:17:18,061 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1082713.3333333333, ans=0.0 2023-11-20 12:17:19,041 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 6100, loss[loss=0.07053, simple_loss=0.09075, pruned_loss=0.01763, audio_tagging_loss=0.007529, over 15310.00 frames. ], tot_loss[loss=0.07859, simple_loss=0.09925, pruned_loss=0.01901, audio_tagging_loss=0.009962, over 3054484.81 frames. ], batch size: 58, lr: 4.93e-03, grad_scale: 32.0 2023-11-20 12:17:20,680 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1082713.3333333333, ans=0.125 2023-11-20 12:17:41,812 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.187e+01 8.098e+01 8.908e+01 9.804e+01 1.147e+02, threshold=1.782e+02, percent-clipped=0.0 2023-11-20 12:17:42,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1082780.0, ans=0.125 2023-11-20 12:17:47,569 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.25 vs. limit=22.5 2023-11-20 12:18:03,669 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1082913.3333333333, ans=0.125 2023-11-20 12:18:09,070 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.81 vs. limit=10.0 2023-11-20 12:18:12,880 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 162450 2023-11-20 12:18:24,018 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 6150, loss[loss=0.08751, simple_loss=0.112, pruned_loss=0.02405, audio_tagging_loss=0.007443, over 13802.00 frames. ], tot_loss[loss=0.07912, simple_loss=0.1001, pruned_loss=0.0192, audio_tagging_loss=0.009852, over 3050264.18 frames. 
], batch size: 53, lr: 4.93e-03, grad_scale: 32.0 2023-11-20 12:18:43,974 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1083113.3333333333, ans=0.0 2023-11-20 12:18:54,630 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1083180.0, ans=0.125 2023-11-20 12:19:04,361 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.28 vs. limit=6.0 2023-11-20 12:19:16,327 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1083313.3333333333, ans=0.125 2023-11-20 12:19:16,336 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1083313.3333333333, ans=0.2 2023-11-20 12:19:17,354 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 162500 2023-11-20 12:19:26,817 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1083313.3333333333, ans=0.0 2023-11-20 12:19:29,087 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 6200, loss[loss=0.04815, simple_loss=0.06173, pruned_loss=0.005937, audio_tagging_loss=0.01135, over 16864.00 frames. ], tot_loss[loss=0.07999, simple_loss=0.1011, pruned_loss=0.0195, audio_tagging_loss=0.009919, over 3055231.05 frames. ], batch size: 64, lr: 4.93e-03, grad_scale: 32.0 2023-11-20 12:19:35,940 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1083380.0, ans=0.035 2023-11-20 12:19:41,146 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1083446.6666666667, ans=0.1 2023-11-20 12:19:49,737 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1083446.6666666667, ans=0.2 2023-11-20 12:19:51,145 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.044e+01 8.194e+01 8.727e+01 9.528e+01 1.309e+02, threshold=1.745e+02, percent-clipped=0.0 2023-11-20 12:20:05,607 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 12:20:15,829 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1083580.0, ans=0.125 2023-11-20 12:20:18,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1083580.0, ans=0.125 2023-11-20 12:20:21,976 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 162550 2023-11-20 12:20:33,716 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 6250, loss[loss=0.09112, simple_loss=0.1153, pruned_loss=0.02321, audio_tagging_loss=0.01028, over 15264.00 frames. ], tot_loss[loss=0.07975, simple_loss=0.1006, pruned_loss=0.01938, audio_tagging_loss=0.01004, over 3052288.17 frames. 
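Every 50 batches model.py:792 prints a heartbeat (Freeze_encoder: False; Current batch idx: N). The flag exists for warm-start recipes that keep the encoder frozen for the first freeze_encoder_steps updates; with freezing disabled in this run it never changes. A rough sketch of the switch, with semantics inferred from the config keys (illustrative, not the recipe's exact code):

    def maybe_freeze_encoder(model, batch_idx: int,
                             freeze_encoder: bool = False,
                             freeze_encoder_steps: int = -1) -> None:
        # Freeze encoder weights for the first freeze_encoder_steps updates,
        # or indefinitely if steps < 0, when freezing is requested at all.
        frozen = freeze_encoder and (
            freeze_encoder_steps < 0 or batch_idx < freeze_encoder_steps
        )
        for p in model.encoder.parameters():
            p.requires_grad = not frozen
        if batch_idx % 50 == 0:
            print(f"Freeze_encoder: {frozen}; Current batch idx: {batch_idx}")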
], batch size: 57, lr: 4.93e-03, grad_scale: 32.0 2023-11-20 12:20:36,516 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1083713.3333333333, ans=0.1 2023-11-20 12:20:44,450 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1083713.3333333333, ans=0.09899494936611666 2023-11-20 12:20:56,133 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1083780.0, ans=0.1 2023-11-20 12:20:56,241 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1083780.0, ans=0.125 2023-11-20 12:21:01,137 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1083846.6666666667, ans=0.2 2023-11-20 12:21:08,267 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.76 vs. limit=15.0 2023-11-20 12:21:19,658 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1083913.3333333333, ans=0.125 2023-11-20 12:21:26,682 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 162600 2023-11-20 12:21:33,787 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.07 vs. limit=10.0 2023-11-20 12:21:37,972 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 6300, loss[loss=0.06946, simple_loss=0.08194, pruned_loss=0.01712, audio_tagging_loss=0.01138, over 15355.00 frames. ], tot_loss[loss=0.07991, simple_loss=0.1009, pruned_loss=0.01943, audio_tagging_loss=0.01004, over 3058025.33 frames. ], batch size: 59, lr: 4.92e-03, grad_scale: 32.0 2023-11-20 12:22:00,273 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.702e+01 8.158e+01 9.011e+01 9.819e+01 1.577e+02, threshold=1.802e+02, percent-clipped=0.0 2023-11-20 12:22:32,228 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 162650 2023-11-20 12:22:43,817 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 6350, loss[loss=0.06038, simple_loss=0.06948, pruned_loss=0.0131, audio_tagging_loss=0.01254, over 15839.00 frames. ], tot_loss[loss=0.07999, simple_loss=0.1009, pruned_loss=0.01944, audio_tagging_loss=0.0101, over 3062150.47 frames. ], batch size: 60, lr: 4.92e-03, grad_scale: 32.0 2023-11-20 12:23:12,550 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.84 vs. limit=22.5 2023-11-20 12:23:20,497 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.24 vs. 
limit=15.0 2023-11-20 12:23:33,619 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1084646.6666666667, ans=0.1 2023-11-20 12:23:36,504 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 162700 2023-11-20 12:23:45,877 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1084646.6666666667, ans=0.2 2023-11-20 12:23:48,120 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 6400, loss[loss=0.07118, simple_loss=0.08944, pruned_loss=0.01456, audio_tagging_loss=0.0119, over 15916.00 frames. ], tot_loss[loss=0.07993, simple_loss=0.1005, pruned_loss=0.01952, audio_tagging_loss=0.01017, over 3057846.11 frames. ], batch size: 59, lr: 4.92e-03, grad_scale: 32.0 2023-11-20 12:23:55,796 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1084713.3333333333, ans=0.1 2023-11-20 12:24:02,629 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1084780.0, ans=0.1 2023-11-20 12:24:10,372 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.846e+01 8.159e+01 8.686e+01 9.396e+01 1.221e+02, threshold=1.737e+02, percent-clipped=0.0 2023-11-20 12:24:14,229 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1084846.6666666667, ans=0.0 2023-11-20 12:24:24,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1084846.6666666667, ans=0.07 2023-11-20 12:24:40,967 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 162750 2023-11-20 12:24:52,691 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 6450, loss[loss=0.0937, simple_loss=0.1206, pruned_loss=0.02264, audio_tagging_loss=0.01077, over 15729.00 frames. ], tot_loss[loss=0.07996, simple_loss=0.1004, pruned_loss=0.01954, audio_tagging_loss=0.0102, over 3048580.47 frames. ], batch size: 56, lr: 4.92e-03, grad_scale: 32.0 2023-11-20 12:25:14,633 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1085113.3333333333, ans=0.125 2023-11-20 12:25:23,264 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1085180.0, ans=0.0 2023-11-20 12:25:30,740 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1085246.6666666667, ans=0.125 2023-11-20 12:25:37,362 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 12:25:45,800 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 162800 2023-11-20 12:25:57,220 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 6500, loss[loss=0.08218, simple_loss=0.1052, pruned_loss=0.02062, audio_tagging_loss=0.008983, over 14415.00 frames. ], tot_loss[loss=0.08026, simple_loss=0.1011, pruned_loss=0.01965, audio_tagging_loss=0.01008, over 3045142.71 frames. 
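The scaling.py:1022 Whitening lines compare a measured whiteness statistic of a module's activations against a limit (metric=7.24 vs. limit=15.0 just above). The metric is near 1.0 when the channel covariance has equal eigenvalues and grows as channels become correlated or unevenly scaled; the module only applies its corrective penalty once the limit is exceeded. A rough sketch of such a metric, assuming the trace-based definition below (the exact formula in scaling.py may differ):

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        # x: (num_frames, num_channels). Returns ~1.0 for "white" features,
        # larger when covariance eigenvalues are unequal.
        x = x - x.mean(dim=0)
        c = (x.t() @ x) / x.shape[0]  # channel covariance, (C, C)
        d = c.shape[0]
        return (d * (c @ c).diagonal().sum() / c.diagonal().sum() ** 2).item()

    white = torch.randn(10000, 64)
    print(whitening_metric(white))                               # ~1.0
    print(whitening_metric(white * torch.linspace(0.1, 3, 64)))  # noticeably larger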
], batch size: 57, lr: 4.92e-03, grad_scale: 32.0 2023-11-20 12:26:18,886 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.899e+01 8.195e+01 8.750e+01 9.288e+01 1.237e+02, threshold=1.750e+02, percent-clipped=0.0 2023-11-20 12:26:26,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1085513.3333333333, ans=0.1 2023-11-20 12:26:46,126 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1085580.0, ans=0.125 2023-11-20 12:26:49,743 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 162850 2023-11-20 12:26:58,045 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.19 vs. limit=15.0 2023-11-20 12:26:59,484 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1085646.6666666667, ans=0.125 2023-11-20 12:27:01,699 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 6550, loss[loss=0.08809, simple_loss=0.1057, pruned_loss=0.02688, audio_tagging_loss=0.008347, over 14574.00 frames. ], tot_loss[loss=0.08016, simple_loss=0.1011, pruned_loss=0.01962, audio_tagging_loss=0.009982, over 3046376.16 frames. ], batch size: 54, lr: 4.92e-03, grad_scale: 32.0 2023-11-20 12:27:06,362 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=15.01 vs. limit=15.0 2023-11-20 12:27:15,764 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.67 vs. limit=8.0 2023-11-20 12:27:23,757 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff3.min_abs, batch_count=1085780.0, ans=0.2 2023-11-20 12:27:27,068 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.69 vs. limit=15.0 2023-11-20 12:27:27,281 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.33 vs. limit=15.0 2023-11-20 12:27:54,761 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 162900 2023-11-20 12:28:06,271 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 6600, loss[loss=0.06025, simple_loss=0.07064, pruned_loss=0.01142, audio_tagging_loss=0.01351, over 14992.00 frames. ], tot_loss[loss=0.07932, simple_loss=0.1002, pruned_loss=0.01933, audio_tagging_loss=0.00987, over 3042181.99 frames. 
], batch size: 57, lr: 4.92e-03, grad_scale: 32.0 2023-11-20 12:28:16,935 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1086046.6666666667, ans=0.0 2023-11-20 12:28:18,109 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1086113.3333333333, ans=0.125 2023-11-20 12:28:27,638 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.865e+01 8.102e+01 8.649e+01 9.372e+01 1.515e+02, threshold=1.730e+02, percent-clipped=0.0 2023-11-20 12:28:59,034 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 162950 2023-11-20 12:29:08,392 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1086313.3333333333, ans=0.0 2023-11-20 12:29:10,577 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 6650, loss[loss=0.0955, simple_loss=0.1235, pruned_loss=0.02608, audio_tagging_loss=0.007661, over 14738.00 frames. ], tot_loss[loss=0.07983, simple_loss=0.101, pruned_loss=0.01962, audio_tagging_loss=0.009705, over 3040611.96 frames. ], batch size: 54, lr: 4.92e-03, grad_scale: 32.0 2023-11-20 12:29:13,471 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1086380.0, ans=0.2 2023-11-20 12:29:15,115 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.12 vs. limit=10.0 2023-11-20 12:29:26,465 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1086446.6666666667, ans=0.125 2023-11-20 12:29:39,716 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.54 vs. limit=12.0 2023-11-20 12:29:44,876 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1086513.3333333333, ans=0.125 2023-11-20 12:29:47,473 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1086513.3333333333, ans=0.125 2023-11-20 12:29:53,127 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1086580.0, ans=0.125 2023-11-20 12:30:03,772 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 163000 2023-11-20 12:30:13,576 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1086646.6666666667, ans=0.1 2023-11-20 12:30:14,798 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1086713.3333333333, ans=0.125 2023-11-20 12:30:15,798 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 6700, loss[loss=0.09054, simple_loss=0.118, pruned_loss=0.0224, audio_tagging_loss=0.009141, over 14635.00 frames. ], tot_loss[loss=0.07922, simple_loss=0.1004, pruned_loss=0.01933, audio_tagging_loss=0.0097, over 3038523.04 frames. 
], batch size: 54, lr: 4.92e-03, grad_scale: 32.0 2023-11-20 12:30:18,567 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1086713.3333333333, ans=0.125 2023-11-20 12:30:26,374 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1086713.3333333333, ans=0.0 2023-11-20 12:30:27,598 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1086780.0, ans=0.1 2023-11-20 12:30:35,785 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.27 vs. limit=10.0 2023-11-20 12:30:37,677 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.914e+01 8.088e+01 8.794e+01 9.537e+01 1.389e+02, threshold=1.759e+02, percent-clipped=0.0 2023-11-20 12:30:50,903 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1086846.6666666667, ans=0.125 2023-11-20 12:31:02,633 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=1086913.3333333333, ans=0.5 2023-11-20 12:31:08,496 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 163050 2023-11-20 12:31:16,861 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.68 vs. limit=15.0 2023-11-20 12:31:20,542 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 6750, loss[loss=0.06692, simple_loss=0.09469, pruned_loss=0.0138, audio_tagging_loss=0.005771, over 14742.00 frames. ], tot_loss[loss=0.07932, simple_loss=0.1005, pruned_loss=0.01933, audio_tagging_loss=0.009722, over 3033567.48 frames. ], batch size: 54, lr: 4.92e-03, grad_scale: 32.0 2023-11-20 12:31:20,726 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1087046.6666666667, ans=0.125 2023-11-20 12:31:23,784 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.60 vs. limit=15.0 2023-11-20 12:31:36,343 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.02 vs. limit=10.0 2023-11-20 12:31:37,369 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1087113.3333333333, ans=0.125 2023-11-20 12:32:13,362 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 163100 2023-11-20 12:32:17,642 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.06 vs. limit=15.0 2023-11-20 12:32:24,816 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 6800, loss[loss=0.08044, simple_loss=0.09749, pruned_loss=0.02102, audio_tagging_loss=0.01068, over 14237.00 frames. ], tot_loss[loss=0.07921, simple_loss=0.1001, pruned_loss=0.01934, audio_tagging_loss=0.009805, over 3029580.79 frames. 
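The grad_scale field in the batch summaries hops between 8.0, 16.0, and 32.0 across this stretch; with use_fp16 enabled, the loss is multiplied by a dynamic scale that is halved whenever a step produces inf/nan gradients and doubled again after a stable interval, which is the pattern those jumps trace. A minimal sketch using PyTorch's stock AMP scaler (the recipe wires this into its own optimizer, so treat the snippet as illustrative):

    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=16.0,     # matches the scales seen in this log
        growth_factor=2.0,   # doubles, e.g. 16.0 -> 32.0, when stable
        backoff_factor=0.5,  # halves, e.g. 16.0 -> 8.0, on overflow
    )

    def training_step(model, optimizer, batch, device):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = model(batch.to(device))  # assumes model returns the loss
        scaler.scale(loss).backward()
        scaler.step(optimizer)              # skipped if gradients overflowed
        scaler.update()                     # adjusts grad_scale as logged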
], batch size: 56, lr: 4.92e-03, grad_scale: 32.0 2023-11-20 12:32:26,427 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1087380.0, ans=0.2 2023-11-20 12:32:45,883 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.532e+01 7.924e+01 8.642e+01 9.401e+01 1.270e+02, threshold=1.728e+02, percent-clipped=0.0 2023-11-20 12:32:51,446 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.86 vs. limit=6.0 2023-11-20 12:33:01,243 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.31 vs. limit=12.0 2023-11-20 12:33:02,088 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1087580.0, ans=0.1 2023-11-20 12:33:17,158 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 163150 2023-11-20 12:33:21,355 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.48 vs. limit=22.5 2023-11-20 12:33:28,581 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 6850, loss[loss=0.05934, simple_loss=0.07464, pruned_loss=0.01169, audio_tagging_loss=0.01033, over 15095.00 frames. ], tot_loss[loss=0.07816, simple_loss=0.09859, pruned_loss=0.01897, audio_tagging_loss=0.009893, over 3027640.73 frames. ], batch size: 56, lr: 4.92e-03, grad_scale: 32.0 2023-11-20 12:33:29,197 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.17 vs. limit=15.0 2023-11-20 12:33:32,824 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.31 vs. limit=12.0 2023-11-20 12:33:43,643 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1087780.0, ans=0.1 2023-11-20 12:33:48,278 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.43 vs. limit=22.5 2023-11-20 12:33:53,346 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.30 vs. limit=15.0 2023-11-20 12:34:07,594 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.18 vs. limit=22.5 2023-11-20 12:34:10,982 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1087913.3333333333, ans=0.2 2023-11-20 12:34:17,926 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1087913.3333333333, ans=0.2 2023-11-20 12:34:18,270 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.10 vs. 
limit=15.0 2023-11-20 12:34:21,375 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 163200 2023-11-20 12:34:22,030 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=1087980.0, ans=15.0 2023-11-20 12:34:32,697 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 6900, loss[loss=0.1032, simple_loss=0.1245, pruned_loss=0.03025, audio_tagging_loss=0.01069, over 15839.00 frames. ], tot_loss[loss=0.07862, simple_loss=0.09918, pruned_loss=0.01911, audio_tagging_loss=0.009916, over 3036385.33 frames. ], batch size: 58, lr: 4.92e-03, grad_scale: 32.0 2023-11-20 12:34:51,814 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1088113.3333333333, ans=0.1 2023-11-20 12:34:55,159 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.594e+01 8.124e+01 8.683e+01 9.436e+01 1.192e+02, threshold=1.737e+02, percent-clipped=0.0 2023-11-20 12:35:08,469 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1088180.0, ans=0.125 2023-11-20 12:35:24,859 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 12:35:26,120 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 163250 2023-11-20 12:35:38,405 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 6950, loss[loss=0.07696, simple_loss=0.1004, pruned_loss=0.01724, audio_tagging_loss=0.009509, over 16966.00 frames. ], tot_loss[loss=0.07862, simple_loss=0.09916, pruned_loss=0.0191, audio_tagging_loss=0.009946, over 3041600.58 frames. ], batch size: 63, lr: 4.92e-03, grad_scale: 32.0 2023-11-20 12:35:39,173 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.10 vs. limit=12.0 2023-11-20 12:36:00,199 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1088446.6666666667, ans=0.2 2023-11-20 12:36:23,783 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.93 vs. limit=15.0 2023-11-20 12:36:31,636 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 163300 2023-11-20 12:36:31,752 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1088646.6666666667, ans=0.125 2023-11-20 12:36:42,515 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 7000, loss[loss=0.09969, simple_loss=0.1276, pruned_loss=0.02733, audio_tagging_loss=0.008552, over 15559.00 frames. ], tot_loss[loss=0.07928, simple_loss=0.09994, pruned_loss=0.01941, audio_tagging_loss=0.0099, over 3042889.41 frames. ], batch size: 57, lr: 4.91e-03, grad_scale: 32.0 2023-11-20 12:36:45,846 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.40 vs. 
limit=22.5 2023-11-20 12:36:57,430 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.86 vs. limit=12.0 2023-11-20 12:36:58,171 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1088780.0, ans=0.2 2023-11-20 12:37:04,380 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.412e+01 8.025e+01 8.662e+01 9.457e+01 1.125e+02, threshold=1.732e+02, percent-clipped=0.0 2023-11-20 12:37:04,796 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1088780.0, ans=0.125 2023-11-20 12:37:07,226 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1088846.6666666667, ans=0.125 2023-11-20 12:37:08,472 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1088846.6666666667, ans=0.0 2023-11-20 12:37:24,223 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=1088913.3333333333, ans=10.0 2023-11-20 12:37:27,942 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1088913.3333333333, ans=0.1 2023-11-20 12:37:35,892 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 163350 2023-11-20 12:37:42,334 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.77 vs. limit=15.0 2023-11-20 12:37:46,701 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 7050, loss[loss=0.06891, simple_loss=0.08107, pruned_loss=0.01525, audio_tagging_loss=0.01312, over 14897.00 frames. ], tot_loss[loss=0.0792, simple_loss=0.09986, pruned_loss=0.01927, audio_tagging_loss=0.009996, over 3048154.32 frames. ], batch size: 58, lr: 4.91e-03, grad_scale: 32.0 2023-11-20 12:37:55,316 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.73 vs. limit=12.0 2023-11-20 12:38:30,749 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1089246.6666666667, ans=0.125 2023-11-20 12:38:34,464 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1089246.6666666667, ans=0.125 2023-11-20 12:38:39,743 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 163400 2023-11-20 12:38:42,794 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1089313.3333333333, ans=0.1 2023-11-20 12:38:45,460 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1089313.3333333333, ans=0.125 2023-11-20 12:38:49,022 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1089313.3333333333, ans=0.0 2023-11-20 12:38:51,912 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 7100, loss[loss=0.07196, simple_loss=0.08453, pruned_loss=0.01913, audio_tagging_loss=0.01057, over 14286.00 frames. ], tot_loss[loss=0.07872, simple_loss=0.09905, pruned_loss=0.01912, audio_tagging_loss=0.01008, over 3042689.00 frames. 
], batch size: 56, lr: 4.91e-03, grad_scale: 32.0 2023-11-20 12:38:58,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1089380.0, ans=0.125 2023-11-20 12:39:10,288 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.00 vs. limit=6.0 2023-11-20 12:39:12,658 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1089446.6666666667, ans=0.125 2023-11-20 12:39:14,031 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.595e+01 8.043e+01 8.663e+01 9.375e+01 1.240e+02, threshold=1.733e+02, percent-clipped=0.0 2023-11-20 12:39:45,231 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 163450 2023-11-20 12:39:56,003 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 7150, loss[loss=0.06838, simple_loss=0.0866, pruned_loss=0.01549, audio_tagging_loss=0.009584, over 15860.00 frames. ], tot_loss[loss=0.07885, simple_loss=0.09912, pruned_loss=0.01917, audio_tagging_loss=0.01012, over 3046424.88 frames. ], batch size: 60, lr: 4.91e-03, grad_scale: 32.0 2023-11-20 12:39:56,320 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1089713.3333333333, ans=0.125 2023-11-20 12:40:03,163 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1089713.3333333333, ans=0.1 2023-11-20 12:40:03,722 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.whiten.whitening_limit, batch_count=1089713.3333333333, ans=12.0 2023-11-20 12:40:10,684 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1089780.0, ans=0.125 2023-11-20 12:40:14,259 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1089780.0, ans=0.0 2023-11-20 12:40:16,705 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1089780.0, ans=0.0 2023-11-20 12:40:25,815 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1089846.6666666667, ans=0.035 2023-11-20 12:40:38,304 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.85 vs. limit=10.0 2023-11-20 12:40:48,635 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 163500 2023-11-20 12:41:00,077 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 7200, loss[loss=0.06434, simple_loss=0.07427, pruned_loss=0.0165, audio_tagging_loss=0.0107, over 14621.00 frames. ], tot_loss[loss=0.07962, simple_loss=0.1002, pruned_loss=0.01937, audio_tagging_loss=0.01014, over 3048614.08 frames. ], batch size: 56, lr: 4.91e-03, grad_scale: 32.0 2023-11-20 12:41:09,741 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.84 vs. 
limit=22.5 2023-11-20 12:41:11,941 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1090113.3333333333, ans=0.125 2023-11-20 12:41:23,957 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.741e+01 8.075e+01 8.632e+01 9.241e+01 3.399e+02, threshold=1.726e+02, percent-clipped=1.0 2023-11-20 12:41:53,238 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 163550 2023-11-20 12:41:53,355 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1090313.3333333333, ans=0.0 2023-11-20 12:42:00,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1090313.3333333333, ans=0.125 2023-11-20 12:42:04,797 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 7250, loss[loss=0.06883, simple_loss=0.07994, pruned_loss=0.01629, audio_tagging_loss=0.01257, over 14668.00 frames. ], tot_loss[loss=0.07995, simple_loss=0.1005, pruned_loss=0.01951, audio_tagging_loss=0.01018, over 3051371.50 frames. ], batch size: 57, lr: 4.91e-03, grad_scale: 32.0 2023-11-20 12:42:42,663 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.27 vs. limit=22.5 2023-11-20 12:42:47,821 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1090580.0, ans=0.125 2023-11-20 12:42:57,796 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 163600 2023-11-20 12:43:00,706 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1090646.6666666667, ans=0.125 2023-11-20 12:43:00,833 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1090646.6666666667, ans=0.125 2023-11-20 12:43:09,573 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 7300, loss[loss=0.06037, simple_loss=0.07145, pruned_loss=0.01191, audio_tagging_loss=0.01274, over 14430.00 frames. ], tot_loss[loss=0.0791, simple_loss=0.0996, pruned_loss=0.01922, audio_tagging_loss=0.01008, over 3045932.05 frames. ], batch size: 56, lr: 4.91e-03, grad_scale: 32.0 2023-11-20 12:43:28,495 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1090780.0, ans=0.125 2023-11-20 12:43:31,928 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.842e+01 8.253e+01 8.937e+01 9.591e+01 1.343e+02, threshold=1.787e+02, percent-clipped=0.0 2023-11-20 12:43:36,587 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1090846.6666666667, ans=0.0 2023-11-20 12:43:39,185 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.59 vs. limit=15.0 2023-11-20 12:44:01,930 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 163650 2023-11-20 12:44:02,571 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.71 vs. 
limit=22.5 2023-11-20 12:44:06,343 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1090980.0, ans=0.0 2023-11-20 12:44:12,806 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.72 vs. limit=15.0 2023-11-20 12:44:13,398 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 7350, loss[loss=0.0802, simple_loss=0.1077, pruned_loss=0.01934, audio_tagging_loss=0.007013, over 15534.00 frames. ], tot_loss[loss=0.0788, simple_loss=0.09957, pruned_loss=0.01913, audio_tagging_loss=0.009878, over 3044630.00 frames. ], batch size: 58, lr: 4.91e-03, grad_scale: 32.0 2023-11-20 12:44:14,902 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1091046.6666666667, ans=0.1 2023-11-20 12:44:15,946 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1091046.6666666667, ans=0.125 2023-11-20 12:44:23,968 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1091046.6666666667, ans=0.125 2023-11-20 12:44:31,423 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.27 vs. limit=15.0 2023-11-20 12:44:39,129 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1091180.0, ans=0.0 2023-11-20 12:44:40,277 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1091180.0, ans=0.125 2023-11-20 12:44:42,659 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 12:44:50,856 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.87 vs. limit=10.0 2023-11-20 12:45:05,592 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 163700 2023-11-20 12:45:17,054 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 7400, loss[loss=0.08464, simple_loss=0.1029, pruned_loss=0.02187, audio_tagging_loss=0.01133, over 14891.00 frames. ], tot_loss[loss=0.07876, simple_loss=0.0995, pruned_loss=0.01919, audio_tagging_loss=0.009824, over 3042444.39 frames. 
2023-11-20 12:45:17,054 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 7400, loss[loss=0.08464, simple_loss=0.1029, pruned_loss=0.02187, audio_tagging_loss=0.01133, over 14891.00 frames. ], tot_loss[loss=0.07876, simple_loss=0.0995, pruned_loss=0.01919, audio_tagging_loss=0.009824, over 3042444.39 frames. ], batch size: 57, lr: 4.91e-03, grad_scale: 16.0
2023-11-20 12:45:25,405 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1091380.0, ans=0.05
2023-11-20 12:45:41,939 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.771e+01 8.379e+01 9.065e+01 9.616e+01 1.278e+02, threshold=1.813e+02, percent-clipped=0.0
2023-11-20 12:45:44,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1091513.3333333333, ans=0.07
2023-11-20 12:46:02,179 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1091580.0, ans=0.0
2023-11-20 12:46:08,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1091646.6666666667, ans=0.125
2023-11-20 12:46:10,333 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 163750
2023-11-20 12:46:21,957 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 7450, loss[loss=0.1134, simple_loss=0.1501, pruned_loss=0.03099, audio_tagging_loss=0.007361, over 15305.00 frames. ], tot_loss[loss=0.07898, simple_loss=0.09977, pruned_loss=0.01934, audio_tagging_loss=0.00975, over 3047499.65 frames. ], batch size: 55, lr: 4.91e-03, grad_scale: 16.0
2023-11-20 12:46:35,839 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1091780.0, ans=0.0
2023-11-20 12:46:47,959 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1091846.6666666667, ans=0.1
2023-11-20 12:47:09,151 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1091913.3333333333, ans=0.0
2023-11-20 12:47:14,951 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 163800
2023-11-20 12:47:27,445 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 7500, loss[loss=0.07347, simple_loss=0.09124, pruned_loss=0.01621, audio_tagging_loss=0.01164, over 15056.00 frames. ], tot_loss[loss=0.07892, simple_loss=0.09956, pruned_loss=0.01936, audio_tagging_loss=0.009784, over 3045623.87 frames. ], batch size: 56, lr: 4.91e-03, grad_scale: 16.0
2023-11-20 12:47:28,883 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1092046.6666666667, ans=0.025
2023-11-20 12:47:40,354 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1092113.3333333333, ans=0.0
2023-11-20 12:47:43,825 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1092113.3333333333, ans=0.0
2023-11-20 12:47:51,009 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.17 vs. limit=15.0
2023-11-20 12:47:51,399 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.921e+01 8.205e+01 8.885e+01 9.811e+01 1.439e+02, threshold=1.777e+02, percent-clipped=0.0
2023-11-20 12:48:08,028 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1092246.6666666667, ans=0.0
2023-11-20 12:48:19,233 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 163850
2023-11-20 12:48:30,692 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 7550, loss[loss=0.08589, simple_loss=0.1205, pruned_loss=0.0199, audio_tagging_loss=0.00575, over 14859.00 frames. ], tot_loss[loss=0.07914, simple_loss=0.09989, pruned_loss=0.01939, audio_tagging_loss=0.00981, over 3042804.60 frames. ], batch size: 56, lr: 4.91e-03, grad_scale: 16.0
2023-11-20 12:48:48,528 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1092446.6666666667, ans=0.125
2023-11-20 12:48:53,378 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1092446.6666666667, ans=0.125
2023-11-20 12:49:11,784 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1092580.0, ans=0.125
2023-11-20 12:49:21,067 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1092646.6666666667, ans=0.2
2023-11-20 12:49:23,460 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 163900
2023-11-20 12:49:26,679 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.57 vs. limit=22.5
2023-11-20 12:49:29,084 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1092646.6666666667, ans=0.125
2023-11-20 12:49:34,871 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 7600, loss[loss=0.09455, simple_loss=0.1211, pruned_loss=0.02349, audio_tagging_loss=0.01053, over 16171.00 frames. ], tot_loss[loss=0.07895, simple_loss=0.09948, pruned_loss=0.01939, audio_tagging_loss=0.009827, over 3046681.72 frames. ], batch size: 58, lr: 4.91e-03, grad_scale: 32.0
2023-11-20 12:49:39,026 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1092713.3333333333, ans=0.0
2023-11-20 12:49:41,756 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.44 vs. limit=15.0
2023-11-20 12:49:59,226 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.781e+01 8.009e+01 8.543e+01 9.202e+01 1.104e+02, threshold=1.709e+02, percent-clipped=0.0
2023-11-20 12:50:03,139 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1092846.6666666667, ans=0.125
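In each optim.py:476 entry the five numbers are the min, 25%, 50%, 75% and max of recent gradient norms, and the printed threshold is consistently Clipping_scale times the median (for the first entry above, 2.0 x 8.632e+01 = 1.726e+02). A hedged sketch of that bookkeeping; clipping_report is an illustrative name, not the actual ScaledAdam code, and percent-clipped in the log is presumably measured over a recent window of batches:

```python
import torch

def clipping_report(grad_norms: torch.Tensor, clipping_scale: float = 2.0):
    # min / Q1 / median / Q3 / max of the recent grad norms
    q = torch.quantile(grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]      # Clipping_scale x median
    pct = 100.0 * (grad_norms > threshold).float().mean()
    return q, threshold, pct

norms = torch.tensor([67.41, 80.75, 86.32, 92.41, 339.9])
q, thr, pct = clipping_report(norms)
print(thr)  # 172.64 -> logged as threshold=1.726e+02
```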
2023-11-20 12:50:12,869 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.54 vs. limit=6.0
2023-11-20 12:50:27,538 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 163950
2023-11-20 12:50:31,400 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1092980.0, ans=0.125
2023-11-20 12:50:39,683 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 7650, loss[loss=0.08779, simple_loss=0.108, pruned_loss=0.02404, audio_tagging_loss=0.009778, over 14996.00 frames. ], tot_loss[loss=0.07878, simple_loss=0.09922, pruned_loss=0.01936, audio_tagging_loss=0.009812, over 3037154.24 frames. ], batch size: 56, lr: 4.90e-03, grad_scale: 32.0
2023-11-20 12:50:41,295 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1093046.6666666667, ans=0.125
2023-11-20 12:51:01,192 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1093113.3333333333, ans=0.1
2023-11-20 12:51:17,467 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1093246.6666666667, ans=0.125
2023-11-20 12:51:31,783 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 164000
2023-11-20 12:51:42,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1093313.3333333333, ans=0.125
2023-11-20 12:51:47,525 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 7700, loss[loss=0.07519, simple_loss=0.09657, pruned_loss=0.0171, audio_tagging_loss=0.009798, over 13820.00 frames. ], tot_loss[loss=0.07848, simple_loss=0.09877, pruned_loss=0.01922, audio_tagging_loss=0.009872, over 3027905.73 frames. ], batch size: 55, lr: 4.90e-03, grad_scale: 32.0
2023-11-20 12:51:53,898 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1093380.0, ans=0.125
2023-11-20 12:52:11,269 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.534e+01 7.870e+01 8.763e+01 9.438e+01 1.322e+02, threshold=1.753e+02, percent-clipped=0.0
2023-11-20 12:52:11,718 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1093513.3333333333, ans=0.0
2023-11-20 12:52:15,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1093513.3333333333, ans=0.125
2023-11-20 12:52:18,204 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1093513.3333333333, ans=0.0
2023-11-20 12:52:18,317 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1093513.3333333333, ans=0.125
2023-11-20 12:52:39,901 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 164050
2023-11-20 12:52:51,353 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 7750, loss[loss=0.09732, simple_loss=0.1232, pruned_loss=0.02868, audio_tagging_loss=0.007061, over 15710.00 frames. ], tot_loss[loss=0.07848, simple_loss=0.09898, pruned_loss=0.01917, audio_tagging_loss=0.009813, over 3034527.87 frames. ], batch size: 58, lr: 4.90e-03, grad_scale: 32.0
2023-11-20 12:53:08,219 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1093780.0, ans=0.1
2023-11-20 12:53:30,264 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1093913.3333333333, ans=0.0
2023-11-20 12:53:44,128 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 164100
2023-11-20 12:53:55,671 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 7800, loss[loss=0.09515, simple_loss=0.1237, pruned_loss=0.02382, audio_tagging_loss=0.009499, over 14847.00 frames. ], tot_loss[loss=0.07845, simple_loss=0.09916, pruned_loss=0.01901, audio_tagging_loss=0.00986, over 3036108.48 frames. ], batch size: 56, lr: 4.90e-03, grad_scale: 32.0
2023-11-20 12:53:57,167 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1094046.6666666667, ans=0.2
2023-11-20 12:54:19,698 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.52 vs. limit=15.0
2023-11-20 12:54:20,240 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.543e+01 8.289e+01 9.083e+01 1.010e+02 1.614e+02, threshold=1.817e+02, percent-clipped=0.0
2023-11-20 12:54:29,979 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.77 vs. limit=22.5
2023-11-20 12:54:48,373 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 164150
2023-11-20 12:54:48,680 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1094313.3333333333, ans=0.125
2023-11-20 12:54:59,830 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 7850, loss[loss=0.06711, simple_loss=0.0868, pruned_loss=0.01154, audio_tagging_loss=0.01216, over 14427.00 frames. ], tot_loss[loss=0.07846, simple_loss=0.09925, pruned_loss=0.01896, audio_tagging_loss=0.00988, over 3041260.78 frames. ], batch size: 57, lr: 4.90e-03, grad_scale: 32.0
2023-11-20 12:55:18,975 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.63 vs. limit=12.0
2023-11-20 12:55:21,562 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.65 vs. limit=12.0
2023-11-20 12:55:36,860 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.43 vs. limit=15.0
2023-11-20 12:55:40,205 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1094580.0, ans=0.125
2023-11-20 12:55:43,285 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1094580.0, ans=0.125
2023-11-20 12:55:53,656 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 164200
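The learning rate in these entries drifts from 4.91e-03 toward 4.88e-03. That is consistent with the Eden schedule from icefall's optim.py; the sketch below assumes that formula with this run's base_lr=0.045, lr_batches=7500 and lr_epochs=3.5, and the epoch argument (13 completed epochs) is a guess that reproduces the logged value:

```python
# Hedged sketch of the Eden learning-rate schedule (assumed formula).
def eden_lr(base_lr, step, epoch, lr_batches=7500.0, lr_epochs=3.5):
    return (base_lr
            * ((step / lr_batches) ** 2 + 1) ** -0.25
            * ((epoch / lr_epochs) ** 2 + 1) ** -0.25)

# Around batch idx 163,550 in epoch 14 of this run:
print(eden_lr(0.045, step=163550, epoch=13))  # ~4.91e-03, as logged
```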
2023-11-20 12:56:05,332 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 7900, loss[loss=0.08141, simple_loss=0.09864, pruned_loss=0.02067, audio_tagging_loss=0.01142, over 14481.00 frames. ], tot_loss[loss=0.07829, simple_loss=0.09892, pruned_loss=0.01884, audio_tagging_loss=0.009992, over 3037060.61 frames. ], batch size: 54, lr: 4.90e-03, grad_scale: 32.0
2023-11-20 12:56:28,983 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.813e+01 8.207e+01 9.087e+01 9.691e+01 1.187e+02, threshold=1.817e+02, percent-clipped=0.0
2023-11-20 12:56:58,039 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 164250
2023-11-20 12:57:09,142 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 7950, loss[loss=0.1003, simple_loss=0.1312, pruned_loss=0.02752, audio_tagging_loss=0.007134, over 15162.00 frames. ], tot_loss[loss=0.0793, simple_loss=0.09998, pruned_loss=0.01926, audio_tagging_loss=0.01006, over 3045126.06 frames. ], batch size: 55, lr: 4.90e-03, grad_scale: 32.0
2023-11-20 12:57:09,535 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1095046.6666666667, ans=0.125
2023-11-20 12:57:26,260 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 12:58:02,373 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 164300
2023-11-20 12:58:02,466 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1095313.3333333333, ans=0.0
2023-11-20 12:58:07,798 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.06 vs. limit=22.5
2023-11-20 12:58:12,510 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-20 12:58:13,263 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 8000, loss[loss=0.0737, simple_loss=0.08046, pruned_loss=0.01903, audio_tagging_loss=0.01444, over 14961.00 frames. ], tot_loss[loss=0.07898, simple_loss=0.09911, pruned_loss=0.01922, audio_tagging_loss=0.01021, over 3045099.91 frames. ], batch size: 57, lr: 4.90e-03, grad_scale: 32.0
2023-11-20 12:58:28,119 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1095446.6666666667, ans=0.125
2023-11-20 12:58:39,380 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.429e+01 8.050e+01 8.646e+01 9.454e+01 1.422e+02, threshold=1.729e+02, percent-clipped=0.0
2023-11-20 12:58:42,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1095513.3333333333, ans=0.2
2023-11-20 12:58:55,690 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.18 vs. limit=15.0
2023-11-20 12:59:01,515 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1095580.0, ans=0.1
2023-11-20 12:59:06,800 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 164350
2023-11-20 12:59:08,190 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1095646.6666666667, ans=0.125
2023-11-20 12:59:17,718 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 8050, loss[loss=0.09859, simple_loss=0.1197, pruned_loss=0.02769, audio_tagging_loss=0.01103, over 15431.00 frames. ], tot_loss[loss=0.07873, simple_loss=0.09862, pruned_loss=0.01905, audio_tagging_loss=0.01037, over 3046469.08 frames. ], batch size: 58, lr: 4.90e-03, grad_scale: 16.0
2023-11-20 12:59:24,728 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1095713.3333333333, ans=0.125
2023-11-20 12:59:34,612 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=1095780.0, ans=0.5
2023-11-20 12:59:58,277 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1095913.3333333333, ans=0.0
2023-11-20 13:00:05,788 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1095913.3333333333, ans=0.125
2023-11-20 13:00:08,004 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1095980.0, ans=0.035
2023-11-20 13:00:10,878 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 164400
2023-11-20 13:00:22,228 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 8100, loss[loss=0.06475, simple_loss=0.07685, pruned_loss=0.01359, audio_tagging_loss=0.01274, over 16398.00 frames. ], tot_loss[loss=0.07991, simple_loss=0.1005, pruned_loss=0.01949, audio_tagging_loss=0.01018, over 3048307.62 frames. ], batch size: 62, lr: 4.90e-03, grad_scale: 8.0
2023-11-20 13:00:50,968 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.227e+01 8.681e+01 9.657e+01 1.040e+02 1.994e+02, threshold=1.931e+02, percent-clipped=2.0
2023-11-20 13:00:58,360 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.08 vs. limit=15.0
2023-11-20 13:01:04,080 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1096246.6666666667, ans=0.125
2023-11-20 13:01:14,999 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 164450
2023-11-20 13:01:26,648 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 8150, loss[loss=0.05739, simple_loss=0.07538, pruned_loss=0.009704, audio_tagging_loss=0.009999, over 14596.00 frames. ], tot_loss[loss=0.07972, simple_loss=0.1003, pruned_loss=0.01956, audio_tagging_loss=0.01, over 3042266.68 frames. ], batch size: 57, lr: 4.90e-03, grad_scale: 8.0
2023-11-20 13:01:49,669 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1096446.6666666667, ans=0.0
2023-11-20 13:01:52,683 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.40 vs. limit=6.0
2023-11-20 13:02:00,727 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1096513.3333333333, ans=0.0
2023-11-20 13:02:14,263 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.whiten.whitening_limit, batch_count=1096580.0, ans=12.0
2023-11-20 13:02:18,745 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1096646.6666666667, ans=0.125
2023-11-20 13:02:19,713 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 164500
2023-11-20 13:02:25,292 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1096646.6666666667, ans=0.0
2023-11-20 13:02:29,538 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1096646.6666666667, ans=0.125
2023-11-20 13:02:31,742 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 8200, loss[loss=0.07308, simple_loss=0.09472, pruned_loss=0.01728, audio_tagging_loss=0.008443, over 15682.00 frames. ], tot_loss[loss=0.07904, simple_loss=0.09957, pruned_loss=0.01938, audio_tagging_loss=0.009871, over 3040489.01 frames. ], batch size: 57, lr: 4.90e-03, grad_scale: 8.0
2023-11-20 13:02:33,042 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 13:02:35,650 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-20 13:02:37,745 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1096713.3333333333, ans=0.125
2023-11-20 13:02:47,545 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1096780.0, ans=0.1
2023-11-20 13:02:54,403 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1096780.0, ans=0.0
2023-11-20 13:03:00,015 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.486e+01 8.378e+01 8.960e+01 9.917e+01 5.109e+02, threshold=1.792e+02, percent-clipped=1.0
2023-11-20 13:03:01,578 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1096846.6666666667, ans=0.2
2023-11-20 13:03:21,745 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.69 vs. limit=22.5
2023-11-20 13:03:24,733 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 164550
2023-11-20 13:03:25,049 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1096980.0, ans=0.125
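The WARNING entries above drop AudioSet placeholder cuts whose encoder output would be shorter than their token sequence: 100 input frames subsample to 23, fewer than the 24 BPE tokens. A hedged sketch of such a filter, assuming the icefall-style convolutional-subsampling length formula; keep_cut is an illustrative name, not the actual train_asr.py code:

```python
# Assumed icefall Conv2dSubsampling output length: T' = ((T - 7) // 2 + 1) // 2
def frames_after_subsampling(t: int) -> int:
    return ((t - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # A transducer alignment needs at least one frame per output token.
    return frames_after_subsampling(num_frames) >= num_tokens

print(frames_after_subsampling(100))  # 23, as in the warnings
print(keep_cut(100, 24))              # False -> cut is excluded
```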
], tot_loss[loss=0.07971, simple_loss=0.1006, pruned_loss=0.01954, audio_tagging_loss=0.009859, over 3036535.01 frames. ], batch size: 52, lr: 4.90e-03, grad_scale: 8.0 2023-11-20 13:03:43,416 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.89 vs. limit=10.0 2023-11-20 13:03:49,357 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1097113.3333333333, ans=10.0 2023-11-20 13:03:57,868 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1097113.3333333333, ans=0.04949747468305833 2023-11-20 13:04:25,303 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1097246.6666666667, ans=0.0 2023-11-20 13:04:28,661 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 164600 2023-11-20 13:04:40,212 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 8300, loss[loss=0.06654, simple_loss=0.09001, pruned_loss=0.01292, audio_tagging_loss=0.008609, over 16628.00 frames. ], tot_loss[loss=0.07885, simple_loss=0.09928, pruned_loss=0.0192, audio_tagging_loss=0.01001, over 3038423.94 frames. ], batch size: 60, lr: 4.90e-03, grad_scale: 8.0 2023-11-20 13:04:45,253 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1097380.0, ans=0.2 2023-11-20 13:05:08,980 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.931e+01 8.247e+01 8.781e+01 9.736e+01 1.742e+02, threshold=1.756e+02, percent-clipped=0.0 2023-11-20 13:05:10,530 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1097513.3333333333, ans=0.125 2023-11-20 13:05:10,534 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1097513.3333333333, ans=0.125 2023-11-20 13:05:32,902 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 164650 2023-11-20 13:05:44,411 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 8350, loss[loss=0.07523, simple_loss=0.09848, pruned_loss=0.01624, audio_tagging_loss=0.009748, over 14215.00 frames. ], tot_loss[loss=0.07878, simple_loss=0.09948, pruned_loss=0.01918, audio_tagging_loss=0.009862, over 3039635.06 frames. 
], batch size: 54, lr: 4.89e-03, grad_scale: 8.0 2023-11-20 13:05:47,245 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1097713.3333333333, ans=0.125 2023-11-20 13:06:11,848 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1097846.6666666667, ans=0.0 2023-11-20 13:06:25,406 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1097913.3333333333, ans=0.125 2023-11-20 13:06:29,746 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1097913.3333333333, ans=0.125 2023-11-20 13:06:30,995 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 13:06:35,831 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1097980.0, ans=0.125 2023-11-20 13:06:36,770 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 164700 2023-11-20 13:06:46,787 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1097980.0, ans=0.125 2023-11-20 13:06:49,003 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 8400, loss[loss=0.07895, simple_loss=0.1095, pruned_loss=0.01656, audio_tagging_loss=0.007647, over 14900.00 frames. ], tot_loss[loss=0.0784, simple_loss=0.09887, pruned_loss=0.01909, audio_tagging_loss=0.00987, over 3045292.10 frames. ], batch size: 55, lr: 4.89e-03, grad_scale: 16.0 2023-11-20 13:07:07,093 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1098113.3333333333, ans=0.0 2023-11-20 13:07:16,492 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1098180.0, ans=0.0 2023-11-20 13:07:17,339 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.406e+01 7.789e+01 8.682e+01 9.296e+01 1.321e+02, threshold=1.736e+02, percent-clipped=0.0 2023-11-20 13:07:41,752 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 164750 2023-11-20 13:07:43,183 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1098313.3333333333, ans=0.2 2023-11-20 13:07:44,345 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 13:07:51,018 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1098313.3333333333, ans=0.125 2023-11-20 13:07:53,371 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 8450, loss[loss=0.07673, simple_loss=0.09789, pruned_loss=0.01899, audio_tagging_loss=0.008793, over 15773.00 frames. ], tot_loss[loss=0.07814, simple_loss=0.09841, pruned_loss=0.01901, audio_tagging_loss=0.009924, over 3045912.02 frames. 
], batch size: 59, lr: 4.89e-03, grad_scale: 16.0 2023-11-20 13:08:01,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1098380.0, ans=0.0 2023-11-20 13:08:10,305 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1098446.6666666667, ans=0.0 2023-11-20 13:08:15,718 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.49 vs. limit=15.0 2023-11-20 13:08:17,740 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1098513.3333333333, ans=0.125 2023-11-20 13:08:23,975 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.87 vs. limit=15.0 2023-11-20 13:08:24,265 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.99 vs. limit=22.5 2023-11-20 13:08:27,226 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1098513.3333333333, ans=0.1 2023-11-20 13:08:46,014 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 164800 2023-11-20 13:08:48,875 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1098646.6666666667, ans=0.1 2023-11-20 13:08:57,294 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.84 vs. limit=22.5 2023-11-20 13:08:57,721 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 8500, loss[loss=0.08934, simple_loss=0.1017, pruned_loss=0.03095, audio_tagging_loss=0.007568, over 14642.00 frames. ], tot_loss[loss=0.07836, simple_loss=0.09855, pruned_loss=0.01912, audio_tagging_loss=0.00996, over 3040064.18 frames. ], batch size: 56, lr: 4.89e-03, grad_scale: 16.0 2023-11-20 13:09:06,538 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1098713.3333333333, ans=0.0 2023-11-20 13:09:16,830 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1098780.0, ans=0.0 2023-11-20 13:09:25,682 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.462e+01 8.485e+01 9.217e+01 9.923e+01 2.190e+02, threshold=1.843e+02, percent-clipped=1.0 2023-11-20 13:09:50,092 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 164850 2023-11-20 13:10:02,352 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 8550, loss[loss=0.07993, simple_loss=0.1029, pruned_loss=0.01749, audio_tagging_loss=0.01101, over 14110.00 frames. ], tot_loss[loss=0.07866, simple_loss=0.09896, pruned_loss=0.0192, audio_tagging_loss=0.009977, over 3040116.10 frames. 
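The scaling.py:213 entries trace ScheduledFloat hyperparameters: scalars interpolated piecewise-linearly in batch_count. By batch_count of roughly 1.1e6 every schedule here has saturated at its endpoint (ans=0.125, 0.1, 0.0; constants such as 0.04949747468305833, which equals 0.035 times sqrt(2), and 0.09899494936611666, which equals 0.07 times sqrt(2)). A minimal sketch with made-up schedule points; the class below is illustrative, not the icefall scaling.py implementation:

```python
import numpy as np

class ScheduledFloat:
    """A scalar whose value is piecewise-linear in batch_count."""
    def __init__(self, *points):  # (batch_count, value) pairs
        self.xs, self.ys = zip(*points)

    def value(self, batch_count: float) -> float:
        # np.interp clamps to the end values outside the schedule
        return float(np.interp(batch_count, self.xs, self.ys))

skip_rate = ScheduledFloat((0.0, 0.5), (4000.0, 0.05), (16000.0, 0.0))
print(skip_rate.value(1097113.0))  # 0.0 -- long saturated, as in the log
```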
2023-11-20 13:10:02,352 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 8550, loss[loss=0.07993, simple_loss=0.1029, pruned_loss=0.01749, audio_tagging_loss=0.01101, over 14110.00 frames. ], tot_loss[loss=0.07866, simple_loss=0.09896, pruned_loss=0.0192, audio_tagging_loss=0.009977, over 3040116.10 frames. ], batch size: 53, lr: 4.89e-03, grad_scale: 16.0
2023-11-20 13:10:06,297 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1099046.6666666667, ans=0.2
2023-11-20 13:10:06,370 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1099046.6666666667, ans=0.1
2023-11-20 13:10:26,621 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.73 vs. limit=15.0
2023-11-20 13:10:30,037 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1099180.0, ans=0.1
2023-11-20 13:10:55,023 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 164900
2023-11-20 13:10:55,325 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1099313.3333333333, ans=0.1
2023-11-20 13:11:06,510 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 8600, loss[loss=0.06613, simple_loss=0.08672, pruned_loss=0.01314, audio_tagging_loss=0.009626, over 15151.00 frames. ], tot_loss[loss=0.07938, simple_loss=0.1004, pruned_loss=0.0193, audio_tagging_loss=0.009884, over 3045119.66 frames. ], batch size: 57, lr: 4.89e-03, grad_scale: 16.0
2023-11-20 13:11:11,991 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.40 vs. limit=22.5
2023-11-20 13:11:18,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1099446.6666666667, ans=0.125
2023-11-20 13:11:28,049 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1099446.6666666667, ans=0.125
2023-11-20 13:11:31,593 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1099513.3333333333, ans=0.07
2023-11-20 13:11:33,764 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.330e+01 8.002e+01 8.794e+01 9.640e+01 1.342e+02, threshold=1.759e+02, percent-clipped=0.0
2023-11-20 13:11:47,591 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1099580.0, ans=0.2
2023-11-20 13:11:48,767 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1099580.0, ans=0.04949747468305833
2023-11-20 13:11:50,081 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1099580.0, ans=0.2
2023-11-20 13:11:54,866 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.91 vs. limit=15.0
2023-11-20 13:11:58,960 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 164950
2023-11-20 13:12:09,749 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1099713.3333333333, ans=0.1
2023-11-20 13:12:10,631 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 8650, loss[loss=0.08518, simple_loss=0.08932, pruned_loss=0.0269, audio_tagging_loss=0.01362, over 15128.00 frames. ], tot_loss[loss=0.07992, simple_loss=0.1011, pruned_loss=0.01949, audio_tagging_loss=0.009901, over 3053343.75 frames. ], batch size: 58, lr: 4.89e-03, grad_scale: 16.0
2023-11-20 13:12:13,312 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1099713.3333333333, ans=0.07
2023-11-20 13:12:15,899 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1099713.3333333333, ans=0.1
2023-11-20 13:12:20,866 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1099713.3333333333, ans=0.0
2023-11-20 13:12:26,328 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1099780.0, ans=0.05
2023-11-20 13:12:36,837 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1099846.6666666667, ans=0.125
2023-11-20 13:12:49,753 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1099913.3333333333, ans=0.0
2023-11-20 13:12:57,410 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1099913.3333333333, ans=0.2
2023-11-20 13:13:03,796 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 165000
2023-11-20 13:13:08,020 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1099980.0, ans=0.125
2023-11-20 13:13:15,571 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 8700, loss[loss=0.09102, simple_loss=0.1214, pruned_loss=0.02453, audio_tagging_loss=0.005767, over 14647.00 frames. ], tot_loss[loss=0.08011, simple_loss=0.101, pruned_loss=0.01957, audio_tagging_loss=0.01004, over 3059974.14 frames. ], batch size: 55, lr: 4.89e-03, grad_scale: 16.0
2023-11-20 13:13:15,849 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1100046.6666666667, ans=0.0
2023-11-20 13:13:20,739 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1100046.6666666667, ans=0.2
2023-11-20 13:13:23,958 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1100046.6666666667, ans=0.125
2023-11-20 13:13:44,002 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.858e+01 8.298e+01 8.856e+01 9.572e+01 1.265e+02, threshold=1.771e+02, percent-clipped=0.0
2023-11-20 13:13:51,136 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.41 vs. limit=15.0
2023-11-20 13:14:05,050 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1100246.6666666667, ans=0.0
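Each scaling.py:1022 entry compares a module's whitening metric against its whitening_limit (metric=13.29 vs. limit=15.0 below); a constraining penalty only engages when the metric exceeds the limit. A toy version of one plausible such statistic, which is 1.0 when the feature covariance is proportional to the identity and grows toward num_channels as energy concentrates in fewer directions; this is a simplified stand-in, not icefall's Whiten module:

```python
import torch

def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels)
    x = x - x.mean(dim=0)
    cov = x.t() @ x / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    # ratio of mean squared eigenvalue to squared mean eigenvalue:
    # 1.0 for white features, up to num_channels when fully collapsed
    return float((eigs ** 2).mean() / eigs.mean() ** 2)

white = torch.randn(10000, 64)
print(whitening_metric(white))                       # ~1.0
collapsed = white * torch.tensor([10.0] + [0.1] * 63)
print(whitening_metric(collapsed))                   # large -> over the limit
```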
2023-11-20 13:14:07,736 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.29 vs. limit=15.0
2023-11-20 13:14:08,412 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 165050
2023-11-20 13:14:12,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1100313.3333333333, ans=0.2
2023-11-20 13:14:20,655 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 8750, loss[loss=0.09999, simple_loss=0.1325, pruned_loss=0.02666, audio_tagging_loss=0.007095, over 14809.00 frames. ], tot_loss[loss=0.08052, simple_loss=0.1015, pruned_loss=0.01971, audio_tagging_loss=0.01007, over 3055203.99 frames. ], batch size: 54, lr: 4.89e-03, grad_scale: 16.0
2023-11-20 13:15:03,524 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1100580.0, ans=0.125
2023-11-20 13:15:05,398 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1100580.0, ans=0.0
2023-11-20 13:15:13,125 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 165100
2023-11-20 13:15:19,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1100646.6666666667, ans=0.125
2023-11-20 13:15:21,805 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1100646.6666666667, ans=0.125
2023-11-20 13:15:21,933 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1100646.6666666667, ans=0.125
2023-11-20 13:15:22,021 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1100646.6666666667, ans=0.0
2023-11-20 13:15:23,986 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 8800, loss[loss=0.08444, simple_loss=0.121, pruned_loss=0.01677, audio_tagging_loss=0.007161, over 15974.00 frames. ], tot_loss[loss=0.0809, simple_loss=0.1019, pruned_loss=0.0199, audio_tagging_loss=0.01006, over 3051058.59 frames. ], batch size: 57, lr: 4.89e-03, grad_scale: 32.0
2023-11-20 13:15:41,051 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.73 vs. limit=22.5
2023-11-20 13:15:42,797 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1100780.0, ans=0.1
2023-11-20 13:15:46,324 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1100780.0, ans=0.125
2023-11-20 13:15:51,942 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.572e+01 8.177e+01 8.747e+01 9.582e+01 1.210e+02, threshold=1.749e+02, percent-clipped=0.0
2023-11-20 13:16:02,906 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1100913.3333333333, ans=0.125
2023-11-20 13:16:07,884 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1100913.3333333333, ans=0.1
2023-11-20 13:16:16,733 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 165150
2023-11-20 13:16:19,407 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1100980.0, ans=0.09899494936611666
2023-11-20 13:16:20,940 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.60 vs. limit=15.0
2023-11-20 13:16:26,606 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1101046.6666666667, ans=0.2
2023-11-20 13:16:27,593 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 8850, loss[loss=0.08559, simple_loss=0.1129, pruned_loss=0.01971, audio_tagging_loss=0.009442, over 15915.00 frames. ], tot_loss[loss=0.08123, simple_loss=0.1025, pruned_loss=0.01993, audio_tagging_loss=0.01007, over 3048107.93 frames. ], batch size: 60, lr: 4.89e-03, grad_scale: 32.0
2023-11-20 13:16:40,499 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 13:16:43,066 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1101113.3333333333, ans=0.025
2023-11-20 13:17:00,135 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1101180.0, ans=0.2
2023-11-20 13:17:01,451 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1101180.0, ans=0.125
2023-11-20 13:17:21,016 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 165200
2023-11-20 13:17:21,101 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1101313.3333333333, ans=0.0
2023-11-20 13:17:29,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1101313.3333333333, ans=0.1
2023-11-20 13:17:32,329 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 8900, loss[loss=0.07356, simple_loss=0.09674, pruned_loss=0.01669, audio_tagging_loss=0.008502, over 14918.00 frames. ], tot_loss[loss=0.08073, simple_loss=0.1021, pruned_loss=0.01984, audio_tagging_loss=0.009829, over 3052148.18 frames. ], batch size: 57, lr: 4.89e-03, grad_scale: 32.0
2023-11-20 13:17:45,142 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1101446.6666666667, ans=0.1
2023-11-20 13:17:47,659 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.29 vs. limit=15.0
2023-11-20 13:17:56,358 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1101446.6666666667, ans=0.04949747468305833
2023-11-20 13:18:00,123 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1101513.3333333333, ans=0.0
2023-11-20 13:18:00,916 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.758e+01 8.398e+01 8.901e+01 9.799e+01 1.599e+02, threshold=1.780e+02, percent-clipped=0.0
2023-11-20 13:18:21,586 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1101580.0, ans=0.1
2023-11-20 13:18:23,495 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1101646.6666666667, ans=0.2
2023-11-20 13:18:25,652 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 165250
2023-11-20 13:18:28,048 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.53 vs. limit=22.5
2023-11-20 13:18:37,237 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 8950, loss[loss=0.0914, simple_loss=0.09979, pruned_loss=0.02722, audio_tagging_loss=0.01429, over 14927.00 frames. ], tot_loss[loss=0.07977, simple_loss=0.1008, pruned_loss=0.0195, audio_tagging_loss=0.009857, over 3051665.32 frames. ], batch size: 56, lr: 4.89e-03, grad_scale: 32.0
2023-11-20 13:18:46,636 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1101713.3333333333, ans=0.125
2023-11-20 13:19:13,845 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.42 vs. limit=15.0
2023-11-20 13:19:14,530 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1101913.3333333333, ans=0.125
2023-11-20 13:19:15,897 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1101913.3333333333, ans=0.0
2023-11-20 13:19:29,851 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 165300
2023-11-20 13:19:33,234 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1101980.0, ans=0.2
2023-11-20 13:19:35,720 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1101980.0, ans=0.1
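grad_scale in the tot_loss entries is the fp16 dynamic loss scale (use_fp16 is enabled for this run): it halves when a batch overflows to inf/NaN gradients and grows back after a stretch of stable steps, which is why it bounces between 32, 16 and 8 across these batches. A hedged sketch using PyTorch's GradScaler; the parameter values below are illustrative, not the ones this run used:

```python
import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=32.0,      # matches the scale logged around batch 7250
    growth_factor=2.0,    # double after a run of overflow-free steps
    backoff_factor=0.5,   # halve immediately on inf/NaN gradients
    growth_interval=2000, # length of the required overflow-free run
)

# Typical step (requires a CUDA model/optimizer, so shown as comments):
# scaler.scale(loss).backward()
# scaler.step(optimizer)
# scaler.update()
# print(scaler.get_scale())  # the value logged as grad_scale
```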
2023-11-20 13:19:41,538 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 9000, loss[loss=0.07416, simple_loss=0.08705, pruned_loss=0.02009, audio_tagging_loss=0.01055, over 15379.00 frames. ], tot_loss[loss=0.07975, simple_loss=0.1007, pruned_loss=0.01961, audio_tagging_loss=0.009802, over 3050984.23 frames. ], batch size: 58, lr: 4.88e-03, grad_scale: 32.0
2023-11-20 13:19:41,539 INFO [train_asr.py:1285] (3/4) Computing validation loss
2023-11-20 13:20:17,532 INFO [zipformer.py:1873] (3/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.7750, 4.1599, 3.7144, 3.1921], device='cuda:3')
2023-11-20 13:20:21,567 INFO [zipformer.py:1873] (3/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.4197, 3.0599, 3.5830, 3.3619], device='cuda:3')
2023-11-20 13:20:23,241 INFO [train_asr.py:1294] (3/4) Epoch 14, validation: loss=0.06237, simple_loss=0.05346, pruned_loss=0.005661, audio_tagging_loss=0.02998, over 4681554.00 frames.
2023-11-20 13:20:23,241 INFO [train_asr.py:1295] (3/4) Maximum memory allocated so far is 25886MB
2023-11-20 13:20:27,853 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1102046.6666666667, ans=0.0
2023-11-20 13:20:30,361 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1102046.6666666667, ans=0.125
2023-11-20 13:20:32,550 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1102046.6666666667, ans=0.125
2023-11-20 13:20:50,973 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.532e+01 8.335e+01 8.996e+01 9.763e+01 1.376e+02, threshold=1.799e+02, percent-clipped=0.0
2023-11-20 13:21:03,308 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.68 vs. limit=22.5
2023-11-20 13:21:05,599 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.81 vs. limit=15.0
2023-11-20 13:21:16,388 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 165350
2023-11-20 13:21:27,306 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 9050, loss[loss=0.06672, simple_loss=0.08691, pruned_loss=0.01427, audio_tagging_loss=0.008988, over 15460.00 frames. ], tot_loss[loss=0.07977, simple_loss=0.1007, pruned_loss=0.01965, audio_tagging_loss=0.00979, over 3050481.22 frames. ], batch size: 59, lr: 4.88e-03, grad_scale: 32.0
2023-11-20 13:21:49,939 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1102446.6666666667, ans=0.125
2023-11-20 13:22:11,918 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1102580.0, ans=0.125
2023-11-20 13:22:20,851 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 165400
2023-11-20 13:22:30,153 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1102646.6666666667, ans=0.1
2023-11-20 13:22:32,188 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 9100, loss[loss=0.0825, simple_loss=0.1059, pruned_loss=0.0206, audio_tagging_loss=0.008933, over 15747.00 frames. ], tot_loss[loss=0.07903, simple_loss=0.09985, pruned_loss=0.01934, audio_tagging_loss=0.009756, over 3054132.64 frames. ], batch size: 56, lr: 4.88e-03, grad_scale: 32.0
2023-11-20 13:22:44,132 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1102780.0, ans=0.0
2023-11-20 13:22:45,542 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1102780.0, ans=0.0
2023-11-20 13:22:55,725 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=1102780.0, ans=0.5
2023-11-20 13:23:01,054 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.597e+01 8.047e+01 8.709e+01 9.562e+01 1.643e+02, threshold=1.742e+02, percent-clipped=0.0
2023-11-20 13:23:25,753 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 165450
2023-11-20 13:23:27,553 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.63 vs. limit=15.0
2023-11-20 13:23:31,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1102980.0, ans=0.0
2023-11-20 13:23:35,907 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1103046.6666666667, ans=0.125
2023-11-20 13:23:36,768 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 9150, loss[loss=0.07644, simple_loss=0.101, pruned_loss=0.01759, audio_tagging_loss=0.008344, over 15138.00 frames. ], tot_loss[loss=0.07874, simple_loss=0.09944, pruned_loss=0.01935, audio_tagging_loss=0.009677, over 3048439.56 frames. ], batch size: 57, lr: 4.88e-03, grad_scale: 32.0
2023-11-20 13:23:42,654 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1103046.6666666667, ans=0.0
2023-11-20 13:23:58,726 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1103113.3333333333, ans=0.05
2023-11-20 13:24:08,324 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1103180.0, ans=0.125
2023-11-20 13:24:14,259 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.48 vs. limit=15.0
2023-11-20 13:24:18,707 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1103246.6666666667, ans=0.0
2023-11-20 13:24:30,185 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 165500
2023-11-20 13:24:38,130 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.13 vs. limit=22.5
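The validation block above (train_asr.py:1285 through 1295) interleaves zipformer.py:1873 diagnostics reporting attn_weights_entropy, one value per attention head; low entropy means a head concentrates its weight on few frames. One plausible implementation of that statistic, offered as a sketch rather than the actual zipformer.py code:

```python
import torch

def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
    # attn: (num_heads, num_queries, num_keys), rows sum to 1
    ent = -(attn * (attn + 1e-20).log()).sum(dim=-1)
    return ent.mean(dim=-1)  # one entropy value per head

attn = torch.softmax(torch.randn(4, 50, 50), dim=-1)
print(attn_weights_entropy(attn))  # 4 values, like the logged tensors
```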
], batch size: 57, lr: 4.88e-03, grad_scale: 32.0 2023-11-20 13:24:53,860 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1103446.6666666667, ans=0.2 2023-11-20 13:24:53,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1103446.6666666667, ans=0.07 2023-11-20 13:25:10,021 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.657e+01 8.175e+01 8.950e+01 9.913e+01 1.287e+02, threshold=1.790e+02, percent-clipped=0.0 2023-11-20 13:25:26,816 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1103580.0, ans=0.2 2023-11-20 13:25:34,567 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 165550 2023-11-20 13:25:34,868 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1103646.6666666667, ans=0.2 2023-11-20 13:25:40,277 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1103646.6666666667, ans=0.07 2023-11-20 13:25:46,451 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 9250, loss[loss=0.07068, simple_loss=0.09591, pruned_loss=0.0154, audio_tagging_loss=0.00732, over 14652.00 frames. ], tot_loss[loss=0.07887, simple_loss=0.0995, pruned_loss=0.0194, audio_tagging_loss=0.009717, over 3054607.37 frames. ], batch size: 56, lr: 4.88e-03, grad_scale: 16.0 2023-11-20 13:25:52,087 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1103713.3333333333, ans=0.1 2023-11-20 13:25:53,249 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1103713.3333333333, ans=0.04949747468305833 2023-11-20 13:26:08,812 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1103780.0, ans=0.04949747468305833 2023-11-20 13:26:13,653 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1103846.6666666667, ans=0.2 2023-11-20 13:26:16,324 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.37 vs. limit=22.5 2023-11-20 13:26:18,322 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1103846.6666666667, ans=0.125 2023-11-20 13:26:39,639 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 165600 2023-11-20 13:26:51,010 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 9300, loss[loss=0.07951, simple_loss=0.1061, pruned_loss=0.01775, audio_tagging_loss=0.008723, over 15283.00 frames. ], tot_loss[loss=0.0798, simple_loss=0.1011, pruned_loss=0.0196, audio_tagging_loss=0.009655, over 3056683.80 frames. 
], batch size: 55, lr: 4.88e-03, grad_scale: 16.0 2023-11-20 13:26:57,859 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1104046.6666666667, ans=0.0 2023-11-20 13:27:21,187 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.442e+01 7.830e+01 8.462e+01 9.599e+01 1.223e+02, threshold=1.692e+02, percent-clipped=0.0 2023-11-20 13:27:22,693 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1104180.0, ans=0.125 2023-11-20 13:27:28,620 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1104246.6666666667, ans=0.125 2023-11-20 13:27:37,926 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1104246.6666666667, ans=0.07 2023-11-20 13:27:44,403 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 165650 2023-11-20 13:27:55,238 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 9350, loss[loss=0.0867, simple_loss=0.1181, pruned_loss=0.01847, audio_tagging_loss=0.009199, over 16324.00 frames. ], tot_loss[loss=0.07969, simple_loss=0.101, pruned_loss=0.01946, audio_tagging_loss=0.009747, over 3058928.49 frames. ], batch size: 59, lr: 4.88e-03, grad_scale: 16.0 2023-11-20 13:27:59,090 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1104380.0, ans=0.125 2023-11-20 13:28:19,089 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1104446.6666666667, ans=0.07 2023-11-20 13:28:20,313 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1104513.3333333333, ans=0.07 2023-11-20 13:28:28,039 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.62 vs. limit=15.0 2023-11-20 13:28:31,644 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.whiten.whitening_limit, batch_count=1104513.3333333333, ans=12.0 2023-11-20 13:28:36,806 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1104580.0, ans=0.125 2023-11-20 13:28:47,693 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 165700 2023-11-20 13:28:59,959 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 9400, loss[loss=0.08422, simple_loss=0.09569, pruned_loss=0.02334, audio_tagging_loss=0.01304, over 16529.00 frames. ], tot_loss[loss=0.07907, simple_loss=0.09984, pruned_loss=0.01923, audio_tagging_loss=0.009922, over 3051359.22 frames. 
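The ScheduledFloat entries report, for each named hyper-parameter (skip rates, dropout_p, balancer probabilities), the value "ans" in force at the current batch_count. A plausible reading is a piecewise-linear schedule over batch count; the sketch below follows that reading, and the breakpoints are illustrative rather than taken from scaling.py.

class PiecewiseLinearSchedule:
    def __init__(self, *points):
        # points: (batch_count, value) pairs defining the schedule.
        self.points = sorted(points)

    def __call__(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                frac = (batch_count - x0) / (x1 - x0)
                return y0 + frac * (y1 - y0)

# e.g. a skip rate annealed to 0.0, as many entries above show (ans=0.0):
attention_skip_rate = PiecewiseLinearSchedule((0.0, 0.2), (4000.0, 0.05), (16000.0, 0.0))
print(attention_skip_rate(1104046.0))  # -> 0.0 this deep into training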
], batch size: 62, lr: 4.88e-03, grad_scale: 16.0 2023-11-20 13:29:06,346 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1104713.3333333333, ans=0.1 2023-11-20 13:29:13,410 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1104780.0, ans=0.0 2023-11-20 13:29:29,614 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.579e+01 8.038e+01 8.701e+01 9.410e+01 1.188e+02, threshold=1.740e+02, percent-clipped=0.0 2023-11-20 13:29:37,301 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1104913.3333333333, ans=0.125 2023-11-20 13:29:40,400 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1104913.3333333333, ans=0.125 2023-11-20 13:29:41,637 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1104913.3333333333, ans=0.0 2023-11-20 13:29:52,930 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 165750 2023-11-20 13:29:54,483 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1104980.0, ans=0.125 2023-11-20 13:29:59,031 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1104980.0, ans=0.125 2023-11-20 13:30:02,336 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 13:30:04,856 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 9450, loss[loss=0.08347, simple_loss=0.1133, pruned_loss=0.01605, audio_tagging_loss=0.01078, over 15068.00 frames. ], tot_loss[loss=0.07935, simple_loss=0.09988, pruned_loss=0.01935, audio_tagging_loss=0.01006, over 3051147.53 frames. ], batch size: 54, lr: 4.88e-03, grad_scale: 16.0 2023-11-20 13:30:07,720 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1105046.6666666667, ans=0.0 2023-11-20 13:30:10,003 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1105046.6666666667, ans=0.1 2023-11-20 13:30:10,046 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1105046.6666666667, ans=0.0 2023-11-20 13:30:25,372 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1105113.3333333333, ans=0.125 2023-11-20 13:30:25,403 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1105113.3333333333, ans=0.125 2023-11-20 13:30:35,425 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.62 vs. 
limit=15.0 2023-11-20 13:30:57,410 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 165800 2023-11-20 13:31:02,997 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1105313.3333333333, ans=0.125 2023-11-20 13:31:05,664 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=1105313.3333333333, ans=15.0 2023-11-20 13:31:09,007 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 9500, loss[loss=0.08666, simple_loss=0.1129, pruned_loss=0.02093, audio_tagging_loss=0.009272, over 15256.00 frames. ], tot_loss[loss=0.07944, simple_loss=0.1001, pruned_loss=0.0193, audio_tagging_loss=0.0101, over 3048912.60 frames. ], batch size: 56, lr: 4.88e-03, grad_scale: 16.0 2023-11-20 13:31:10,551 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1105380.0, ans=0.125 2023-11-20 13:31:39,027 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.072e+01 8.167e+01 8.714e+01 9.690e+01 1.183e+02, threshold=1.743e+02, percent-clipped=0.0 2023-11-20 13:32:01,604 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 165850 2023-11-20 13:32:07,834 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1105646.6666666667, ans=0.0 2023-11-20 13:32:13,248 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 9550, loss[loss=0.08285, simple_loss=0.1025, pruned_loss=0.02209, audio_tagging_loss=0.009503, over 14473.00 frames. ], tot_loss[loss=0.08054, simple_loss=0.1018, pruned_loss=0.01961, audio_tagging_loss=0.01004, over 3042083.20 frames. ], batch size: 55, lr: 4.88e-03, grad_scale: 16.0 2023-11-20 13:32:18,846 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1105713.3333333333, ans=0.1 2023-11-20 13:32:18,852 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1105713.3333333333, ans=0.2 2023-11-20 13:32:28,076 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1105780.0, ans=0.0 2023-11-20 13:32:41,946 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1105846.6666666667, ans=0.125 2023-11-20 13:32:43,187 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1105846.6666666667, ans=0.125 2023-11-20 13:33:06,233 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 165900 2023-11-20 13:33:13,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1105980.0, ans=0.0 2023-11-20 13:33:16,794 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1106046.6666666667, ans=0.95 2023-11-20 13:33:17,801 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 9600, loss[loss=0.07418, simple_loss=0.09554, pruned_loss=0.01603, audio_tagging_loss=0.01038, over 14182.00 frames. ], tot_loss[loss=0.08016, simple_loss=0.1013, pruned_loss=0.01938, audio_tagging_loss=0.01014, over 3046268.10 frames. 
], batch size: 54, lr: 4.88e-03, grad_scale: 32.0 2023-11-20 13:33:24,119 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1106046.6666666667, ans=0.125 2023-11-20 13:33:40,629 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1106113.3333333333, ans=0.05 2023-11-20 13:33:44,829 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1106180.0, ans=0.0 2023-11-20 13:33:48,242 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.832e+01 8.385e+01 9.134e+01 1.022e+02 1.365e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-20 13:33:52,244 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1106180.0, ans=0.125 2023-11-20 13:34:03,361 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.55 vs. limit=15.0 2023-11-20 13:34:10,059 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 165950 2023-11-20 13:34:12,663 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1106313.3333333333, ans=0.125 2023-11-20 13:34:16,969 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1106313.3333333333, ans=0.1 2023-11-20 13:34:21,514 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 9650, loss[loss=0.08331, simple_loss=0.1018, pruned_loss=0.0225, audio_tagging_loss=0.009907, over 15913.00 frames. ], tot_loss[loss=0.07943, simple_loss=0.1003, pruned_loss=0.01917, audio_tagging_loss=0.01011, over 3038676.78 frames. ], batch size: 59, lr: 4.88e-03, grad_scale: 16.0 2023-11-20 13:34:31,499 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1106380.0, ans=0.125 2023-11-20 13:34:51,834 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1106513.3333333333, ans=0.0 2023-11-20 13:35:05,592 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1106580.0, ans=0.0 2023-11-20 13:35:05,663 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1106580.0, ans=0.0 2023-11-20 13:35:14,148 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 166000 2023-11-20 13:35:25,847 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 9700, loss[loss=0.0884, simple_loss=0.1159, pruned_loss=0.02256, audio_tagging_loss=0.007863, over 15117.00 frames. ], tot_loss[loss=0.07967, simple_loss=0.1011, pruned_loss=0.01926, audio_tagging_loss=0.009854, over 3041034.21 frames. ], batch size: 57, lr: 4.87e-03, grad_scale: 16.0 2023-11-20 13:35:31,464 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.48 vs. limit=15.0 2023-11-20 13:35:32,570 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.96 vs. 
limit=12.0 2023-11-20 13:35:35,015 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1106713.3333333333, ans=0.2 2023-11-20 13:35:56,248 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1106846.6666666667, ans=0.0 2023-11-20 13:35:57,239 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.187e+01 8.103e+01 9.034e+01 9.824e+01 1.276e+02, threshold=1.807e+02, percent-clipped=0.0 2023-11-20 13:36:18,843 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 166050 2023-11-20 13:36:31,009 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 9750, loss[loss=0.08208, simple_loss=0.1155, pruned_loss=0.01603, audio_tagging_loss=0.008279, over 15790.00 frames. ], tot_loss[loss=0.07991, simple_loss=0.1017, pruned_loss=0.01937, audio_tagging_loss=0.009691, over 3042368.42 frames. ], batch size: 57, lr: 4.87e-03, grad_scale: 16.0 2023-11-20 13:36:36,254 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1107046.6666666667, ans=0.0 2023-11-20 13:36:48,305 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.30 vs. limit=22.5 2023-11-20 13:36:55,157 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1107180.0, ans=0.0 2023-11-20 13:37:03,773 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.07 vs. limit=15.0 2023-11-20 13:37:23,101 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1107313.3333333333, ans=0.125 2023-11-20 13:37:24,255 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 166100 2023-11-20 13:37:35,934 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 9800, loss[loss=0.1326, simple_loss=0.1694, pruned_loss=0.04204, audio_tagging_loss=0.005891, over 16564.00 frames. ], tot_loss[loss=0.08097, simple_loss=0.1028, pruned_loss=0.01985, audio_tagging_loss=0.009691, over 3045552.57 frames. 
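Note that loss[...] describes the current batch while tot_loss[...] aggregates over roughly three million frames; the fractional frame counts (e.g. 3042368.42) suggest a decayed running sum rather than a plain total. A sketch under that assumption; the decay constant is chosen only because 15000 frames/batch divided by (1 - 0.995) gives a steady state near the logged ~3.0e6 frames.

class RunningLoss:
    def __init__(self, decay: float = 0.995):
        # Exponentially decayed sums, which would explain the fractional
        # "over N frames" counts in the log (an assumption, not verified).
        self.decay = decay
        self.loss_sum = 0.0
        self.frame_sum = 0.0

    def update(self, batch_loss: float, batch_frames: float) -> None:
        self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
        self.frame_sum = self.decay * self.frame_sum + batch_frames

    @property
    def value(self) -> float:
        return self.loss_sum / max(self.frame_sum, 1.0)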
], batch size: 60, lr: 4.87e-03, grad_scale: 8.0 2023-11-20 13:37:37,349 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1107380.0, ans=0.125 2023-11-20 13:37:51,531 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1107446.6666666667, ans=0.125 2023-11-20 13:37:57,465 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 13:38:07,060 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1107513.3333333333, ans=0.0 2023-11-20 13:38:07,942 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.608e+01 8.297e+01 9.086e+01 9.730e+01 1.369e+02, threshold=1.817e+02, percent-clipped=0.0 2023-11-20 13:38:11,344 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1107513.3333333333, ans=0.125 2023-11-20 13:38:27,733 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1107646.6666666667, ans=0.2 2023-11-20 13:38:28,901 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 166150 2023-11-20 13:38:32,574 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 13:38:40,592 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 9850, loss[loss=0.1, simple_loss=0.1357, pruned_loss=0.02437, audio_tagging_loss=0.007769, over 14570.00 frames. ], tot_loss[loss=0.08141, simple_loss=0.1034, pruned_loss=0.02013, audio_tagging_loss=0.009583, over 3046147.68 frames. ], batch size: 53, lr: 4.87e-03, grad_scale: 8.0 2023-11-20 13:38:51,900 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1107780.0, ans=0.1 2023-11-20 13:38:58,598 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1107780.0, ans=0.125 2023-11-20 13:39:10,703 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.75 vs. limit=22.5 2023-11-20 13:39:12,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1107846.6666666667, ans=0.125 2023-11-20 13:39:14,642 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1107846.6666666667, ans=0.125 2023-11-20 13:39:16,464 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.85 vs. 
limit=15.0 2023-11-20 13:39:17,165 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1107846.6666666667, ans=0.125 2023-11-20 13:39:20,874 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1107913.3333333333, ans=0.0 2023-11-20 13:39:24,483 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1107913.3333333333, ans=0.2 2023-11-20 13:39:33,704 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 166200 2023-11-20 13:39:33,708 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1107980.0, ans=0.015 2023-11-20 13:39:45,772 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 9900, loss[loss=0.07219, simple_loss=0.08912, pruned_loss=0.01731, audio_tagging_loss=0.01031, over 14740.00 frames. ], tot_loss[loss=0.08097, simple_loss=0.1025, pruned_loss=0.01996, audio_tagging_loss=0.009769, over 3044721.41 frames. ], batch size: 56, lr: 4.87e-03, grad_scale: 8.0 2023-11-20 13:39:58,008 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1108113.3333333333, ans=0.0 2023-11-20 13:40:03,369 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1108113.3333333333, ans=0.125 2023-11-20 13:40:18,348 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.742e+01 8.087e+01 8.695e+01 9.650e+01 1.416e+02, threshold=1.739e+02, percent-clipped=0.0 2023-11-20 13:40:22,292 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1108180.0, ans=0.2 2023-11-20 13:40:38,805 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 166250 2023-11-20 13:40:51,321 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 9950, loss[loss=0.09415, simple_loss=0.1248, pruned_loss=0.02324, audio_tagging_loss=0.008493, over 14932.00 frames. ], tot_loss[loss=0.08037, simple_loss=0.1017, pruned_loss=0.01978, audio_tagging_loss=0.009758, over 3045486.51 frames. ], batch size: 55, lr: 4.87e-03, grad_scale: 8.0 2023-11-20 13:40:57,830 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1108380.0, ans=0.125 2023-11-20 13:40:59,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1108380.0, ans=0.0 2023-11-20 13:41:35,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1108580.0, ans=0.125 2023-11-20 13:41:37,856 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.51 vs. 
limit=15.0 2023-11-20 13:41:43,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1108646.6666666667, ans=0.125 2023-11-20 13:41:44,162 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 166300 2023-11-20 13:41:46,815 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1108646.6666666667, ans=0.125 2023-11-20 13:41:54,995 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 10000, loss[loss=0.1114, simple_loss=0.1505, pruned_loss=0.02745, audio_tagging_loss=0.008697, over 14888.00 frames. ], tot_loss[loss=0.0797, simple_loss=0.1007, pruned_loss=0.01948, audio_tagging_loss=0.009854, over 3047951.97 frames. ], batch size: 56, lr: 4.87e-03, grad_scale: 8.0 2023-11-20 13:42:25,813 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.63 vs. limit=15.0 2023-11-20 13:42:29,452 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.616e+01 8.105e+01 8.776e+01 9.451e+01 1.209e+02, threshold=1.755e+02, percent-clipped=0.0 2023-11-20 13:42:44,307 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1108913.3333333333, ans=0.0 2023-11-20 13:42:48,537 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 166350 2023-11-20 13:42:59,234 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 10050, loss[loss=0.1044, simple_loss=0.1321, pruned_loss=0.02942, audio_tagging_loss=0.008994, over 15852.00 frames. ], tot_loss[loss=0.07998, simple_loss=0.1011, pruned_loss=0.01954, audio_tagging_loss=0.00989, over 3047246.50 frames. ], batch size: 58, lr: 4.87e-03, grad_scale: 8.0 2023-11-20 13:43:09,063 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.82 vs. limit=15.0 2023-11-20 13:43:13,355 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=1109113.3333333333, ans=6.0 2023-11-20 13:43:41,154 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.62 vs. limit=22.5 2023-11-20 13:43:52,158 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 166400 2023-11-20 13:44:03,881 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 10100, loss[loss=0.05632, simple_loss=0.06478, pruned_loss=0.01079, audio_tagging_loss=0.01314, over 15421.00 frames. ], tot_loss[loss=0.07952, simple_loss=0.1003, pruned_loss=0.01934, audio_tagging_loss=0.01002, over 3043804.56 frames. ], batch size: 58, lr: 4.87e-03, grad_scale: 8.0 2023-11-20 13:44:31,911 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.71 vs. 
limit=15.0 2023-11-20 13:44:37,153 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.486e+01 8.086e+01 8.697e+01 9.512e+01 1.226e+02, threshold=1.739e+02, percent-clipped=0.0 2023-11-20 13:44:41,771 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1109580.0, ans=0.125 2023-11-20 13:44:46,598 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1109580.0, ans=0.125 2023-11-20 13:44:47,025 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.27 vs. limit=10.0 2023-11-20 13:44:49,009 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1109580.0, ans=0.0 2023-11-20 13:44:56,184 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 13:44:57,542 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 166450 2023-11-20 13:45:06,059 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1109646.6666666667, ans=0.0 2023-11-20 13:45:06,119 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1109646.6666666667, ans=0.2 2023-11-20 13:45:08,325 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 10150, loss[loss=0.08888, simple_loss=0.1164, pruned_loss=0.02391, audio_tagging_loss=0.006781, over 16348.00 frames. ], tot_loss[loss=0.07993, simple_loss=0.1011, pruned_loss=0.01931, audio_tagging_loss=0.01006, over 3041950.00 frames. ], batch size: 60, lr: 4.87e-03, grad_scale: 8.0 2023-11-20 13:45:11,082 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1109713.3333333333, ans=0.125 2023-11-20 13:45:20,378 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1.whitening_limit, batch_count=1109780.0, ans=10.0 2023-11-20 13:45:26,097 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1109780.0, ans=0.1 2023-11-20 13:45:31,607 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1109780.0, ans=0.0 2023-11-20 13:45:37,503 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-20 13:45:37,813 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1109846.6666666667, ans=0.09899494936611666 2023-11-20 13:46:00,874 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 166500 2023-11-20 13:46:01,057 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1109980.0, ans=0.125 2023-11-20 13:46:06,500 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=1109980.0, ans=0.02 2023-11-20 13:46:12,418 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 10200, loss[loss=0.08147, simple_loss=0.1027, pruned_loss=0.02086, audio_tagging_loss=0.009269, over 14103.00 frames. ], tot_loss[loss=0.07988, simple_loss=0.1012, pruned_loss=0.01923, audio_tagging_loss=0.01006, over 3044268.05 frames. ], batch size: 56, lr: 4.87e-03, grad_scale: 8.0 2023-11-20 13:46:24,361 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1110113.3333333333, ans=0.1 2023-11-20 13:46:25,508 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1110113.3333333333, ans=0.125 2023-11-20 13:46:30,746 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1110113.3333333333, ans=0.0 2023-11-20 13:46:36,380 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 13:46:40,393 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1110180.0, ans=0.1 2023-11-20 13:46:46,105 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.930e+01 8.131e+01 8.850e+01 9.665e+01 1.277e+02, threshold=1.770e+02, percent-clipped=0.0 2023-11-20 13:47:05,113 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 166550 2023-11-20 13:47:16,468 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 10250, loss[loss=0.0554, simple_loss=0.06427, pruned_loss=0.01263, audio_tagging_loss=0.01064, over 15391.00 frames. ], tot_loss[loss=0.0792, simple_loss=0.1001, pruned_loss=0.01904, audio_tagging_loss=0.01013, over 3040533.83 frames. 
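The WARNING entries above drop AudioSet cuts whose 100 input frames shrink to 23 after the encoder front-end's subsampling, fewer than the 24 BPE tokens of the placeholder transcript, so no alignment is possible. A hedged reconstruction of such a filter; the exact front-end arithmetic (here (T - 7) // 4, chosen because it maps 100 to 23) is an assumption.

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # Approximate output length of a subsampling conv front-end; the offset
    # is picked to reproduce the logged 100 -> 23 mapping.
    num_frames_after = (num_frames - 7) // 4
    # A transducer loss cannot align more tokens than it has encoder frames.
    return num_frames_after >= num_tokens

# The logged cuts: keep_cut(100, 24) -> False, so they are excluded.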
], batch size: 60, lr: 4.87e-03, grad_scale: 8.0 2023-11-20 13:47:17,803 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1110380.0, ans=0.0 2023-11-20 13:47:56,139 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1110580.0, ans=0.0 2023-11-20 13:48:09,982 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 166600 2023-11-20 13:48:17,204 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1110646.6666666667, ans=0.125 2023-11-20 13:48:20,504 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.07 vs. limit=15.0 2023-11-20 13:48:21,981 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 10300, loss[loss=0.08672, simple_loss=0.1017, pruned_loss=0.02196, audio_tagging_loss=0.01388, over 14634.00 frames. ], tot_loss[loss=0.07924, simple_loss=0.09992, pruned_loss=0.01916, audio_tagging_loss=0.01012, over 3032471.90 frames. ], batch size: 57, lr: 4.87e-03, grad_scale: 8.0 2023-11-20 13:48:31,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1110713.3333333333, ans=0.1 2023-11-20 13:48:40,381 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1110780.0, ans=0.0 2023-11-20 13:48:51,876 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1110846.6666666667, ans=0.125 2023-11-20 13:48:55,306 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.311e+01 8.084e+01 8.693e+01 9.702e+01 1.335e+02, threshold=1.739e+02, percent-clipped=0.0 2023-11-20 13:48:55,575 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1110846.6666666667, ans=0.1 2023-11-20 13:49:12,904 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1110980.0, ans=0.125 2023-11-20 13:49:15,222 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 166650 2023-11-20 13:49:26,760 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 10350, loss[loss=0.09482, simple_loss=0.1182, pruned_loss=0.024, audio_tagging_loss=0.0117, over 15678.00 frames. ], tot_loss[loss=0.07967, simple_loss=0.1007, pruned_loss=0.01918, audio_tagging_loss=0.01012, over 3038019.52 frames. ], batch size: 55, lr: 4.86e-03, grad_scale: 8.0 2023-11-20 13:49:27,492 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.48 vs. 
limit=15.0 2023-11-20 13:49:29,532 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1111046.6666666667, ans=0.2 2023-11-20 13:49:41,048 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1111113.3333333333, ans=0.125 2023-11-20 13:49:47,066 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1111113.3333333333, ans=0.125 2023-11-20 13:49:51,253 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1111180.0, ans=0.0 2023-11-20 13:50:09,918 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1111246.6666666667, ans=0.0 2023-11-20 13:50:14,792 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1111246.6666666667, ans=0.1 2023-11-20 13:50:19,496 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 166700 2023-11-20 13:50:29,359 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.42 vs. limit=12.0 2023-11-20 13:50:31,143 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 10400, loss[loss=0.08219, simple_loss=0.09993, pruned_loss=0.02201, audio_tagging_loss=0.01021, over 15157.00 frames. ], tot_loss[loss=0.07977, simple_loss=0.1008, pruned_loss=0.01913, audio_tagging_loss=0.01024, over 3038454.45 frames. ], batch size: 56, lr: 4.86e-03, grad_scale: 16.0 2023-11-20 13:50:31,545 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1111380.0, ans=0.05 2023-11-20 13:50:35,264 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1111380.0, ans=0.125 2023-11-20 13:50:41,975 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1111380.0, ans=0.0 2023-11-20 13:50:47,363 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1111446.6666666667, ans=0.2 2023-11-20 13:50:48,722 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1111446.6666666667, ans=0.09899494936611666 2023-11-20 13:50:48,983 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.01 vs. limit=15.0 2023-11-20 13:51:05,057 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.857e+01 8.019e+01 8.655e+01 9.452e+01 1.304e+02, threshold=1.731e+02, percent-clipped=0.0 2023-11-20 13:51:05,305 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1111513.3333333333, ans=0.125 2023-11-20 13:51:06,767 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1111513.3333333333, ans=0.125 2023-11-20 13:51:07,105 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.72 vs. 
limit=6.0 2023-11-20 13:51:24,464 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 166750 2023-11-20 13:51:33,666 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=1111646.6666666667, ans=0.02 2023-11-20 13:51:36,021 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 10450, loss[loss=0.07144, simple_loss=0.1013, pruned_loss=0.01498, audio_tagging_loss=0.005824, over 15539.00 frames. ], tot_loss[loss=0.07914, simple_loss=0.09996, pruned_loss=0.01898, audio_tagging_loss=0.01019, over 3035765.56 frames. ], batch size: 59, lr: 4.86e-03, grad_scale: 16.0 2023-11-20 13:51:44,886 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.16 vs. limit=15.0 2023-11-20 13:51:57,593 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.16 vs. limit=15.0 2023-11-20 13:52:28,072 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1111980.0, ans=0.1 2023-11-20 13:52:29,656 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 166800 2023-11-20 13:52:30,282 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.59 vs. limit=6.0 2023-11-20 13:52:36,810 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1111980.0, ans=0.1 2023-11-20 13:52:39,594 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.42 vs. limit=22.5 2023-11-20 13:52:41,530 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 10500, loss[loss=0.06667, simple_loss=0.08456, pruned_loss=0.0161, audio_tagging_loss=0.008295, over 15369.00 frames. ], tot_loss[loss=0.07933, simple_loss=0.1004, pruned_loss=0.01908, audio_tagging_loss=0.01005, over 3040093.53 frames. ], batch size: 58, lr: 4.86e-03, grad_scale: 16.0 2023-11-20 13:52:49,408 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1112046.6666666667, ans=0.0 2023-11-20 13:53:04,449 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 13:53:05,128 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.34 vs. limit=15.0 2023-11-20 13:53:14,862 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.848e+01 8.112e+01 8.724e+01 9.287e+01 1.188e+02, threshold=1.745e+02, percent-clipped=0.0 2023-11-20 13:53:27,456 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1112246.6666666667, ans=0.125 2023-11-20 13:53:34,597 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 166850 2023-11-20 13:53:45,950 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 10550, loss[loss=0.06761, simple_loss=0.08252, pruned_loss=0.01622, audio_tagging_loss=0.01013, over 15186.00 frames. ], tot_loss[loss=0.07921, simple_loss=0.1005, pruned_loss=0.01903, audio_tagging_loss=0.009934, over 3041264.63 frames. 
], batch size: 60, lr: 4.86e-03, grad_scale: 16.0 2023-11-20 13:53:46,344 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1112380.0, ans=0.125 2023-11-20 13:53:57,072 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.81 vs. limit=10.0 2023-11-20 13:54:05,905 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.21 vs. limit=15.0 2023-11-20 13:54:24,625 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=1112580.0, ans=0.05 2023-11-20 13:54:31,684 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1112580.0, ans=0.125 2023-11-20 13:54:32,753 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1112580.0, ans=0.035 2023-11-20 13:54:36,680 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1112646.6666666667, ans=0.0 2023-11-20 13:54:38,998 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 166900 2023-11-20 13:54:41,664 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1112646.6666666667, ans=0.125 2023-11-20 13:54:50,578 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 10600, loss[loss=0.07318, simple_loss=0.1015, pruned_loss=0.01249, audio_tagging_loss=0.009958, over 15521.00 frames. ], tot_loss[loss=0.07903, simple_loss=0.1003, pruned_loss=0.01904, audio_tagging_loss=0.009858, over 3038387.80 frames. ], batch size: 56, lr: 4.86e-03, grad_scale: 16.0 2023-11-20 13:54:50,931 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1112713.3333333333, ans=0.125 2023-11-20 13:55:02,693 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1112780.0, ans=0.0 2023-11-20 13:55:05,315 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1112780.0, ans=0.0 2023-11-20 13:55:14,295 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.32 vs. limit=15.0 2023-11-20 13:55:24,051 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.162e+01 8.206e+01 8.903e+01 9.867e+01 1.464e+02, threshold=1.781e+02, percent-clipped=0.0 2023-11-20 13:55:36,337 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.44 vs. limit=6.0 2023-11-20 13:55:43,406 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 166950 2023-11-20 13:55:55,858 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 10650, loss[loss=0.07809, simple_loss=0.1001, pruned_loss=0.02036, audio_tagging_loss=0.007701, over 15305.00 frames. ], tot_loss[loss=0.07876, simple_loss=0.1001, pruned_loss=0.01889, audio_tagging_loss=0.009814, over 3041078.45 frames. 
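The Whitening lines compare a per-module metric against a limit (e.g. metric=11.21 vs. limit=15.0), presumably a measure of how far activations are from a white (isotropic) covariance. One such metric, offered as an assumption rather than a transcription of scaling.py: the ratio of the mean squared eigenvalue of the feature covariance to the squared mean eigenvalue, which equals 1.0 exactly when all eigenvalues are equal.

import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels) activations for one module.
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)   # real eigenvalues, ascending
    return (eigs ** 2).mean() / eigs.mean() ** 2

x = torch.randn(1000, 512)              # near-white input
print(whitening_metric(x))              # close to 1.0; the logged limits run 5.0-22.5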
], batch size: 56, lr: 4.86e-03, grad_scale: 16.0 2023-11-20 13:56:02,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1113046.6666666667, ans=0.2 2023-11-20 13:56:25,593 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.80 vs. limit=12.0 2023-11-20 13:56:48,708 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 167000 2023-11-20 13:56:54,130 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1113313.3333333333, ans=0.125 2023-11-20 13:57:00,527 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 10700, loss[loss=0.08157, simple_loss=0.1056, pruned_loss=0.01844, audio_tagging_loss=0.01032, over 15482.00 frames. ], tot_loss[loss=0.07941, simple_loss=0.1009, pruned_loss=0.01916, audio_tagging_loss=0.009775, over 3048457.17 frames. ], batch size: 57, lr: 4.86e-03, grad_scale: 16.0 2023-11-20 13:57:32,767 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1113513.3333333333, ans=0.0 2023-11-20 13:57:34,220 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.075e+01 8.061e+01 8.803e+01 9.456e+01 1.141e+02, threshold=1.761e+02, percent-clipped=0.0 2023-11-20 13:57:47,000 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1113580.0, ans=0.125 2023-11-20 13:57:53,030 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.51 vs. limit=15.0 2023-11-20 13:57:53,756 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 167050 2023-11-20 13:58:05,327 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 10750, loss[loss=0.08933, simple_loss=0.1072, pruned_loss=0.0246, audio_tagging_loss=0.01115, over 14814.00 frames. ], tot_loss[loss=0.07955, simple_loss=0.1011, pruned_loss=0.01925, audio_tagging_loss=0.009754, over 3047010.10 frames. ], batch size: 55, lr: 4.86e-03, grad_scale: 16.0 2023-11-20 13:58:25,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1113780.0, ans=0.125 2023-11-20 13:58:30,037 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1113846.6666666667, ans=0.125 2023-11-20 13:58:31,351 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1113846.6666666667, ans=0.0 2023-11-20 13:58:57,871 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 167100 2023-11-20 13:59:03,159 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1113980.0, ans=0.0 2023-11-20 13:59:09,717 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 10800, loss[loss=0.08176, simple_loss=0.09783, pruned_loss=0.02234, audio_tagging_loss=0.0105, over 16379.00 frames. ], tot_loss[loss=0.07982, simple_loss=0.1014, pruned_loss=0.01938, audio_tagging_loss=0.009724, over 3043488.72 frames. 
], batch size: 60, lr: 4.86e-03, grad_scale: 32.0 2023-11-20 13:59:09,920 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1114046.6666666667, ans=0.0 2023-11-20 13:59:19,394 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1114046.6666666667, ans=0.1 2023-11-20 13:59:34,688 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff3.min_abs, batch_count=1114180.0, ans=0.2 2023-11-20 13:59:43,618 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.204e+01 8.350e+01 8.974e+01 9.650e+01 1.251e+02, threshold=1.795e+02, percent-clipped=0.0 2023-11-20 13:59:56,802 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.20 vs. limit=22.5 2023-11-20 14:00:03,058 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 167150 2023-11-20 14:00:04,924 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.69 vs. limit=15.0 2023-11-20 14:00:07,595 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1114313.3333333333, ans=0.125 2023-11-20 14:00:14,907 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 10850, loss[loss=0.08516, simple_loss=0.1035, pruned_loss=0.02284, audio_tagging_loss=0.01057, over 15532.00 frames. ], tot_loss[loss=0.07961, simple_loss=0.1009, pruned_loss=0.01942, audio_tagging_loss=0.009734, over 3045165.13 frames. ], batch size: 58, lr: 4.86e-03, grad_scale: 32.0 2023-11-20 14:00:50,255 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1114513.3333333333, ans=0.125 2023-11-20 14:00:58,076 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1114580.0, ans=0.1 2023-11-20 14:01:08,171 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 167200 2023-11-20 14:01:14,748 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 14:01:20,255 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 10900, loss[loss=0.06966, simple_loss=0.08525, pruned_loss=0.0161, audio_tagging_loss=0.01092, over 15180.00 frames. ], tot_loss[loss=0.07883, simple_loss=0.09986, pruned_loss=0.01904, audio_tagging_loss=0.00986, over 3047883.98 frames. 
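The grad_scale field in the train_asr.py entries moves between 8.0, 16.0 and 32.0 across this stretch, the signature of dynamic loss scaling under mixed-precision (fp16) training. A generic PyTorch sketch of that behaviour; the toy model, optimizer and data are stand-ins, not the recipe's.

import torch
from torch.cuda.amp import GradScaler, autocast

model = torch.nn.Linear(80, 500).cuda()
optimizer = torch.optim.Adam(model.parameters())
scaler = GradScaler(init_scale=32.0)

for step in range(100):
    features = torch.randn(8, 80, device="cuda")
    optimizer.zero_grad()
    with autocast():
        loss = model(features).float().pow(2).mean()
    scaler.scale(loss).backward()   # backprop on the scaled loss
    scaler.step(optimizer)          # skips the update if inf/nan grads are found
    scaler.update()                 # halves the scale on overflow, regrows it later
    # scaler.get_scale() is the analogue of the logged grad_scale field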
], batch size: 59, lr: 4.86e-03, grad_scale: 32.0 2023-11-20 14:01:20,422 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1114713.3333333333, ans=0.1 2023-11-20 14:01:32,964 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1114780.0, ans=0.09899494936611666 2023-11-20 14:01:47,695 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1114846.6666666667, ans=0.125 2023-11-20 14:01:53,715 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.172e+01 8.152e+01 8.794e+01 9.597e+01 1.232e+02, threshold=1.759e+02, percent-clipped=0.0 2023-11-20 14:01:58,343 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1114913.3333333333, ans=0.125 2023-11-20 14:02:13,369 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 167250 2023-11-20 14:02:24,245 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 10950, loss[loss=0.06389, simple_loss=0.08274, pruned_loss=0.01405, audio_tagging_loss=0.008466, over 14497.00 frames. ], tot_loss[loss=0.07868, simple_loss=0.09937, pruned_loss=0.01905, audio_tagging_loss=0.009941, over 3049961.97 frames. ], batch size: 56, lr: 4.86e-03, grad_scale: 32.0 2023-11-20 14:02:29,913 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1115046.6666666667, ans=0.1 2023-11-20 14:02:45,237 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=15.27 vs. limit=15.0 2023-11-20 14:02:59,581 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1115180.0, ans=0.125 2023-11-20 14:03:01,437 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.33 vs. limit=15.0 2023-11-20 14:03:06,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1115246.6666666667, ans=0.125 2023-11-20 14:03:16,013 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.81 vs. limit=22.5 2023-11-20 14:03:17,733 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 167300 2023-11-20 14:03:21,508 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1115313.3333333333, ans=0.125 2023-11-20 14:03:25,218 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1115313.3333333333, ans=0.1 2023-11-20 14:03:29,242 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 11000, loss[loss=0.07655, simple_loss=0.1002, pruned_loss=0.01733, audio_tagging_loss=0.009112, over 15132.00 frames. ], tot_loss[loss=0.07817, simple_loss=0.0988, pruned_loss=0.01875, audio_tagging_loss=0.01002, over 3055782.90 frames. ], batch size: 58, lr: 4.86e-03, grad_scale: 32.0 2023-11-20 14:03:38,507 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 14:03:41,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1115446.6666666667, ans=0.2 2023-11-20 14:03:52,547 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1115446.6666666667, ans=0.0 2023-11-20 14:04:02,044 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.939e+01 8.121e+01 8.892e+01 9.815e+01 1.453e+02, threshold=1.778e+02, percent-clipped=0.0 2023-11-20 14:04:13,228 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.15 vs. limit=15.0 2023-11-20 14:04:22,133 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 167350 2023-11-20 14:04:33,153 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 11050, loss[loss=0.06574, simple_loss=0.07915, pruned_loss=0.01625, audio_tagging_loss=0.009923, over 15248.00 frames. ], tot_loss[loss=0.07838, simple_loss=0.09912, pruned_loss=0.01877, audio_tagging_loss=0.01005, over 3049873.85 frames. ], batch size: 56, lr: 4.85e-03, grad_scale: 32.0 2023-11-20 14:04:54,570 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=1115780.0, ans=0.5 2023-11-20 14:05:07,965 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1115846.6666666667, ans=0.0 2023-11-20 14:05:11,529 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1115913.3333333333, ans=0.0 2023-11-20 14:05:12,800 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1115913.3333333333, ans=0.2 2023-11-20 14:05:25,983 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.65 vs. limit=15.0 2023-11-20 14:05:26,762 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 167400 2023-11-20 14:05:26,840 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1115980.0, ans=0.0 2023-11-20 14:05:28,332 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1115980.0, ans=0.125 2023-11-20 14:05:33,336 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1115980.0, ans=0.125 2023-11-20 14:05:35,887 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1115980.0, ans=0.2 2023-11-20 14:05:38,014 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 11100, loss[loss=0.06956, simple_loss=0.08693, pruned_loss=0.01737, audio_tagging_loss=0.008728, over 14176.00 frames. ], tot_loss[loss=0.07903, simple_loss=0.0999, pruned_loss=0.01894, audio_tagging_loss=0.01015, over 3051068.28 frames. 
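The learning rate decays very slowly across these batches (4.88e-03 down to 4.85e-03 over a few thousand batches), consistent with a schedule polynomial in both batch and epoch counts. The sketch below follows an Eden-style formula of the kind used in Zipformer recipes; the constants and the formula itself should be treated as assumptions about this run.

def eden_lr(base_lr: float, batch: float, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    # For batch >> lr_batches the lr decays roughly like batch^-0.5, and
    # similarly in epochs, giving the gentle 4.88e-03 -> 4.85e-03 drift above.
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor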
], batch size: 56, lr: 4.85e-03, grad_scale: 32.0 2023-11-20 14:06:04,846 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1116180.0, ans=0.2 2023-11-20 14:06:12,000 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.063e+01 8.381e+01 8.919e+01 9.708e+01 1.297e+02, threshold=1.784e+02, percent-clipped=0.0 2023-11-20 14:06:18,529 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.84 vs. limit=15.0 2023-11-20 14:06:25,346 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1116246.6666666667, ans=0.2 2023-11-20 14:06:30,509 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.56 vs. limit=15.0 2023-11-20 14:06:31,695 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 167450 2023-11-20 14:06:42,824 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 11150, loss[loss=0.07905, simple_loss=0.07799, pruned_loss=0.02338, audio_tagging_loss=0.01668, over 13813.00 frames. ], tot_loss[loss=0.07931, simple_loss=0.1001, pruned_loss=0.01903, audio_tagging_loss=0.01025, over 3050230.45 frames. ], batch size: 53, lr: 4.85e-03, grad_scale: 32.0 2023-11-20 14:06:45,466 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1116380.0, ans=0.125 2023-11-20 14:06:48,652 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1116380.0, ans=0.1 2023-11-20 14:06:55,896 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.79 vs. limit=15.0 2023-11-20 14:07:13,442 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.80 vs. limit=15.0 2023-11-20 14:07:16,699 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1116513.3333333333, ans=0.0 2023-11-20 14:07:16,796 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1116513.3333333333, ans=0.125 2023-11-20 14:07:30,908 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.11 vs. limit=10.0 2023-11-20 14:07:35,332 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 167500 2023-11-20 14:07:44,088 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1116646.6666666667, ans=0.125 2023-11-20 14:07:47,513 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 11200, loss[loss=0.08518, simple_loss=0.1044, pruned_loss=0.02478, audio_tagging_loss=0.008223, over 15418.00 frames. ], tot_loss[loss=0.07893, simple_loss=0.09959, pruned_loss=0.01889, audio_tagging_loss=0.01025, over 3045639.57 frames. ], batch size: 57, lr: 4.85e-03, grad_scale: 32.0 2023-11-20 14:07:49,779 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.50 vs. 
limit=5.0 2023-11-20 14:08:10,935 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 14:08:12,469 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.86 vs. limit=22.5 2023-11-20 14:08:19,492 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1116846.6666666667, ans=0.1 2023-11-20 14:08:20,251 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.974e+01 8.196e+01 8.773e+01 9.585e+01 1.271e+02, threshold=1.755e+02, percent-clipped=0.0 2023-11-20 14:08:37,472 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1116980.0, ans=0.0 2023-11-20 14:08:38,809 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1116980.0, ans=0.125 2023-11-20 14:08:40,460 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 167550 2023-11-20 14:08:51,240 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 11250, loss[loss=0.05968, simple_loss=0.07824, pruned_loss=0.01004, audio_tagging_loss=0.01052, over 14917.00 frames. ], tot_loss[loss=0.07829, simple_loss=0.09878, pruned_loss=0.01869, audio_tagging_loss=0.01021, over 3046148.05 frames. ], batch size: 56, lr: 4.85e-03, grad_scale: 16.0 2023-11-20 14:09:05,510 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1117113.3333333333, ans=0.035 2023-11-20 14:09:21,440 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.30 vs. limit=15.0 2023-11-20 14:09:27,680 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1117180.0, ans=0.035 2023-11-20 14:09:44,033 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 167600 2023-11-20 14:09:52,253 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1117313.3333333333, ans=0.125 2023-11-20 14:09:55,738 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 11300, loss[loss=0.09268, simple_loss=0.1174, pruned_loss=0.02201, audio_tagging_loss=0.01196, over 15541.00 frames. ], tot_loss[loss=0.07807, simple_loss=0.0988, pruned_loss=0.0186, audio_tagging_loss=0.01007, over 3048669.09 frames. 
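The optim.py lines report the distribution of recent gradient norms as five quantiles (min / 25% / median / 75% / max) together with the clipping threshold actually applied. Across the records above the threshold tracks Clipping_scale times the median (e.g. 2.0 * 8.773e+01 ~= 1.755e+02), so clipping adapts to the run's own gradient statistics rather than using a fixed norm, and percent-clipped reports how often the threshold was hit. A sketch of that scheme under those assumptions (the window size and bookkeeping are illustrative; the real logic is integrated into icefall's optimizer step):

    import collections
    import torch

    class MedianGradClipper:
        # Clip to clipping_scale * median of recently observed gradient norms.
        # Illustrative only: icefall folds this into its optimizer update.
        def __init__(self, clipping_scale: float = 2.0, window: int = 128):
            self.clipping_scale = clipping_scale
            self.norms = collections.deque(maxlen=window)

        def clip_(self, params):
            grads = [p.grad for p in params if p.grad is not None]
            norm = torch.linalg.vector_norm(
                torch.stack([torch.linalg.vector_norm(g) for g in grads]))
            self.norms.append(float(norm))
            median = sorted(self.norms)[len(self.norms) // 2]
            threshold = self.clipping_scale * median
            if norm > threshold:  # this batch counts toward percent-clipped
                for g in grads:
                    g.mul_(threshold / norm)
            return float(norm), threshold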
], batch size: 61, lr: 4.85e-03, grad_scale: 16.0 2023-11-20 14:10:04,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1117380.0, ans=0.1 2023-11-20 14:10:06,482 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1117380.0, ans=0.1 2023-11-20 14:10:30,800 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.679e+01 8.103e+01 8.654e+01 9.341e+01 1.359e+02, threshold=1.731e+02, percent-clipped=0.0 2023-11-20 14:10:40,460 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1117580.0, ans=0.0 2023-11-20 14:10:42,848 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1117580.0, ans=0.0 2023-11-20 14:10:47,533 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1117646.6666666667, ans=0.1 2023-11-20 14:10:48,696 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 167650 2023-11-20 14:10:54,362 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1117646.6666666667, ans=0.125 2023-11-20 14:11:00,279 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 11350, loss[loss=0.07211, simple_loss=0.09488, pruned_loss=0.016, audio_tagging_loss=0.008662, over 15673.00 frames. ], tot_loss[loss=0.07798, simple_loss=0.0986, pruned_loss=0.01871, audio_tagging_loss=0.009973, over 3045431.87 frames. ], batch size: 55, lr: 4.85e-03, grad_scale: 16.0 2023-11-20 14:11:11,570 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1117713.3333333333, ans=0.125 2023-11-20 14:11:12,837 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1117780.0, ans=0.0 2023-11-20 14:11:40,686 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.35 vs. limit=15.0 2023-11-20 14:11:52,926 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 167700 2023-11-20 14:11:53,041 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 14:12:04,783 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 11400, loss[loss=0.07871, simple_loss=0.0948, pruned_loss=0.018, audio_tagging_loss=0.01331, over 14562.00 frames. ], tot_loss[loss=0.07812, simple_loss=0.09886, pruned_loss=0.01874, audio_tagging_loss=0.009948, over 3044352.02 frames. ], batch size: 56, lr: 4.85e-03, grad_scale: 16.0 2023-11-20 14:12:06,382 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1118046.6666666667, ans=0.0 2023-11-20 14:12:07,580 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1118046.6666666667, ans=0.125 2023-11-20 14:12:14,921 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.82 vs. 
limit=15.0 2023-11-20 14:12:20,407 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1118113.3333333333, ans=0.125 2023-11-20 14:12:24,489 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.49 vs. limit=15.0 2023-11-20 14:12:30,899 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.03 vs. limit=15.0 2023-11-20 14:12:32,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1118180.0, ans=0.0 2023-11-20 14:12:39,411 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.780e+01 8.090e+01 8.832e+01 9.724e+01 2.021e+02, threshold=1.766e+02, percent-clipped=1.0 2023-11-20 14:12:42,465 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.48 vs. limit=10.0 2023-11-20 14:12:49,559 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1118246.6666666667, ans=0.0 2023-11-20 14:12:57,885 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 167750 2023-11-20 14:13:07,339 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1118313.3333333333, ans=0.0 2023-11-20 14:13:09,443 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 11450, loss[loss=0.05254, simple_loss=0.06972, pruned_loss=0.01036, audio_tagging_loss=0.007326, over 14052.00 frames. ], tot_loss[loss=0.07873, simple_loss=0.09993, pruned_loss=0.019, audio_tagging_loss=0.009761, over 3050624.85 frames. ], batch size: 57, lr: 4.85e-03, grad_scale: 16.0 2023-11-20 14:13:24,909 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1118446.6666666667, ans=0.125 2023-11-20 14:13:41,465 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.73 vs. limit=15.0 2023-11-20 14:13:47,141 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1118580.0, ans=0.0 2023-11-20 14:13:53,244 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.77 vs. limit=6.0 2023-11-20 14:14:02,127 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 167800 2023-11-20 14:14:07,755 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.20 vs. limit=15.0 2023-11-20 14:14:13,379 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.92 vs. limit=6.0 2023-11-20 14:14:14,037 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 11500, loss[loss=0.06687, simple_loss=0.08817, pruned_loss=0.01248, audio_tagging_loss=0.01031, over 14461.00 frames. ], tot_loss[loss=0.07818, simple_loss=0.09924, pruned_loss=0.01877, audio_tagging_loss=0.00979, over 3051350.76 frames. 
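Most of the scaling.py traffic above records ScheduledFloat values: per-module hyper-parameters (skip rates, balancer probabilities, dropout, bypass scale floors) that are functions of the training batch count rather than constants, which is how the recipe anneals its regularizers over a run. A sketch of such a piecewise-linear schedule (the breakpoints below are invented for illustration, each module in zipformer's scaling.py carries its own; the fractional batch_count in the log presumably reflects averaging across workers, which is not modelled here):

    # Piecewise-linear value keyed on batch count, in the spirit of
    # icefall's ScheduledFloat. Breakpoints here are made up.
    class ScheduledFloatSketch:
        def __init__(self, *points):
            self.points = sorted(points)  # [(batch_count, value), ...]

        def __call__(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            if batch_count >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if x1 >= batch_count:
                    t = (batch_count - x0) / (x1 - x0)
                    return y0 + t * (y1 - y0)

    conv_skip_rate = ScheduledFloatSketch((0.0, 0.2), (20000.0, 0.05),
                                          (40000.0, 0.0))
    print(conv_skip_rate(1116513.33))  # 0.0: far past the final breakpoint,
                                       # matching the ans=0.0 entries above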
], batch size: 55, lr: 4.85e-03, grad_scale: 16.0 2023-11-20 14:14:35,172 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1118780.0, ans=0.125 2023-11-20 14:14:36,201 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1118780.0, ans=0.04949747468305833 2023-11-20 14:14:48,906 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.104e+01 8.299e+01 8.769e+01 9.853e+01 1.208e+02, threshold=1.754e+02, percent-clipped=0.0 2023-11-20 14:14:51,705 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1118913.3333333333, ans=0.125 2023-11-20 14:14:57,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1118913.3333333333, ans=0.125 2023-11-20 14:15:06,107 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1118980.0, ans=0.09899494936611666 2023-11-20 14:15:07,052 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 167850 2023-11-20 14:15:07,204 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1118980.0, ans=0.95 2023-11-20 14:15:07,261 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1118980.0, ans=0.2 2023-11-20 14:15:17,466 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1119046.6666666667, ans=0.125 2023-11-20 14:15:19,091 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 11550, loss[loss=0.06842, simple_loss=0.08627, pruned_loss=0.01558, audio_tagging_loss=0.0097, over 14931.00 frames. ], tot_loss[loss=0.07846, simple_loss=0.0995, pruned_loss=0.01895, audio_tagging_loss=0.009758, over 3052657.36 frames. ], batch size: 56, lr: 4.85e-03, grad_scale: 16.0 2023-11-20 14:15:48,524 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.70 vs. limit=15.0 2023-11-20 14:15:55,717 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 14:16:08,854 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1119313.3333333333, ans=0.0 2023-11-20 14:16:11,590 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 167900 2023-11-20 14:16:13,027 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1119313.3333333333, ans=0.05 2023-11-20 14:16:23,310 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 11600, loss[loss=0.09234, simple_loss=0.1151, pruned_loss=0.025, audio_tagging_loss=0.009767, over 14303.00 frames. ], tot_loss[loss=0.07905, simple_loss=0.1003, pruned_loss=0.01911, audio_tagging_loss=0.009789, over 3049943.37 frames. 
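The WARNING above shows why AudioSet cuts carrying the dummy transcript get dropped: the transducer loss needs at least as many encoder frames as output tokens, and after 4x subsampling a 1-second (100-frame) cut keeps only 23 frames, fewer than the 24 BPE tokens of the placeholder text. The arithmetic, assuming the standard icefall Conv2dSubsampling length formula (the filter wiring itself is illustrative):

    def frames_after_subsampling(num_frames: int) -> int:
        # Output length of icefall's Conv2dSubsampling for
        # subsampling_factor=4 (assumed to be the module used here).
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # RNN-T alignment needs T >= U (encoder frames vs. output tokens).
        return frames_after_subsampling(num_frames) >= num_tokens

    print(frames_after_subsampling(100))  # 23, as in the warning
    print(keep_cut(100, 24))              # False -> "Exclude cut with ID ..."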
], batch size: 54, lr: 4.85e-03, grad_scale: 32.0 2023-11-20 14:16:28,557 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1119380.0, ans=0.04949747468305833 2023-11-20 14:16:32,649 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.96 vs. limit=10.0 2023-11-20 14:16:39,335 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.16 vs. limit=12.0 2023-11-20 14:16:57,521 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.656e+01 8.039e+01 8.649e+01 9.262e+01 1.367e+02, threshold=1.730e+02, percent-clipped=0.0 2023-11-20 14:16:59,160 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1119513.3333333333, ans=0.125 2023-11-20 14:17:15,920 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 167950 2023-11-20 14:17:26,986 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 11650, loss[loss=0.08474, simple_loss=0.1081, pruned_loss=0.01946, audio_tagging_loss=0.01121, over 15144.00 frames. ], tot_loss[loss=0.07856, simple_loss=0.09972, pruned_loss=0.01888, audio_tagging_loss=0.009819, over 3058079.96 frames. ], batch size: 56, lr: 4.85e-03, grad_scale: 32.0 2023-11-20 14:17:36,184 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1119713.3333333333, ans=0.0 2023-11-20 14:17:47,989 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1119780.0, ans=0.125 2023-11-20 14:17:55,953 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1119846.6666666667, ans=0.125 2023-11-20 14:18:01,782 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.65 vs. limit=22.5 2023-11-20 14:18:10,082 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1119913.3333333333, ans=0.125 2023-11-20 14:18:12,291 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1119913.3333333333, ans=0.125 2023-11-20 14:18:20,052 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 168000 2023-11-20 14:18:26,996 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1119980.0, ans=0.1 2023-11-20 14:18:32,708 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1119980.0, ans=0.0 2023-11-20 14:18:34,420 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.88 vs. limit=10.0 2023-11-20 14:18:34,843 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 11700, loss[loss=0.09282, simple_loss=0.115, pruned_loss=0.02469, audio_tagging_loss=0.0106, over 14636.00 frames. ], tot_loss[loss=0.07917, simple_loss=0.1005, pruned_loss=0.01902, audio_tagging_loss=0.009901, over 3054105.15 frames. 
], batch size: 58, lr: 4.85e-03, grad_scale: 32.0 2023-11-20 14:18:43,310 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1120046.6666666667, ans=0.125 2023-11-20 14:18:45,467 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1120046.6666666667, ans=0.0 2023-11-20 14:18:59,328 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1120180.0, ans=0.125 2023-11-20 14:19:09,479 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.797e+01 8.107e+01 8.645e+01 9.352e+01 1.111e+02, threshold=1.729e+02, percent-clipped=0.0 2023-11-20 14:19:09,761 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1120180.0, ans=0.125 2023-11-20 14:19:18,863 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.06 vs. limit=15.0 2023-11-20 14:19:27,340 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 168050 2023-11-20 14:19:28,726 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=1120313.3333333333, ans=0.02 2023-11-20 14:19:32,953 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1120313.3333333333, ans=10.0 2023-11-20 14:19:39,534 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 11750, loss[loss=0.0929, simple_loss=0.1205, pruned_loss=0.02525, audio_tagging_loss=0.007404, over 16043.00 frames. ], tot_loss[loss=0.07968, simple_loss=0.1009, pruned_loss=0.01933, audio_tagging_loss=0.009915, over 3051721.50 frames. ], batch size: 61, lr: 4.84e-03, grad_scale: 32.0 2023-11-20 14:20:08,847 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1120513.3333333333, ans=0.015 2023-11-20 14:20:32,719 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 168100 2023-11-20 14:20:34,190 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1120646.6666666667, ans=0.125 2023-11-20 14:20:43,432 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 11800, loss[loss=0.0631, simple_loss=0.07881, pruned_loss=0.01492, audio_tagging_loss=0.008782, over 14461.00 frames. ], tot_loss[loss=0.07951, simple_loss=0.1005, pruned_loss=0.01934, audio_tagging_loss=0.009916, over 3052075.68 frames. ], batch size: 57, lr: 4.84e-03, grad_scale: 32.0 2023-11-20 14:21:19,049 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.990e+01 8.087e+01 8.933e+01 9.931e+01 1.196e+02, threshold=1.787e+02, percent-clipped=0.0 2023-11-20 14:21:22,984 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1120913.3333333333, ans=0.125 2023-11-20 14:21:24,285 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1120913.3333333333, ans=0.125 2023-11-20 14:21:30,692 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.87 vs. 
limit=15.0 2023-11-20 14:21:36,603 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 168150 2023-11-20 14:21:40,386 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1120980.0, ans=0.125 2023-11-20 14:21:47,567 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 11850, loss[loss=0.07838, simple_loss=0.09528, pruned_loss=0.01978, audio_tagging_loss=0.01096, over 14665.00 frames. ], tot_loss[loss=0.07985, simple_loss=0.101, pruned_loss=0.0194, audio_tagging_loss=0.009966, over 3049079.30 frames. ], batch size: 56, lr: 4.84e-03, grad_scale: 32.0 2023-11-20 14:21:55,810 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1121046.6666666667, ans=0.1 2023-11-20 14:22:26,144 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1121246.6666666667, ans=0.125 2023-11-20 14:22:40,217 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 168200 2023-11-20 14:22:43,267 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1121313.3333333333, ans=0.1 2023-11-20 14:22:49,998 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=11.87 vs. limit=15.0 2023-11-20 14:22:51,444 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 11900, loss[loss=0.08863, simple_loss=0.1018, pruned_loss=0.02506, audio_tagging_loss=0.01265, over 14082.00 frames. ], tot_loss[loss=0.07939, simple_loss=0.09998, pruned_loss=0.01927, audio_tagging_loss=0.01013, over 3038717.73 frames. ], batch size: 53, lr: 4.84e-03, grad_scale: 32.0 2023-11-20 14:22:51,946 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.45 vs. limit=15.0 2023-11-20 14:23:19,694 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1121513.3333333333, ans=0.125 2023-11-20 14:23:25,747 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1121513.3333333333, ans=0.125 2023-11-20 14:23:25,898 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1121513.3333333333, ans=0.125 2023-11-20 14:23:27,249 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.121e+01 8.163e+01 8.778e+01 9.504e+01 1.300e+02, threshold=1.756e+02, percent-clipped=0.0 2023-11-20 14:23:33,548 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1121580.0, ans=0.0 2023-11-20 14:23:45,108 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 168250 2023-11-20 14:23:50,787 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1121646.6666666667, ans=0.1 2023-11-20 14:23:54,533 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1121646.6666666667, ans=0.125 2023-11-20 14:23:56,562 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 11950, loss[loss=0.05001, simple_loss=0.05718, pruned_loss=0.0105, audio_tagging_loss=0.01092, over 15436.00 frames. 
], tot_loss[loss=0.07995, simple_loss=0.1009, pruned_loss=0.01932, audio_tagging_loss=0.01018, over 3041723.06 frames. ], batch size: 59, lr: 4.84e-03, grad_scale: 32.0 2023-11-20 14:24:19,194 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.85 vs. limit=12.0 2023-11-20 14:24:48,381 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 168300 2023-11-20 14:24:48,660 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1121980.0, ans=0.125 2023-11-20 14:24:58,992 INFO [train_asr.py:1262] (3/4) Epoch 14, batch 12000, loss[loss=0.06803, simple_loss=0.0837, pruned_loss=0.01629, audio_tagging_loss=0.009884, over 14019.00 frames. ], tot_loss[loss=0.08009, simple_loss=0.101, pruned_loss=0.01944, audio_tagging_loss=0.01014, over 3037540.30 frames. ], batch size: 55, lr: 4.84e-03, grad_scale: 32.0 2023-11-20 14:24:58,992 INFO [train_asr.py:1285] (3/4) Computing validation loss 2023-11-20 14:25:41,034 INFO [train_asr.py:1294] (3/4) Epoch 14, validation: loss=0.06236, simple_loss=0.05348, pruned_loss=0.005638, audio_tagging_loss=0.02999, over 4681554.00 frames. 2023-11-20 14:25:41,035 INFO [train_asr.py:1295] (3/4) Maximum memory allocated so far is 25886MB 2023-11-20 14:26:46,242 INFO [train_asr.py:1262] (3/4) Epoch 15, batch 0, loss[loss=0.1002, simple_loss=0.1183, pruned_loss=0.02247, audio_tagging_loss=0.01856, over 15251.00 frames. ], tot_loss[loss=0.1002, simple_loss=0.1183, pruned_loss=0.02247, audio_tagging_loss=0.01856, over 15251.00 frames. ], batch size: 55, lr: 4.68e-03, grad_scale: 32.0 2023-11-20 14:26:46,243 INFO [train_asr.py:1285] (3/4) Computing validation loss 2023-11-20 14:27:01,926 INFO [zipformer.py:1873] (3/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.6979, 5.6757, 5.8107, 5.7717], device='cuda:3') 2023-11-20 14:27:18,135 INFO [zipformer.py:1873] (3/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.3104, 4.9829, 4.7216, 5.1765], device='cuda:3') 2023-11-20 14:27:21,773 INFO [train_asr.py:1294] (3/4) Epoch 15, validation: loss=0.06153, simple_loss=0.05347, pruned_loss=0.005654, audio_tagging_loss=0.02914, over 4681554.00 frames. 2023-11-20 14:27:21,774 INFO [train_asr.py:1295] (3/4) Maximum memory allocated so far is 25886MB 2023-11-20 14:27:26,690 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.106e+01 8.292e+01 9.006e+01 9.902e+01 1.226e+02, threshold=1.801e+02, percent-clipped=0.0 2023-11-20 14:27:30,940 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.67 vs. 
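At the epoch 14 -> 15 boundary above, the validation pass also triggers the zipformer's attention diagnostics: for selected self-attention modules it logs attn_weights_entropy, one value per head (four heads here). Entropies of 5-6 nats against key sequences of a few hundred frames (ln 600 ~= 6.4) indicate these heads attend quite diffusely. A sketch of the diagnostic (assumption: the logged figure is the per-head entropy averaged over batch and query positions; the actual hook in zipformer.py may differ in detail):

    import torch

    def attn_weights_entropy(attn_weights: torch.Tensor) -> torch.Tensor:
        # attn_weights: (num_heads, batch, num_queries, num_keys), with each
        # key-distribution summing to 1. Returns nats per head, averaged
        # over batch and query positions (our assumption).
        eps = 1.0e-20
        ent = -(attn_weights * (attn_weights + eps).log()).sum(dim=-1)
        return ent.mean(dim=(1, 2))

    w = torch.softmax(torch.randn(4, 2, 50, 600), dim=-1)
    print(attn_weights_entropy(w))  # ~5.9 per head, just under ln(600) ~= 6.4

Note also that the per-batch lr steps down from 4.85e-03 to 4.68e-03 as epoch 15 begins, consistent with the epoch-dependent term of the learning-rate schedule kicking in at the rollover.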
limit=15.0 2023-11-20 14:27:35,813 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1122266.6666666667, ans=0.125 2023-11-20 14:27:44,715 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 168350 2023-11-20 14:27:52,878 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1122333.3333333333, ans=0.1 2023-11-20 14:28:12,658 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1122466.6666666667, ans=0.125 2023-11-20 14:28:22,540 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1122466.6666666667, ans=0.125 2023-11-20 14:28:24,157 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.29 vs. limit=15.0 2023-11-20 14:28:25,996 INFO [train_asr.py:1262] (3/4) Epoch 15, batch 50, loss[loss=0.06852, simple_loss=0.07221, pruned_loss=0.01192, audio_tagging_loss=0.0205, over 14390.00 frames. ], tot_loss[loss=0.09008, simple_loss=0.1025, pruned_loss=0.01999, audio_tagging_loss=0.01884, over 690218.93 frames. ], batch size: 55, lr: 4.67e-03, grad_scale: 32.0 2023-11-20 14:28:50,228 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 168400 2023-11-20 14:28:56,932 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1122666.6666666667, ans=0.2 2023-11-20 14:29:02,639 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1122666.6666666667, ans=0.125 2023-11-20 14:29:03,779 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1122666.6666666667, ans=0.125 2023-11-20 14:29:24,682 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.90 vs. limit=22.5 2023-11-20 14:29:32,379 INFO [train_asr.py:1262] (3/4) Epoch 15, batch 100, loss[loss=0.101, simple_loss=0.1204, pruned_loss=0.02635, audio_tagging_loss=0.01443, over 14842.00 frames. ], tot_loss[loss=0.0881, simple_loss=0.1005, pruned_loss=0.01952, audio_tagging_loss=0.01832, over 1215326.34 frames. ], batch size: 54, lr: 4.67e-03, grad_scale: 16.0 2023-11-20 14:29:36,404 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1122866.6666666667, ans=0.0 2023-11-20 14:29:39,160 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.599e+01 8.769e+01 9.395e+01 1.004e+02 1.341e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-20 14:29:40,012 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.57 vs. limit=15.0 2023-11-20 14:29:56,009 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 168450 2023-11-20 14:30:25,320 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.15 vs. limit=15.0 2023-11-20 14:30:27,871 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.19 vs. 
limit=12.0 2023-11-20 14:30:32,950 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1123133.3333333333, ans=0.125 2023-11-20 14:30:37,471 INFO [train_asr.py:1262] (3/4) Epoch 15, batch 150, loss[loss=0.1053, simple_loss=0.1345, pruned_loss=0.02782, audio_tagging_loss=0.01019, over 15468.00 frames. ], tot_loss[loss=0.08542, simple_loss=0.09947, pruned_loss=0.01905, audio_tagging_loss=0.01663, over 1620257.23 frames. ], batch size: 57, lr: 4.67e-03, grad_scale: 16.0 2023-11-20 14:30:48,266 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1123200.0, ans=0.125 2023-11-20 14:30:51,287 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.99 vs. limit=15.0 2023-11-20 14:30:58,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1123266.6666666667, ans=0.0 2023-11-20 14:31:00,985 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 168500 2023-11-20 14:31:06,104 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1123333.3333333333, ans=0.125 2023-11-20 14:31:31,453 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.03 vs. limit=15.0 2023-11-20 14:31:33,794 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.96 vs. limit=6.0 2023-11-20 14:31:42,723 INFO [train_asr.py:1262] (3/4) Epoch 15, batch 200, loss[loss=0.05317, simple_loss=0.06356, pruned_loss=0.0122, audio_tagging_loss=0.009187, over 15379.00 frames. ], tot_loss[loss=0.08352, simple_loss=0.09961, pruned_loss=0.01912, audio_tagging_loss=0.0146, over 1930296.22 frames. ], batch size: 59, lr: 4.67e-03, grad_scale: 16.0 2023-11-20 14:31:46,606 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1123533.3333333333, ans=0.0 2023-11-20 14:31:48,818 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.970e+01 8.233e+01 8.956e+01 9.883e+01 1.318e+02, threshold=1.791e+02, percent-clipped=0.0 2023-11-20 14:31:58,825 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1123600.0, ans=0.125 2023-11-20 14:32:06,111 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 168550 2023-11-20 14:32:12,083 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.86 vs. limit=15.0 2023-11-20 14:32:35,574 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.66 vs. limit=15.0 2023-11-20 14:32:48,631 INFO [train_asr.py:1262] (3/4) Epoch 15, batch 250, loss[loss=0.09656, simple_loss=0.1242, pruned_loss=0.02553, audio_tagging_loss=0.008911, over 16360.00 frames. ], tot_loss[loss=0.08299, simple_loss=0.1007, pruned_loss=0.01947, audio_tagging_loss=0.01318, over 2178424.26 frames. 
], batch size: 59, lr: 4.67e-03, grad_scale: 16.0 2023-11-20 14:32:58,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1123866.6666666667, ans=0.125 2023-11-20 14:33:11,764 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 168600 2023-11-20 14:33:15,030 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.71 vs. limit=22.5 2023-11-20 14:33:26,656 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1124066.6666666667, ans=0.0 2023-11-20 14:33:29,998 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1124066.6666666667, ans=0.0 2023-11-20 14:33:42,788 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1124133.3333333333, ans=0.0 2023-11-20 14:33:44,162 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1124133.3333333333, ans=0.125 2023-11-20 14:33:53,468 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1124200.0, ans=0.0 2023-11-20 14:33:53,483 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1124200.0, ans=0.125 2023-11-20 14:33:54,389 INFO [train_asr.py:1262] (3/4) Epoch 15, batch 300, loss[loss=0.06378, simple_loss=0.08662, pruned_loss=0.01323, audio_tagging_loss=0.007231, over 15904.00 frames. ], tot_loss[loss=0.08178, simple_loss=0.1003, pruned_loss=0.01942, audio_tagging_loss=0.01223, over 2369471.18 frames. ], batch size: 58, lr: 4.67e-03, grad_scale: 16.0 2023-11-20 14:34:00,423 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.977e+01 8.500e+01 9.120e+01 9.945e+01 1.401e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-20 14:34:01,909 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 14:34:02,078 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1124200.0, ans=0.0 2023-11-20 14:34:04,863 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.31 vs. limit=15.0 2023-11-20 14:34:17,713 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 168650 2023-11-20 14:34:48,489 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1124466.6666666667, ans=0.1 2023-11-20 14:34:59,590 INFO [train_asr.py:1262] (3/4) Epoch 15, batch 350, loss[loss=0.1129, simple_loss=0.1365, pruned_loss=0.03451, audio_tagging_loss=0.01015, over 15229.00 frames. ], tot_loss[loss=0.08079, simple_loss=0.1003, pruned_loss=0.01917, audio_tagging_loss=0.01145, over 2526344.93 frames. 
], batch size: 56, lr: 4.67e-03, grad_scale: 16.0 2023-11-20 14:34:59,770 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1124533.3333333333, ans=0.125 2023-11-20 14:35:22,302 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1124600.0, ans=0.5 2023-11-20 14:35:24,794 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 168700 2023-11-20 14:35:54,445 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.16 vs. limit=22.5 2023-11-20 14:36:02,154 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=1124800.0, ans=0.02 2023-11-20 14:36:03,481 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1124800.0, ans=0.0 2023-11-20 14:36:06,969 INFO [train_asr.py:1262] (3/4) Epoch 15, batch 400, loss[loss=0.08469, simple_loss=0.1115, pruned_loss=0.01911, audio_tagging_loss=0.009829, over 15416.00 frames. ], tot_loss[loss=0.08113, simple_loss=0.1016, pruned_loss=0.01938, audio_tagging_loss=0.01097, over 2643329.14 frames. ], batch size: 58, lr: 4.67e-03, grad_scale: 32.0 2023-11-20 14:36:13,872 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.381e+01 8.162e+01 9.229e+01 1.065e+02 1.239e+02, threshold=1.846e+02, percent-clipped=0.0 2023-11-20 14:36:15,448 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1124866.6666666667, ans=0.0 2023-11-20 14:36:30,693 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 168750 2023-11-20 14:36:37,197 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1125000.0, ans=0.125 2023-11-20 14:36:45,693 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.14 vs. limit=15.0 2023-11-20 14:36:48,216 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.30 vs. limit=15.0 2023-11-20 14:37:06,021 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1125133.3333333333, ans=0.0 2023-11-20 14:37:12,766 INFO [train_asr.py:1262] (3/4) Epoch 15, batch 450, loss[loss=0.06571, simple_loss=0.07629, pruned_loss=0.01695, audio_tagging_loss=0.01062, over 15322.00 frames. ], tot_loss[loss=0.08043, simple_loss=0.1007, pruned_loss=0.01934, audio_tagging_loss=0.01072, over 2732960.94 frames. ], batch size: 59, lr: 4.67e-03, grad_scale: 16.0 2023-11-20 14:37:21,779 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1125200.0, ans=0.0 2023-11-20 14:37:29,298 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.29 vs. limit=15.0 2023-11-20 14:37:34,860 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 168800 2023-11-20 14:38:15,721 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.25 vs. 
limit=15.0 2023-11-20 14:38:17,232 INFO [train_asr.py:1262] (3/4) Epoch 15, batch 500, loss[loss=0.0904, simple_loss=0.1186, pruned_loss=0.02076, audio_tagging_loss=0.01034, over 15537.00 frames. ], tot_loss[loss=0.08064, simple_loss=0.1014, pruned_loss=0.01943, audio_tagging_loss=0.01051, over 2804405.40 frames. ], batch size: 58, lr: 4.67e-03, grad_scale: 16.0 2023-11-20 14:38:19,852 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1125533.3333333333, ans=0.125 2023-11-20 14:38:24,589 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.837e+01 8.018e+01 8.483e+01 9.528e+01 1.143e+02, threshold=1.697e+02, percent-clipped=0.0 2023-11-20 14:38:32,294 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1125600.0, ans=0.0 2023-11-20 14:38:41,280 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 168850 2023-11-20 14:39:01,689 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff3.min_abs, batch_count=1125733.3333333333, ans=0.2 2023-11-20 14:39:05,267 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1125733.3333333333, ans=0.0 2023-11-20 14:39:05,652 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.45 vs. limit=22.5 2023-11-20 14:39:06,479 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1125733.3333333333, ans=0.0 2023-11-20 14:39:06,614 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1125733.3333333333, ans=0.0 2023-11-20 14:39:09,073 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1125800.0, ans=0.1 2023-11-20 14:39:21,973 INFO [train_asr.py:1262] (3/4) Epoch 15, batch 550, loss[loss=0.08269, simple_loss=0.1101, pruned_loss=0.01818, audio_tagging_loss=0.009442, over 15087.00 frames. ], tot_loss[loss=0.08009, simple_loss=0.101, pruned_loss=0.01926, audio_tagging_loss=0.01034, over 2854692.39 frames. ], batch size: 56, lr: 4.67e-03, grad_scale: 16.0 2023-11-20 14:39:23,588 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1125866.6666666667, ans=0.125 2023-11-20 14:39:25,864 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1125866.6666666667, ans=0.0 2023-11-20 14:39:45,567 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 168900 2023-11-20 14:39:47,015 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1126000.0, ans=0.0 2023-11-20 14:40:03,141 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.24 vs. 
limit=22.5 2023-11-20 14:40:19,675 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1126133.3333333333, ans=0.0 2023-11-20 14:40:24,535 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1126133.3333333333, ans=0.0 2023-11-20 14:40:27,412 INFO [train_asr.py:1262] (3/4) Epoch 15, batch 600, loss[loss=0.08603, simple_loss=0.1116, pruned_loss=0.02189, audio_tagging_loss=0.008355, over 16119.00 frames. ], tot_loss[loss=0.08003, simple_loss=0.1008, pruned_loss=0.01936, audio_tagging_loss=0.01026, over 2894396.71 frames. ], batch size: 57, lr: 4.67e-03, grad_scale: 16.0 2023-11-20 14:40:31,560 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1126200.0, ans=0.2 2023-11-20 14:40:35,042 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.998e+01 8.140e+01 8.992e+01 9.843e+01 1.226e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-20 14:40:44,317 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1126266.6666666667, ans=0.0 2023-11-20 14:40:50,167 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 168950 2023-11-20 14:40:50,395 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1126266.6666666667, ans=0.2 2023-11-20 14:41:32,772 INFO [train_asr.py:1262] (3/4) Epoch 15, batch 650, loss[loss=0.09245, simple_loss=0.1155, pruned_loss=0.02308, audio_tagging_loss=0.0116, over 16579.00 frames. ], tot_loss[loss=0.08003, simple_loss=0.101, pruned_loss=0.01928, audio_tagging_loss=0.01025, over 2929661.65 frames. ], batch size: 60, lr: 4.67e-03, grad_scale: 16.0 2023-11-20 14:41:33,111 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1126533.3333333333, ans=0.125 2023-11-20 14:41:39,451 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1126533.3333333333, ans=0.1 2023-11-20 14:41:40,839 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.98 vs. limit=22.5 2023-11-20 14:41:44,473 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1126600.0, ans=0.2 2023-11-20 14:41:49,886 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1126600.0, ans=0.0 2023-11-20 14:41:57,347 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 169000 2023-11-20 14:42:01,915 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1126666.6666666667, ans=0.0 2023-11-20 14:42:07,486 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=1126666.6666666667, ans=0.5 2023-11-20 14:42:10,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=1126666.6666666667, ans=0.5 2023-11-20 14:42:21,415 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.93 vs. 
limit=15.0 2023-11-20 14:42:22,454 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1126733.3333333333, ans=0.0 2023-11-20 14:42:38,534 INFO [train_asr.py:1262] (3/4) Epoch 15, batch 700, loss[loss=0.0994, simple_loss=0.1238, pruned_loss=0.02858, audio_tagging_loss=0.008916, over 14427.00 frames. ], tot_loss[loss=0.07967, simple_loss=0.1008, pruned_loss=0.01906, audio_tagging_loss=0.0102, over 2962144.29 frames. ], batch size: 55, lr: 4.67e-03, grad_scale: 16.0 2023-11-20 14:42:47,988 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.688e+01 8.098e+01 8.725e+01 9.382e+01 1.189e+02, threshold=1.745e+02, percent-clipped=0.0 2023-11-20 14:42:54,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1126933.3333333333, ans=0.0 2023-11-20 14:43:01,042 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1126933.3333333333, ans=0.125 2023-11-20 14:43:03,404 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 169050 2023-11-20 14:43:04,929 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1127000.0, ans=0.125 2023-11-20 14:43:10,513 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0 2023-11-20 14:43:13,559 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=1127000.0, ans=0.5 2023-11-20 14:43:45,362 INFO [train_asr.py:1262] (3/4) Epoch 15, batch 750, loss[loss=0.1038, simple_loss=0.1318, pruned_loss=0.02826, audio_tagging_loss=0.009652, over 15493.00 frames. ], tot_loss[loss=0.07962, simple_loss=0.1008, pruned_loss=0.01903, audio_tagging_loss=0.01018, over 2988320.92 frames. ], batch size: 56, lr: 4.67e-03, grad_scale: 16.0 2023-11-20 14:43:52,890 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.19 vs. limit=15.0 2023-11-20 14:44:03,873 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1127266.6666666667, ans=0.1 2023-11-20 14:44:05,085 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1127266.6666666667, ans=0.125 2023-11-20 14:44:08,672 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 169100 2023-11-20 14:44:08,888 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1127266.6666666667, ans=0.125 2023-11-20 14:44:17,804 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.03 vs. limit=15.0 2023-11-20 14:44:21,701 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1127333.3333333333, ans=0.125 2023-11-20 14:44:23,608 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.52 vs. 
limit=5.0 2023-11-20 14:44:23,923 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1127400.0, ans=0.125 2023-11-20 14:44:24,232 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1127400.0, ans=0.1 2023-11-20 14:44:37,863 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.71 vs. limit=15.0 2023-11-20 14:44:50,635 INFO [train_asr.py:1262] (3/4) Epoch 15, batch 800, loss[loss=0.08924, simple_loss=0.1058, pruned_loss=0.02621, audio_tagging_loss=0.01013, over 15715.00 frames. ], tot_loss[loss=0.07974, simple_loss=0.101, pruned_loss=0.01913, audio_tagging_loss=0.01008, over 3004081.93 frames. ], batch size: 59, lr: 4.66e-03, grad_scale: 32.0 2023-11-20 14:44:54,629 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1127533.3333333333, ans=0.125 2023-11-20 14:44:55,794 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1127533.3333333333, ans=0.2 2023-11-20 14:44:57,930 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.496e+01 8.097e+01 8.575e+01 9.313e+01 1.221e+02, threshold=1.715e+02, percent-clipped=0.0 2023-11-20 14:45:00,761 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1127533.3333333333, ans=0.125 2023-11-20 14:45:13,845 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 169150 2023-11-20 14:45:18,545 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1127666.6666666667, ans=0.0 2023-11-20 14:45:39,444 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.94 vs. limit=12.0 2023-11-20 14:45:50,083 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1127800.0, ans=0.1 2023-11-20 14:45:56,214 INFO [train_asr.py:1262] (3/4) Epoch 15, batch 850, loss[loss=0.07745, simple_loss=0.1094, pruned_loss=0.01656, audio_tagging_loss=0.006213, over 14754.00 frames. ], tot_loss[loss=0.07973, simple_loss=0.101, pruned_loss=0.01902, audio_tagging_loss=0.01021, over 3015796.50 frames. ], batch size: 55, lr: 4.66e-03, grad_scale: 32.0 2023-11-20 14:46:00,815 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1127866.6666666667, ans=0.125 2023-11-20 14:46:09,960 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1127933.3333333333, ans=0.2 2023-11-20 14:46:21,146 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 169200 2023-11-20 14:46:22,843 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.63 vs. limit=15.0 2023-11-20 14:46:32,008 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.75 vs. 
limit=15.0 2023-11-20 14:47:02,631 INFO [train_asr.py:1262] (3/4) Epoch 15, batch 900, loss[loss=0.07668, simple_loss=0.09996, pruned_loss=0.01758, audio_tagging_loss=0.009121, over 15184.00 frames. ], tot_loss[loss=0.07878, simple_loss=0.0998, pruned_loss=0.01866, audio_tagging_loss=0.01022, over 3021829.11 frames. ], batch size: 56, lr: 4.66e-03, grad_scale: 32.0 2023-11-20 14:47:11,333 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.656e+01 8.130e+01 8.827e+01 9.752e+01 1.444e+02, threshold=1.765e+02, percent-clipped=0.0 2023-11-20 14:47:26,428 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 169250 2023-11-20 14:47:44,644 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.53 vs. limit=15.0 2023-11-20 14:48:07,417 INFO [train_asr.py:1262] (3/4) Epoch 15, batch 950, loss[loss=0.06209, simple_loss=0.08993, pruned_loss=0.009616, audio_tagging_loss=0.007509, over 16176.00 frames. ], tot_loss[loss=0.07886, simple_loss=0.1001, pruned_loss=0.01875, audio_tagging_loss=0.01007, over 3025863.60 frames. ], batch size: 58, lr: 4.66e-03, grad_scale: 32.0 2023-11-20 14:48:09,005 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1128533.3333333333, ans=0.125 2023-11-20 14:48:11,821 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.93 vs. limit=22.5 2023-11-20 14:48:28,287 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.91 vs. limit=15.0 2023-11-20 14:48:29,076 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1128600.0, ans=0.0 2023-11-20 14:48:30,269 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 169300 2023-11-20 14:48:32,134 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.34 vs. limit=15.0 2023-11-20 14:48:32,918 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1128666.6666666667, ans=0.125 2023-11-20 14:48:45,498 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1128733.3333333333, ans=0.2 2023-11-20 14:48:47,246 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1128733.3333333333, ans=0.125 2023-11-20 14:48:54,500 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1128733.3333333333, ans=0.125 2023-11-20 14:48:54,613 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 14:49:07,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1128800.0, ans=0.0 2023-11-20 14:49:11,805 INFO [train_asr.py:1262] (3/4) Epoch 15, batch 1000, loss[loss=0.07895, simple_loss=0.1047, pruned_loss=0.01989, audio_tagging_loss=0.006705, over 15445.00 frames. ], tot_loss[loss=0.07848, simple_loss=0.09931, pruned_loss=0.01885, audio_tagging_loss=0.009973, over 3027738.62 frames. 
], batch size: 57, lr: 4.66e-03, grad_scale: 32.0 2023-11-20 14:49:20,012 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.688e+01 8.251e+01 8.894e+01 9.437e+01 1.345e+02, threshold=1.779e+02, percent-clipped=0.0 2023-11-20 14:49:25,907 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1128933.3333333333, ans=0.125 2023-11-20 14:49:26,413 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.03 vs. limit=15.0 2023-11-20 14:49:35,955 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 169350 2023-11-20 14:49:40,318 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 14:49:44,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=1129000.0, ans=10.0 2023-11-20 14:50:00,282 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1129066.6666666667, ans=0.125 2023-11-20 14:50:02,879 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1129133.3333333333, ans=0.1 2023-11-20 14:50:17,335 INFO [train_asr.py:1262] (3/4) Epoch 15, batch 1050, loss[loss=0.05485, simple_loss=0.06748, pruned_loss=0.01185, audio_tagging_loss=0.009263, over 15300.00 frames. ], tot_loss[loss=0.07864, simple_loss=0.0994, pruned_loss=0.01899, audio_tagging_loss=0.009957, over 3027581.16 frames. ], batch size: 59, lr: 4.66e-03, grad_scale: 16.0 2023-11-20 14:50:40,829 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 169400 2023-11-20 14:50:58,721 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1129400.0, ans=0.125 2023-11-20 14:51:00,185 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.26 vs. limit=22.5 2023-11-20 14:51:03,516 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1129400.0, ans=0.0 2023-11-20 14:51:23,878 INFO [train_asr.py:1262] (3/4) Epoch 15, batch 1100, loss[loss=0.08548, simple_loss=0.1158, pruned_loss=0.02038, audio_tagging_loss=0.007207, over 15343.00 frames. ], tot_loss[loss=0.07859, simple_loss=0.09941, pruned_loss=0.01897, audio_tagging_loss=0.009915, over 3030663.32 frames. ], batch size: 60, lr: 4.66e-03, grad_scale: 16.0 2023-11-20 14:51:26,405 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
2023-11-20 14:51:29,198 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1129533.3333333333, ans=0.1
2023-11-20 14:51:32,467 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.387e+01 8.402e+01 8.962e+01 9.739e+01 1.697e+02, threshold=1.792e+02, percent-clipped=0.0
2023-11-20 14:51:45,921 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1129600.0, ans=0.125
2023-11-20 14:51:47,044 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 169450
2023-11-20 14:51:50,858 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1129666.6666666667, ans=0.1
2023-11-20 14:51:59,391 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1129666.6666666667, ans=0.125
2023-11-20 14:52:29,243 INFO [train_asr.py:1262] (3/4) Epoch 15, batch 1150, loss[loss=0.07749, simple_loss=0.1016, pruned_loss=0.01915, audio_tagging_loss=0.007567, over 15211.00 frames. ], tot_loss[loss=0.07841, simple_loss=0.09933, pruned_loss=0.01883, audio_tagging_loss=0.00991, over 3029901.75 frames. ], batch size: 59, lr: 4.66e-03, grad_scale: 16.0
2023-11-20 14:52:37,592 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1129866.6666666667, ans=0.0
2023-11-20 14:52:37,638 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=1129866.6666666667, ans=0.05
2023-11-20 14:52:45,535 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1129933.3333333333, ans=0.0
2023-11-20 14:52:49,293 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1129933.3333333333, ans=0.1
2023-11-20 14:52:49,492 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1129933.3333333333, ans=0.1
2023-11-20 14:52:53,504 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 169500
2023-11-20 14:52:53,720 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1129933.3333333333, ans=0.125
2023-11-20 14:52:59,592 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1130000.0, ans=0.125
2023-11-20 14:53:35,175 INFO [train_asr.py:1262] (3/4) Epoch 15, batch 1200, loss[loss=0.08624, simple_loss=0.1083, pruned_loss=0.02087, audio_tagging_loss=0.01121, over 15426.00 frames. ], tot_loss[loss=0.07862, simple_loss=0.09968, pruned_loss=0.01888, audio_tagging_loss=0.009898, over 3034519.44 frames. ], batch size: 57, lr: 4.66e-03, grad_scale: 32.0
2023-11-20 14:53:44,434 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.469e+01 8.178e+01 8.897e+01 9.679e+01 1.493e+02, threshold=1.779e+02, percent-clipped=0.0
2023-11-20 14:53:57,817 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1130266.6666666667, ans=0.125
2023-11-20 14:53:58,868 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 169550
2023-11-20 14:53:59,082 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1130266.6666666667, ans=0.125
2023-11-20 14:54:07,560 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1130333.3333333333, ans=0.125
2023-11-20 14:54:12,523 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1130400.0, ans=0.125
2023-11-20 14:54:28,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1130466.6666666667, ans=0.125
2023-11-20 14:54:40,120 INFO [train_asr.py:1262] (3/4) Epoch 15, batch 1250, loss[loss=0.09351, simple_loss=0.1203, pruned_loss=0.02178, audio_tagging_loss=0.01159, over 14821.00 frames. ], tot_loss[loss=0.07918, simple_loss=0.1001, pruned_loss=0.01928, audio_tagging_loss=0.009834, over 3041556.22 frames. ], batch size: 54, lr: 4.66e-03, grad_scale: 32.0
2023-11-20 14:54:40,218 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1130533.3333333333, ans=0.125
2023-11-20 14:54:57,187 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1130600.0, ans=0.2
2023-11-20 14:55:03,160 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 169600
2023-11-20 14:55:19,334 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1130733.3333333333, ans=0.125
2023-11-20 14:55:23,568 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1130733.3333333333, ans=0.1
2023-11-20 14:55:44,765 INFO [train_asr.py:1262] (3/4) Epoch 15, batch 1300, loss[loss=0.0977, simple_loss=0.1382, pruned_loss=0.0218, audio_tagging_loss=0.00678, over 15106.00 frames. ], tot_loss[loss=0.07888, simple_loss=0.1, pruned_loss=0.01909, audio_tagging_loss=0.009777, over 3037706.74 frames. ], batch size: 55, lr: 4.66e-03, grad_scale: 32.0
2023-11-20 14:55:53,507 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.559e+01 8.274e+01 8.667e+01 1.016e+02 1.258e+02, threshold=1.733e+02, percent-clipped=0.0
2023-11-20 14:56:06,170 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1130933.3333333333, ans=0.1
2023-11-20 14:56:08,374 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 169650
2023-11-20 14:56:12,819 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1131000.0, ans=0.125
2023-11-20 14:56:18,076 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.18 vs. limit=12.0
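In every optim.py:476 record above, the threshold equals Clipping_scale times the logged median grad-norm (e.g. 2.0 x 8.897e+01 = 1.779e+02), so the clipping threshold is evidently derived from a running collection of recent gradient norms rather than fixed. A minimal sketch of that bookkeeping (an illustration, not the project's optimizer code):

    from collections import deque
    import numpy as np

    class GradNormClipper:
        """Clip at clipping_scale * median of recent gradient norms (a sketch)."""
        def __init__(self, clipping_scale: float = 2.0, window: int = 128):
            self.clipping_scale = clipping_scale
            self.norms = deque(maxlen=window)

        def update(self, grad_norm: float) -> float:
            self.norms.append(grad_norm)
            quartiles = np.quantile(self.norms, [0.0, 0.25, 0.5, 0.75, 1.0])
            threshold = self.clipping_scale * quartiles[2]  # 2.0 * median, as logged
            # Return the factor (<= 1.0) to multiply the gradients by this step.
            return min(1.0, threshold / (grad_norm + 1e-20))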
2023-11-20 14:56:41,147 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.67 vs. limit=12.0
2023-11-20 14:56:49,815 INFO [train_asr.py:1262] (3/4) Epoch 15, batch 1350, loss[loss=0.05904, simple_loss=0.06522, pruned_loss=0.01121, audio_tagging_loss=0.01521, over 15030.00 frames. ], tot_loss[loss=0.07786, simple_loss=0.09862, pruned_loss=0.01866, audio_tagging_loss=0.009887, over 3036976.23 frames. ], batch size: 56, lr: 4.66e-03, grad_scale: 32.0
2023-11-20 14:57:07,491 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-20 14:57:12,418 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1131266.6666666667, ans=0.125
2023-11-20 14:57:13,574 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 169700
2023-11-20 14:57:14,909 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1131333.3333333333, ans=0.125
2023-11-20 14:57:17,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1131333.3333333333, ans=0.0
2023-11-20 14:57:36,953 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 14:57:50,723 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1131466.6666666667, ans=0.0
2023-11-20 14:57:55,474 INFO [train_asr.py:1262] (3/4) Epoch 15, batch 1400, loss[loss=0.08614, simple_loss=0.1086, pruned_loss=0.02181, audio_tagging_loss=0.01004, over 15433.00 frames. ], tot_loss[loss=0.07849, simple_loss=0.09927, pruned_loss=0.01895, audio_tagging_loss=0.009911, over 3041411.80 frames. ], batch size: 57, lr: 4.66e-03, grad_scale: 32.0
2023-11-20 14:58:04,236 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.814e+01 7.998e+01 8.583e+01 9.280e+01 1.349e+02, threshold=1.717e+02, percent-clipped=0.0
2023-11-20 14:58:15,614 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1131600.0, ans=0.1
2023-11-20 14:58:19,203 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 169750
2023-11-20 14:59:00,601 INFO [train_asr.py:1262] (3/4) Epoch 15, batch 1450, loss[loss=0.07859, simple_loss=0.09753, pruned_loss=0.01989, audio_tagging_loss=0.00993, over 14844.00 frames. ], tot_loss[loss=0.07854, simple_loss=0.09914, pruned_loss=0.01897, audio_tagging_loss=0.009993, over 3039816.67 frames. ], batch size: 57, lr: 4.66e-03, grad_scale: 32.0
2023-11-20 14:59:01,370 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.02 vs. limit=15.0
2023-11-20 14:59:02,159 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1131866.6666666667, ans=0.2
2023-11-20 14:59:04,999 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1131866.6666666667, ans=0.125
2023-11-20 14:59:24,652 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 169800
2023-11-20 14:59:40,353 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1132066.6666666667, ans=0.125
2023-11-20 14:59:40,473 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1132066.6666666667, ans=0.0
2023-11-20 15:00:06,397 INFO [train_asr.py:1262] (3/4) Epoch 15, batch 1500, loss[loss=0.0498, simple_loss=0.05146, pruned_loss=0.01121, audio_tagging_loss=0.01285, over 14207.00 frames. ], tot_loss[loss=0.07805, simple_loss=0.09811, pruned_loss=0.01885, audio_tagging_loss=0.01015, over 3034293.07 frames. ], batch size: 58, lr: 4.65e-03, grad_scale: 16.0
2023-11-20 15:00:12,350 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1132200.0, ans=0.1
2023-11-20 15:00:17,038 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.235e+01 8.270e+01 9.018e+01 9.743e+01 1.216e+02, threshold=1.804e+02, percent-clipped=0.0
2023-11-20 15:00:21,586 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1132266.6666666667, ans=0.125
2023-11-20 15:00:29,944 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 169850
2023-11-20 15:00:45,493 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1132400.0, ans=0.125
2023-11-20 15:00:56,119 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1132400.0, ans=0.2
2023-11-20 15:00:57,345 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=1132466.6666666667, ans=0.5
2023-11-20 15:01:11,572 INFO [train_asr.py:1262] (3/4) Epoch 15, batch 1550, loss[loss=0.08678, simple_loss=0.1182, pruned_loss=0.0214, audio_tagging_loss=0.006259, over 15142.00 frames. ], tot_loss[loss=0.07854, simple_loss=0.09898, pruned_loss=0.01898, audio_tagging_loss=0.01007, over 3037468.72 frames. ], batch size: 57, lr: 4.65e-03, grad_scale: 16.0
2023-11-20 15:01:25,864 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.29 vs. limit=15.0
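The scaling.py:213 records print the instantaneous value (ans=...) of a ScheduledFloat, a dropout rate, skip probability or similar constant that varies with batch_count; by batch_count ~1.13e6 most have long since reached their final value. A minimal reimplementation of the idea, with made-up breakpoints (the real schedules differ per module):

    class ScheduledFloat:
        """A float following a piecewise-linear schedule over batch_count (a sketch)."""
        def __init__(self, *points):  # (batch_count, value) pairs
            self.points = sorted(points)
            self.batch_count = 0.0

        def __float__(self):
            if self.batch_count <= self.points[0][0]:
                return self.points[0][1]
            if self.batch_count >= self.points[-1][0]:
                return self.points[-1][1]
            for (x0, y0), (x1, y1) in zip(self.points, self.points[1:]):
                if x0 <= self.batch_count <= x1:
                    # Linear interpolation between neighbouring breakpoints.
                    return y0 + (y1 - y0) * (self.batch_count - x0) / (x1 - x0)

    skip_rate = ScheduledFloat((0.0, 0.5), (4000.0, 0.05), (16000.0, 0.0))
    skip_rate.batch_count = 1131866.0
    print(float(skip_rate))  # 0.0 -- past the schedule end, like the ans=0.0 lines above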
2023-11-20 15:01:34,449 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 169900
2023-11-20 15:01:34,525 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1132600.0, ans=0.035
2023-11-20 15:01:35,828 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-20 15:02:01,589 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1132733.3333333333, ans=0.125
2023-11-20 15:02:15,856 INFO [train_asr.py:1262] (3/4) Epoch 15, batch 1600, loss[loss=0.07485, simple_loss=0.1047, pruned_loss=0.01384, audio_tagging_loss=0.008658, over 15408.00 frames. ], tot_loss[loss=0.07819, simple_loss=0.09844, pruned_loss=0.01884, audio_tagging_loss=0.01013, over 3039193.95 frames. ], batch size: 56, lr: 4.65e-03, grad_scale: 32.0
2023-11-20 15:02:17,437 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1132866.6666666667, ans=10.0
2023-11-20 15:02:17,484 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1132866.6666666667, ans=0.125
2023-11-20 15:02:26,260 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.675e+01 8.383e+01 8.914e+01 9.693e+01 2.648e+02, threshold=1.783e+02, percent-clipped=1.0
2023-11-20 15:02:37,400 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.03 vs. limit=10.0
2023-11-20 15:02:39,772 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 169950
2023-11-20 15:03:14,052 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.96 vs. limit=15.0
2023-11-20 15:03:20,756 INFO [train_asr.py:1262] (3/4) Epoch 15, batch 1650, loss[loss=0.07719, simple_loss=0.09695, pruned_loss=0.0164, audio_tagging_loss=0.01232, over 16504.00 frames. ], tot_loss[loss=0.07803, simple_loss=0.09802, pruned_loss=0.0188, audio_tagging_loss=0.01022, over 3038511.75 frames. ], batch size: 61, lr: 4.65e-03, grad_scale: 32.0
2023-11-20 15:03:32,982 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1133266.6666666667, ans=0.125
2023-11-20 15:03:34,107 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1133266.6666666667, ans=0.125
2023-11-20 15:03:43,674 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1133266.6666666667, ans=0.125
2023-11-20 15:03:44,547 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 170000
2023-11-20 15:03:57,903 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.45 vs. limit=12.0
2023-11-20 15:04:14,831 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1133466.6666666667, ans=0.125
2023-11-20 15:04:20,515 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1133466.6666666667, ans=0.0
2023-11-20 15:04:26,896 INFO [train_asr.py:1262] (3/4) Epoch 15, batch 1700, loss[loss=0.0783, simple_loss=0.09757, pruned_loss=0.01938, audio_tagging_loss=0.01014, over 15294.00 frames. ], tot_loss[loss=0.07771, simple_loss=0.09778, pruned_loss=0.01858, audio_tagging_loss=0.01024, over 3036722.66 frames. ], batch size: 56, lr: 4.65e-03, grad_scale: 32.0
2023-11-20 15:04:28,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1133533.3333333333, ans=0.2
2023-11-20 15:04:36,551 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.057e+01 7.994e+01 8.673e+01 9.340e+01 1.265e+02, threshold=1.735e+02, percent-clipped=0.0
2023-11-20 15:04:38,005 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1133600.0, ans=0.125
2023-11-20 15:04:48,881 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 170050
2023-11-20 15:05:16,008 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1133733.3333333333, ans=0.07
2023-11-20 15:05:18,970 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1133800.0, ans=0.1
2023-11-20 15:05:26,475 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1133800.0, ans=0.125
2023-11-20 15:05:30,940 INFO [train_asr.py:1262] (3/4) Epoch 15, batch 1750, loss[loss=0.07485, simple_loss=0.08906, pruned_loss=0.0217, audio_tagging_loss=0.008621, over 13301.00 frames. ], tot_loss[loss=0.07803, simple_loss=0.09871, pruned_loss=0.0187, audio_tagging_loss=0.009979, over 3042201.77 frames. ], batch size: 54, lr: 4.65e-03, grad_scale: 32.0
2023-11-20 15:05:37,313 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1133866.6666666667, ans=0.125
2023-11-20 15:05:39,073 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.05 vs. limit=22.5
2023-11-20 15:05:39,780 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1133866.6666666667, ans=0.025
2023-11-20 15:05:50,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1133933.3333333333, ans=0.1
2023-11-20 15:05:51,277 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.28 vs. limit=22.5
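The scaling.py:1022 Whitening records compare a metric against a limit per named submodule. The metric reads like an anisotropy measure of the layer's feature covariance: 1.0 when the features are perfectly white, approaching num_channels as the covariance collapses toward rank one, with a corrective gradient presumably applied only once metric exceeds limit. A plausible reconstruction of such a metric (my reading of these logs, not the project's code):

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        """d * sum(eig^2) / (sum(eig))^2 of the covariance: 1.0 if white (a sketch)."""
        x = x - x.mean(dim=0)
        cov = (x.T @ x) / x.shape[0]   # (num_channels, num_channels)
        d = cov.shape[0]
        return (d * (cov ** 2).sum() / cov.trace() ** 2).item()

    torch.manual_seed(0)
    print(whitening_metric(torch.randn(10000, 384)))                # ~1.04: nearly white
    print(whitening_metric(torch.randn(10000, 1).expand(-1, 384)))  # 384.0: rank-1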
2023-11-20 15:05:54,395 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 170100
2023-11-20 15:05:54,506 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1133933.3333333333, ans=0.0
2023-11-20 15:06:06,865 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1134000.0, ans=0.125
2023-11-20 15:06:16,949 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.92 vs. limit=15.0
2023-11-20 15:06:20,585 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1134066.6666666667, ans=0.125
2023-11-20 15:06:31,490 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1134133.3333333333, ans=0.0
2023-11-20 15:06:34,325 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.71 vs. limit=22.5
2023-11-20 15:06:34,912 INFO [train_asr.py:1262] (3/4) Epoch 15, batch 1800, loss[loss=0.09009, simple_loss=0.1093, pruned_loss=0.02685, audio_tagging_loss=0.008577, over 15135.00 frames. ], tot_loss[loss=0.07762, simple_loss=0.09813, pruned_loss=0.01854, audio_tagging_loss=0.01001, over 3041189.90 frames. ], batch size: 57, lr: 4.65e-03, grad_scale: 32.0
2023-11-20 15:06:46,187 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 5.918e+01 8.061e+01 8.642e+01 9.411e+01 1.208e+02, threshold=1.728e+02, percent-clipped=0.0
2023-11-20 15:06:52,737 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1134266.6666666667, ans=0.125
2023-11-20 15:06:59,358 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 170150
2023-11-20 15:07:00,868 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1134333.3333333333, ans=0.125
2023-11-20 15:07:03,017 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1134333.3333333333, ans=0.125
2023-11-20 15:07:40,593 INFO [train_asr.py:1262] (3/4) Epoch 15, batch 1850, loss[loss=0.09153, simple_loss=0.1158, pruned_loss=0.02591, audio_tagging_loss=0.007716, over 15105.00 frames. ], tot_loss[loss=0.07853, simple_loss=0.09909, pruned_loss=0.01906, audio_tagging_loss=0.00993, over 3041178.74 frames. ], batch size: 54, lr: 4.65e-03, grad_scale: 16.0
2023-11-20 15:07:56,405 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-20 15:08:00,110 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1134600.0, ans=0.05
2023-11-20 15:08:03,566 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 170200
2023-11-20 15:08:12,582 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1134666.6666666667, ans=0.05
2023-11-20 15:08:16,984 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1134666.6666666667, ans=0.125
2023-11-20 15:08:23,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1134733.3333333333, ans=0.0
2023-11-20 15:08:39,592 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1134800.0, ans=0.0
2023-11-20 15:08:43,131 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1134800.0, ans=0.125
2023-11-20 15:08:45,390 INFO [train_asr.py:1262] (3/4) Epoch 15, batch 1900, loss[loss=0.08462, simple_loss=0.1064, pruned_loss=0.02291, audio_tagging_loss=0.008487, over 15189.00 frames. ], tot_loss[loss=0.07822, simple_loss=0.09877, pruned_loss=0.01896, audio_tagging_loss=0.009881, over 3042092.24 frames. ], batch size: 57, lr: 4.65e-03, grad_scale: 16.0
2023-11-20 15:08:45,677 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1134866.6666666667, ans=0.0
2023-11-20 15:08:56,323 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.854e+01 8.016e+01 8.806e+01 9.698e+01 1.880e+02, threshold=1.761e+02, percent-clipped=1.0
2023-11-20 15:09:01,481 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1134933.3333333333, ans=0.1
2023-11-20 15:09:08,218 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 170250
2023-11-20 15:09:49,309 INFO [train_asr.py:1262] (3/4) Epoch 15, batch 1950, loss[loss=0.0873, simple_loss=0.1171, pruned_loss=0.0207, audio_tagging_loss=0.008045, over 15886.00 frames. ], tot_loss[loss=0.07849, simple_loss=0.09904, pruned_loss=0.01913, audio_tagging_loss=0.009838, over 3047958.64 frames. ], batch size: 58, lr: 4.65e-03, grad_scale: 16.0
2023-11-20 15:10:05,541 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1135266.6666666667, ans=0.0
2023-11-20 15:10:13,257 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 170300
2023-11-20 15:10:34,478 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1135400.0, ans=0.1
2023-11-20 15:10:38,420 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.58 vs. limit=15.0
2023-11-20 15:10:53,743 INFO [train_asr.py:1262] (3/4) Epoch 15, batch 2000, loss[loss=0.09051, simple_loss=0.1144, pruned_loss=0.02223, audio_tagging_loss=0.01108, over 15215.00 frames. ], tot_loss[loss=0.07795, simple_loss=0.09834, pruned_loss=0.01892, audio_tagging_loss=0.009853, over 3051150.43 frames. ], batch size: 56, lr: 4.65e-03, grad_scale: 32.0
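The learning rate in the batch records decays smoothly within the epoch (4.66e-03 down to 4.65e-03 by batch 1500). This is consistent with icefall's Eden schedule, which discounts the base learning rate by both the step count and the epoch count. The check below assumes base_lr=0.045, lr_batches=7500, lr_epochs=3.5 for this run and a 0-based epoch counter (assumptions on my part, but it reproduces the logged value):

    def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
        # Eden-style decay in both the batch and epoch dimensions.
        return (base_lr
                * ((batch / lr_batches) ** 2 + 1) ** -0.25
                * ((epoch / lr_epochs) ** 2 + 1) ** -0.25)

    print(eden_lr(0.045, batch=169250, epoch=14))  # ~4.66e-03, as logged above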
2023-11-20 15:10:54,044 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1135533.3333333333, ans=0.125
2023-11-20 15:11:05,447 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.656e+01 7.869e+01 8.530e+01 9.315e+01 1.202e+02, threshold=1.706e+02, percent-clipped=0.0
2023-11-20 15:11:16,603 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 170350
2023-11-20 15:11:32,647 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.73 vs. limit=6.0
2023-11-20 15:11:36,876 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1135733.3333333333, ans=0.0
2023-11-20 15:11:41,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=1135733.3333333333, ans=15.0
2023-11-20 15:11:44,245 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1135800.0, ans=0.0
2023-11-20 15:11:55,683 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.47 vs. limit=15.0
2023-11-20 15:11:58,368 INFO [train_asr.py:1262] (3/4) Epoch 15, batch 2050, loss[loss=0.09436, simple_loss=0.1078, pruned_loss=0.0314, audio_tagging_loss=0.009068, over 15063.00 frames. ], tot_loss[loss=0.07852, simple_loss=0.09904, pruned_loss=0.01913, audio_tagging_loss=0.009862, over 3053738.13 frames. ], batch size: 58, lr: 4.65e-03, grad_scale: 32.0
2023-11-20 15:12:01,517 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.59 vs. limit=12.0
2023-11-20 15:12:21,244 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 170400
2023-11-20 15:12:24,656 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.46 vs. limit=22.5
2023-11-20 15:12:30,994 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1136000.0, ans=0.1
2023-11-20 15:13:02,589 INFO [train_asr.py:1262] (3/4) Epoch 15, batch 2100, loss[loss=0.06599, simple_loss=0.08638, pruned_loss=0.0126, audio_tagging_loss=0.01021, over 14879.00 frames. ], tot_loss[loss=0.07874, simple_loss=0.09953, pruned_loss=0.01912, audio_tagging_loss=0.009856, over 3051513.04 frames. ], batch size: 56, lr: 4.65e-03, grad_scale: 32.0
2023-11-20 15:13:03,467 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.65 vs. limit=10.0
2023-11-20 15:13:04,460 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.45 vs. limit=15.0
2023-11-20 15:13:13,298 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1136200.0, ans=0.05
2023-11-20 15:13:14,256 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.146e+01 8.226e+01 8.979e+01 1.003e+02 1.386e+02, threshold=1.796e+02, percent-clipped=0.0
2023-11-20 15:13:23,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1136266.6666666667, ans=0.0
2023-11-20 15:13:26,691 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 170450
2023-11-20 15:13:30,440 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.26 vs. limit=22.5
2023-11-20 15:13:48,185 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1136400.0, ans=0.125
2023-11-20 15:14:02,502 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1136466.6666666667, ans=0.1
2023-11-20 15:14:04,870 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1136466.6666666667, ans=0.1
2023-11-20 15:14:05,042 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1136466.6666666667, ans=0.1
2023-11-20 15:14:07,085 INFO [train_asr.py:1262] (3/4) Epoch 15, batch 2150, loss[loss=0.08494, simple_loss=0.09967, pruned_loss=0.0219, audio_tagging_loss=0.0132, over 13866.00 frames. ], tot_loss[loss=0.07876, simple_loss=0.09956, pruned_loss=0.01903, audio_tagging_loss=0.009947, over 3047472.51 frames. ], batch size: 53, lr: 4.65e-03, grad_scale: 32.0
2023-11-20 15:14:10,274 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1136533.3333333333, ans=0.125
2023-11-20 15:14:12,957 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1136533.3333333333, ans=0.1
2023-11-20 15:14:30,406 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 170500
2023-11-20 15:14:30,633 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1136600.0, ans=0.0
2023-11-20 15:14:42,037 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.72 vs. limit=15.0
2023-11-20 15:14:45,072 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 15:14:46,467 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=1136733.3333333333, ans=0.5
2023-11-20 15:14:47,198 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1136733.3333333333, ans=0.125
2023-11-20 15:14:59,864 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1136800.0, ans=0.125
2023-11-20 15:15:12,317 INFO [train_asr.py:1262] (3/4) Epoch 15, batch 2200, loss[loss=0.07314, simple_loss=0.09403, pruned_loss=0.01778, audio_tagging_loss=0.00835, over 15825.00 frames. ], tot_loss[loss=0.07871, simple_loss=0.09968, pruned_loss=0.01897, audio_tagging_loss=0.009897, over 3053828.03 frames. ], batch size: 58, lr: 4.65e-03, grad_scale: 32.0
2023-11-20 15:15:16,260 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1136866.6666666667, ans=0.0
2023-11-20 15:15:17,559 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1136866.6666666667, ans=0.125
2023-11-20 15:15:23,631 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.205e+01 8.421e+01 8.880e+01 9.731e+01 1.234e+02, threshold=1.776e+02, percent-clipped=0.0
2023-11-20 15:15:33,738 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1136933.3333333333, ans=0.125
2023-11-20 15:15:34,860 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 170550
2023-11-20 15:15:37,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1137000.0, ans=0.125
2023-11-20 15:15:40,777 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1137000.0, ans=0.125
2023-11-20 15:15:40,830 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1137000.0, ans=0.0
2023-11-20 15:15:44,238 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1137000.0, ans=0.05
2023-11-20 15:16:14,875 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.95 vs. limit=12.0
2023-11-20 15:16:16,550 INFO [train_asr.py:1262] (3/4) Epoch 15, batch 2250, loss[loss=0.07577, simple_loss=0.09496, pruned_loss=0.01852, audio_tagging_loss=0.00977, over 14941.00 frames. ], tot_loss[loss=0.07904, simple_loss=0.1001, pruned_loss=0.0191, audio_tagging_loss=0.009909, over 3050257.66 frames. ], batch size: 56, lr: 4.64e-03, grad_scale: 32.0
2023-11-20 15:16:34,620 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1137266.6666666667, ans=0.0
2023-11-20 15:16:36,119 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.43 vs. limit=15.0
2023-11-20 15:16:39,909 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 170600
2023-11-20 15:16:40,005 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1137266.6666666667, ans=0.125
2023-11-20 15:16:40,223 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-20 15:17:02,600 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-20 15:17:10,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1137466.6666666667, ans=0.1
2023-11-20 15:17:21,566 INFO [train_asr.py:1262] (3/4) Epoch 15, batch 2300, loss[loss=0.07892, simple_loss=0.1088, pruned_loss=0.01482, audio_tagging_loss=0.009677, over 14920.00 frames. ], tot_loss[loss=0.07831, simple_loss=0.099, pruned_loss=0.01877, audio_tagging_loss=0.01003, over 3040743.64 frames. ], batch size: 57, lr: 4.64e-03, grad_scale: 32.0
2023-11-20 15:17:33,174 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.682e+01 8.112e+01 8.584e+01 9.317e+01 1.375e+02, threshold=1.717e+02, percent-clipped=0.0
2023-11-20 15:17:38,309 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.65 vs. limit=15.0
2023-11-20 15:17:40,669 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.22 vs. limit=15.0
2023-11-20 15:17:45,538 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 170650
2023-11-20 15:17:49,235 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=1137666.6666666667, ans=0.05
2023-11-20 15:18:05,843 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1137733.3333333333, ans=0.125
2023-11-20 15:18:06,956 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1137733.3333333333, ans=0.125
2023-11-20 15:18:09,243 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1137733.3333333333, ans=0.0
2023-11-20 15:18:11,153 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.54 vs. limit=6.0
2023-11-20 15:18:18,338 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 15:18:26,365 INFO [train_asr.py:1262] (3/4) Epoch 15, batch 2350, loss[loss=0.09914, simple_loss=0.1351, pruned_loss=0.02208, audio_tagging_loss=0.009498, over 16765.00 frames. ], tot_loss[loss=0.07889, simple_loss=0.09982, pruned_loss=0.01901, audio_tagging_loss=0.009974, over 3034806.55 frames. ], batch size: 60, lr: 4.64e-03, grad_scale: 32.0
2023-11-20 15:18:26,692 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1137866.6666666667, ans=0.025
2023-11-20 15:18:42,445 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.50 vs. limit=15.0
2023-11-20 15:18:48,985 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 170700
2023-11-20 15:18:56,521 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.49 vs. limit=10.0
2023-11-20 15:19:29,739 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1138200.0, ans=0.125
2023-11-20 15:19:30,690 INFO [train_asr.py:1262] (3/4) Epoch 15, batch 2400, loss[loss=0.05762, simple_loss=0.06609, pruned_loss=0.01177, audio_tagging_loss=0.01281, over 14868.00 frames. ], tot_loss[loss=0.07932, simple_loss=0.1005, pruned_loss=0.0191, audio_tagging_loss=0.009949, over 3040700.27 frames. ], batch size: 57, lr: 4.64e-03, grad_scale: 32.0
2023-11-20 15:19:32,665 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.56 vs. limit=15.0
2023-11-20 15:19:34,547 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1138200.0, ans=0.125
2023-11-20 15:19:42,894 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.480e+01 8.080e+01 8.821e+01 9.568e+01 1.388e+02, threshold=1.764e+02, percent-clipped=0.0
2023-11-20 15:19:54,145 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 170750
2023-11-20 15:19:54,286 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1138266.6666666667, ans=0.0
2023-11-20 15:19:59,149 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1138333.3333333333, ans=0.125
2023-11-20 15:20:00,428 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1138333.3333333333, ans=0.125
2023-11-20 15:20:13,248 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1138400.0, ans=0.125
2023-11-20 15:20:22,802 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1138466.6666666667, ans=0.125
2023-11-20 15:20:28,135 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.09 vs. limit=6.0
2023-11-20 15:20:35,603 INFO [train_asr.py:1262] (3/4) Epoch 15, batch 2450, loss[loss=0.07812, simple_loss=0.1043, pruned_loss=0.0184, audio_tagging_loss=0.007564, over 16128.00 frames. ], tot_loss[loss=0.07901, simple_loss=0.09984, pruned_loss=0.01899, audio_tagging_loss=0.0101, over 3044183.68 frames. ], batch size: 59, lr: 4.64e-03, grad_scale: 16.0
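grad_scale in the batch records oscillates between 32.0 and 16.0 (e.g. 32.0 at batch 2400 but 16.0 at batch 2450 just above). With fp16 training this is dynamic loss scaling: the scale is halved when a step produces non-finite gradients and grown back after a run of clean steps. The same behaviour can be sketched with PyTorch's stock scaler, though the scaler actually used by this codebase may differ:

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_factor=2.0,
                                       backoff_factor=0.5, growth_interval=2000)
    # scaler.step(optimizer) skips the update on inf/nan gradients and halves the
    # scale (32.0 -> 16.0); after growth_interval clean steps it doubles it again.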
2023-11-20 15:20:57,994 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1138600.0, ans=0.125
2023-11-20 15:20:59,014 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 170800
2023-11-20 15:20:59,148 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1138600.0, ans=0.0
2023-11-20 15:21:02,043 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1138666.6666666667, ans=0.05
2023-11-20 15:21:41,389 INFO [train_asr.py:1262] (3/4) Epoch 15, batch 2500, loss[loss=0.07459, simple_loss=0.08721, pruned_loss=0.0194, audio_tagging_loss=0.01158, over 16423.00 frames. ], tot_loss[loss=0.07874, simple_loss=0.09937, pruned_loss=0.01885, audio_tagging_loss=0.0102, over 3042436.39 frames. ], batch size: 63, lr: 4.64e-03, grad_scale: 16.0
2023-11-20 15:21:54,766 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.766e+01 8.090e+01 8.633e+01 9.562e+01 1.495e+02, threshold=1.727e+02, percent-clipped=0.0
2023-11-20 15:21:57,471 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.84 vs. limit=6.0
2023-11-20 15:22:00,710 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1138933.3333333333, ans=0.0
2023-11-20 15:22:04,112 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 170850
2023-11-20 15:22:14,675 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1139000.0, ans=0.0
2023-11-20 15:22:19,685 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1139066.6666666667, ans=0.125
2023-11-20 15:22:36,119 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.00 vs. limit=15.0
2023-11-20 15:22:45,290 INFO [train_asr.py:1262] (3/4) Epoch 15, batch 2550, loss[loss=0.07241, simple_loss=0.09577, pruned_loss=0.01704, audio_tagging_loss=0.007481, over 15651.00 frames. ], tot_loss[loss=0.07866, simple_loss=0.09939, pruned_loss=0.01886, audio_tagging_loss=0.0101, over 3041128.73 frames. ], batch size: 59, lr: 4.64e-03, grad_scale: 16.0
2023-11-20 15:22:49,064 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.98 vs. limit=15.0
2023-11-20 15:22:52,688 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.78 vs. limit=6.0
2023-11-20 15:22:58,546 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1139266.6666666667, ans=0.125
2023-11-20 15:23:01,601 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1139266.6666666667, ans=0.125
2023-11-20 15:23:01,666 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1139266.6666666667, ans=0.0
2023-11-20 15:23:08,691 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 170900
2023-11-20 15:23:09,971 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1139333.3333333333, ans=0.125
2023-11-20 15:23:26,101 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1139400.0, ans=0.2
2023-11-20 15:23:27,732 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.66 vs. limit=15.0
2023-11-20 15:23:42,382 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1139466.6666666667, ans=0.0
2023-11-20 15:23:50,118 INFO [train_asr.py:1262] (3/4) Epoch 15, batch 2600, loss[loss=0.0612, simple_loss=0.08095, pruned_loss=0.01306, audio_tagging_loss=0.007664, over 15448.00 frames. ], tot_loss[loss=0.07877, simple_loss=0.09979, pruned_loss=0.01892, audio_tagging_loss=0.009953, over 3045507.01 frames. ], batch size: 59, lr: 4.64e-03, grad_scale: 16.0
2023-11-20 15:23:51,650 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1139533.3333333333, ans=0.0
2023-11-20 15:23:52,733 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1139533.3333333333, ans=0.0
2023-11-20 15:24:04,210 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.595e+01 8.100e+01 8.871e+01 9.785e+01 4.234e+02, threshold=1.774e+02, percent-clipped=0.0
2023-11-20 15:24:13,796 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 170950
2023-11-20 15:24:29,604 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=1139733.3333333333, ans=0.5
2023-11-20 15:24:55,131 INFO [train_asr.py:1262] (3/4) Epoch 15, batch 2650, loss[loss=0.08744, simple_loss=0.1037, pruned_loss=0.02663, audio_tagging_loss=0.008944, over 15411.00 frames. ], tot_loss[loss=0.07926, simple_loss=0.1008, pruned_loss=0.01907, audio_tagging_loss=0.009791, over 3044356.19 frames. ], batch size: 58, lr: 4.64e-03, grad_scale: 16.0
2023-11-20 15:25:05,497 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1139866.6666666667, ans=0.125
2023-11-20 15:25:06,654 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1139933.3333333333, ans=0.0
2023-11-20 15:25:18,195 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 171000
2023-11-20 15:25:31,022 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.91 vs. limit=12.0
2023-11-20 15:25:31,065 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.53 vs. limit=15.0
2023-11-20 15:25:38,216 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1140066.6666666667, ans=0.125
2023-11-20 15:25:54,230 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1140133.3333333333, ans=0.2
2023-11-20 15:26:00,160 INFO [train_asr.py:1262] (3/4) Epoch 15, batch 2700, loss[loss=0.09851, simple_loss=0.1256, pruned_loss=0.02647, audio_tagging_loss=0.009243, over 15037.00 frames. ], tot_loss[loss=0.07898, simple_loss=0.1006, pruned_loss=0.01895, audio_tagging_loss=0.009739, over 3044570.38 frames. ], batch size: 55, lr: 4.64e-03, grad_scale: 16.0
2023-11-20 15:26:05,496 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1140200.0, ans=0.125
2023-11-20 15:26:09,752 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1140200.0, ans=0.125
2023-11-20 15:26:14,308 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.575e+01 8.137e+01 8.718e+01 9.635e+01 1.314e+02, threshold=1.744e+02, percent-clipped=1.0
2023-11-20 15:26:23,575 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 171050
2023-11-20 15:26:25,000 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1140333.3333333333, ans=0.125
2023-11-20 15:26:33,201 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.75 vs. limit=12.0
2023-11-20 15:26:48,514 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.67 vs. limit=15.0
2023-11-20 15:26:56,832 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1140466.6666666667, ans=0.1
2023-11-20 15:27:04,573 INFO [train_asr.py:1262] (3/4) Epoch 15, batch 2750, loss[loss=0.0768, simple_loss=0.1062, pruned_loss=0.0146, audio_tagging_loss=0.009095, over 16254.00 frames. ], tot_loss[loss=0.07892, simple_loss=0.1004, pruned_loss=0.01898, audio_tagging_loss=0.009717, over 3042688.81 frames. ], batch size: 61, lr: 4.64e-03, grad_scale: 16.0
2023-11-20 15:27:20,768 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1140600.0, ans=0.2
2023-11-20 15:27:20,889 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1140600.0, ans=0.125
2023-11-20 15:27:28,518 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 171100
2023-11-20 15:27:31,146 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1140666.6666666667, ans=0.07
2023-11-20 15:27:41,569 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.65 vs. limit=15.0
2023-11-20 15:27:55,520 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1140800.0, ans=0.0
2023-11-20 15:27:58,010 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1140800.0, ans=0.125
2023-11-20 15:28:00,060 WARNING [train_asr.py:1506] (3/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 15:28:09,473 INFO [train_asr.py:1262] (3/4) Epoch 15, batch 2800, loss[loss=0.09773, simple_loss=0.137, pruned_loss=0.02112, audio_tagging_loss=0.008134, over 16483.00 frames. ], tot_loss[loss=0.07884, simple_loss=0.1004, pruned_loss=0.01896, audio_tagging_loss=0.009658, over 3037324.25 frames. ], batch size: 60, lr: 4.64e-03, grad_scale: 32.0
2023-11-20 15:28:09,748 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1140866.6666666667, ans=0.125
2023-11-20 15:28:16,515 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1140866.6666666667, ans=0.1
2023-11-20 15:28:23,472 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.091e+01 8.017e+01 8.655e+01 9.428e+01 1.274e+02, threshold=1.731e+02, percent-clipped=0.0
2023-11-20 15:28:32,322 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 171150
2023-11-20 15:28:37,878 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.93 vs. limit=6.0
2023-11-20 15:28:44,887 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1141000.0, ans=0.0
2023-11-20 15:28:52,152 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1141066.6666666667, ans=0.0
2023-11-20 15:29:13,868 INFO [train_asr.py:1262] (3/4) Epoch 15, batch 2850, loss[loss=0.06746, simple_loss=0.08595, pruned_loss=0.01379, audio_tagging_loss=0.0107, over 15167.00 frames. ], tot_loss[loss=0.07774, simple_loss=0.09881, pruned_loss=0.0186, audio_tagging_loss=0.009732, over 3038368.93 frames. ], batch size: 56, lr: 4.64e-03, grad_scale: 32.0
2023-11-20 15:29:32,994 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1141266.6666666667, ans=0.125
2023-11-20 15:29:37,347 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 171200
2023-11-20 15:29:56,290 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1141400.0, ans=0.0
2023-11-20 15:30:09,582 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1141466.6666666667, ans=0.125
2023-11-20 15:30:18,084 INFO [train_asr.py:1262] (3/4) Epoch 15, batch 2900, loss[loss=0.0811, simple_loss=0.09819, pruned_loss=0.02336, audio_tagging_loss=0.008653, over 15994.00 frames. ], tot_loss[loss=0.07794, simple_loss=0.09894, pruned_loss=0.01873, audio_tagging_loss=0.009743, over 3043807.86 frames. ], batch size: 63, lr: 4.64e-03, grad_scale: 32.0
2023-11-20 15:30:23,169 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.95 vs. limit=15.0
2023-11-20 15:30:32,494 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.283e+01 7.914e+01 8.602e+01 9.219e+01 1.779e+02, threshold=1.720e+02, percent-clipped=1.0
2023-11-20 15:30:42,593 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 171250
2023-11-20 15:31:07,422 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1141733.3333333333, ans=0.1
2023-11-20 15:31:23,115 INFO [train_asr.py:1262] (3/4) Epoch 15, batch 2950, loss[loss=0.07664, simple_loss=0.09263, pruned_loss=0.01734, audio_tagging_loss=0.01299, over 14260.00 frames. ], tot_loss[loss=0.07832, simple_loss=0.09926, pruned_loss=0.01881, audio_tagging_loss=0.009875, over 3040130.63 frames. ], batch size: 57, lr: 4.64e-03, grad_scale: 32.0
2023-11-20 15:31:40,589 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1141933.3333333333, ans=0.125
2023-11-20 15:31:44,485 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1141933.3333333333, ans=0.0
2023-11-20 15:31:46,752 INFO [model.py:792] (3/4) Freeze_encoder: False; Current batch idx: 171300
2023-11-20 15:32:03,547 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1142066.6666666667, ans=0.2
2023-11-20 15:32:12,756 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1142066.6666666667, ans=0.125
2023-11-20 15:32:18,425 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.92 vs. limit=6.0
2023-11-20 15:32:18,671 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.80 vs. limit=15.0
2023-11-20 15:32:28,328 INFO [train_asr.py:1262] (3/4) Epoch 15, batch 3000, loss[loss=0.0954, simple_loss=0.1241, pruned_loss=0.02483, audio_tagging_loss=0.00852, over 15188.00 frames. ], tot_loss[loss=0.07857, simple_loss=0.09989, pruned_loss=0.01882, audio_tagging_loss=0.009805, over 3046740.94 frames. ], batch size: 54, lr: 4.63e-03, grad_scale: 32.0
2023-11-20 15:32:28,329 INFO [train_asr.py:1285] (3/4) Computing validation loss
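The section ends with the run pausing at batch 3000 to compute the validation loss, presumably on its regular validation interval; the log would resume with the validation summary and further batch records.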